Save the script below to a file, for example get_tweets.sh.
Then make it executable with "chmod +x get_tweets.sh"
and follow the usage instructions in the comments at the top of the script.
#!/bin/bash
#written by rikijpn
#use me like this:
#get_tweets.sh $SOME_USER
#or set the default USERNAME below to the user you want, and run:
#get_tweets.sh
if [ -n "$1" ] ; then
    USERNAME=$1
else
    USERNAME=linuxquestions #any default user name you want
fi
URL="http://twitter.com/$USERNAME?page="
FILE_TO_SAVE="tweets.txt"
TEMP_PAGE="$(date +%d%m%H%M%S).txt" #temporary file for the page being processed
printf "\n" > "$FILE_TO_SAVE" #start with a fresh output file
for PAGE_NUMBER in {1..1000}; do
    echo "getting page number $PAGE_NUMBER..."
    wget -O "$TEMP_PAGE" "$URL$PAGE_NUMBER" 2>/dev/null
    #stop at the first page that has no tweets on it
    RELEVANT_LINES_NUMBER=$(grep -c -e 'entry-content' -e 'timestamp' "$TEMP_PAGE")
    if [ "$RELEVANT_LINES_NUMBER" -lt 1 ] ; then break; fi
    grep -e 'entry-content' -e 'timestamp' "$TEMP_PAGE" |
    sed \
        -e 's,.*<span class="entry-content">,,g' \
        -e 's,</span>,,g' \
        -e 's,.*<span class="published timestamp" data="{time:,,g' \
        -e "s/'//g" \
        -e 's/}">/ --> /g' \
        -e 's/<a.[^>]*>\(.[^<]*\)<\/a>/\1/g' \
        -e 's/^.*<span class="shared-content">/<RETWEET>/g' \
        -e 's/<\/a>/ in local time\n/g' >>"$FILE_TO_SAVE"
    sleep 5 #wait a bit between requests
done
rm -f "$TEMP_PAGE"
echo "all pages fetched"
exit 0
It marks retweets with a <RETWEET> tag and outputs each tweet's timestamp, so you can tell when it was posted. You can read it at my personal webpage as well.
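
To see what the sed step does without touching the network, here is a minimal sketch that runs a subset of the same expressions over a single made-up line of HTML. The sample markup and the function name strip_tweet_markup are assumptions for illustration only. Just the "entry-content" class name and the sed expressions themselves come from the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of get_tweets.sh): applies three of the
# script's sed expressions to whatever arrives on standard input.
strip_tweet_markup() {
    sed \
        -e 's,.*<span class="entry-content">,,g' \
        -e 's,</span>,,g' \
        -e 's/<a.[^>]*>\(.[^<]*\)<\/a>/\1/g'
}

# Made-up sample line, modeled on the class name the script greps for.
SAMPLE='  <span class="entry-content">Hello <a href="http://example.com">world</a></span>'

# Strip the wrapping span and unwrap the link, keeping only its text.
printf '%s\n' "$SAMPLE" | strip_tweet_markup
```

Run on the sample line, this prints "Hello world": the first expression drops everything up to and including the opening span, the second removes the closing span, and the third replaces each link with its visible text.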