Save the script below to a file, for example get_tweets.sh.
Then run "chmod +x get_tweets.sh"
and follow the usage instructions in the comments at the top of the script.
#!/bin/bash
# written by rikijpn
# use me like this:
#   get_tweets.sh $SOME_USER
# or replace the USERNAME variable with the name of the wanted user and run:
#   get_tweets.sh

if [ -n "$1" ]; then
    USERNAME=$1
else
    USERNAME=linuxquestions   # any default user name you want
fi

URL="http://twitter.com/$USERNAME?page="
FILE_TO_SAVE="tweets.txt"
TEMP_PAGE=$(date +%d%m%H%M%S).txt

# start with a fresh output file
printf "\n" > "$FILE_TO_SAVE"

for PAGE_NUMBER in {1..1000}; do
    echo "getting page number $PAGE_NUMBER..."
    wget -O "$TEMP_PAGE" "$URL$PAGE_NUMBER" 2>/dev/null

    # count lines holding tweet text or timestamps; none means we went past the last page
    RELEVANT_LINES_NUMBER=$(grep -c -e 'entry-content' -e 'timestamp' "$TEMP_PAGE")
    if [ "$RELEVANT_LINES_NUMBER" -lt 1 ]; then
        break
    fi

    # keep only tweet text and timestamps, stripping the surrounding HTML
    grep -e 'entry-content' -e 'timestamp' "$TEMP_PAGE" | sed \
        -e 's,.*<span class="entry-content">,,g' \
        -e 's,</span>,,g' \
        -e 's,.*<span class="published timestamp" data="{time:,,g' \
        -e "s/'//g" \
        -e 's/}">/ --> /g' \
        -e 's/<a.[^>]*>\(.[^<]*\)<\/a>/\1/g' \
        -e 's/^.*<span class="shared-content">/<RETWEET>/g' \
        -e 's/<\/a>/ in local time\n/g' >> "$FILE_TO_SAVE"

    sleep 5   # pause between requests
done

rm "$TEMP_PAGE"
echo "all pages fetched"
exit 0

It distinguishes retweets (marking them with <RETWEET>) and outputs the date of each tweet, so you can tell when it was posted.
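To see what the sed filter does without fetching anything, here is a small sketch that runs the same grep/sed pipeline over a hand-written HTML fragment. The markup in SAMPLE is hypothetical: it only imitates the entry-content, timestamp and shared-content spans the script looks for, and the real page layout may differ. extract_tweets is a name introduced here for illustration, and, like the script itself, it assumes GNU sed (which turns \n in a replacement into a newline).

```shell
#!/bin/bash
# extract_tweets: hypothetical helper wrapping the same grep/sed filter as get_tweets.sh.
# Reads HTML on stdin and prints tweet text, "<date> --> <link text> in local time"
# lines, and retweets prefixed with <RETWEET>. Assumes GNU sed for the \n below.
extract_tweets() {
    grep -e 'entry-content' -e 'timestamp' | sed \
        -e 's,.*<span class="entry-content">,,g' \
        -e 's,</span>,,g' \
        -e 's,.*<span class="published timestamp" data="{time:,,g' \
        -e "s/'//g" \
        -e 's/}">/ --> /g' \
        -e 's/<a.[^>]*>\(.[^<]*\)<\/a>/\1/g' \
        -e 's/^.*<span class="shared-content">/<RETWEET>/g' \
        -e 's/<\/a>/ in local time\n/g'
}

# Hypothetical page fragment; the quoted 'EOF' keeps the shell from expanding anything.
SAMPLE=$(cat <<'EOF'
<div id="header">ignored</div>
<span class="entry-content">hello <a href="http://example.com">world</a></span>
<a class="entry-date" href="/someone/status/1"><span class="published timestamp" data="{time:'Sat Jan 01 12:00:00 +0000 2011'}">12:00 PM Jan 1st</span></a>
<span class="entry-content"><span class="shared-content">retweeted text</span></span>
EOF
)

printf '%s\n' "$SAMPLE" | extract_tweets
```

The header div is dropped because it matches neither grep pattern, the link inside the tweet is reduced to its text, and the date line ends with an extra newline, which is what leaves a blank line after each entry in tweets.txt.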
You can also read it on my personal webpage.