Sunday, August 7, 2011

Get/back up a user's tweets with bash

A shell script to download all the tweets of some user to a text file.

Save the script below to a file, for example get_tweets.sh.
Then run "chmod +x get_tweets.sh"
and follow the usage instructions in the comments at the top of the script.

#!/bin/bash
#written by rikijpn
#usage:
#  get_tweets.sh SOME_USER
#or set the default USERNAME below to the wanted user and run:
#  get_tweets.sh

if [ -n "$1" ]; then
    USERNAME=$1
else
    USERNAME=linuxquestions # any default user name you want
fi
URL="http://twitter.com/$USERNAME?page="
FILE_TO_SAVE="tweets.txt"
TEMP_PAGE="$(date +%d%m%H%M%S).txt" # temporary file name built from the current date and time

printf "\n" > "$FILE_TO_SAVE" # start from a fresh output file

for PAGE_NUMBER in {1..1000}; do 
    echo "getting page number $PAGE_NUMBER..."
    wget -O "$TEMP_PAGE" "$URL$PAGE_NUMBER" 2>/dev/null
    # count the lines that hold tweet text or a timestamp; stop when a page has none
    RELEVANT_LINES_NUMBER=$(grep -c -e 'entry-content' -e 'timestamp' "$TEMP_PAGE")
    if [ "${RELEVANT_LINES_NUMBER:-0}" -lt 1 ]; then break; fi
    grep -e 'entry-content' -e 'timestamp' "$TEMP_PAGE" |
    sed \
        -e 's,.*<span class="entry-content">,,g' \
        -e 's,</span>,,g' \
        -e 's,.*<span class="published timestamp" data="{time:,,g' \
        -e "s/'//g" \
        -e 's/}">/ --> /g' \
        -e 's/<a.[^>]*>\(.[^<]*\)<\/a>/\1/g' \
        -e 's/^.*<span class="shared-content">/<RETWEET>/g' \
        -e 's/<\/a>/ in local time\n/g' >>"$FILE_TO_SAVE"
    sleep 5
done

rm -f "$TEMP_PAGE"
echo "all pages fetched"
exit 0
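
The paging loop stops at the first page that has no lines matching 'entry-content' or 'timestamp'. Here is a minimal offline sketch of that stop condition. It assumes a hypothetical fetch_page function standing in for the wget call, and the markup it prints is made up for illustration:

```shell
#!/bin/bash
# Hypothetical stand-in for the wget call:
# pages 1 and 2 contain one tweet each, later pages contain none.
fetch_page() {
    if [ "$1" -le 2 ]; then
        printf '<span class="entry-content">tweet from page %s</span>\n' "$1"
    else
        printf '<html><body>no tweets here</body></html>\n'
    fi
}

# Walk the pages in order and stop at the first one without relevant lines.
count_pages_with_tweets() {
    local page relevant fetched=0
    for page in {1..1000}; do
        # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
        relevant=$(fetch_page "$page" | grep -c -e 'entry-content' -e 'timestamp' || true)
        if [ "${relevant:-0}" -lt 1 ]; then break; fi
        fetched=$((fetched + 1))
    done
    echo "$fetched"
}

count_pages_with_tweets
```

With the canned pages above, the loop reads two pages and breaks on the third, so the 1000 in the range is only an upper bound.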

The script distinguishes retweets (it marks them with <RETWEET>) and outputs each tweet's timestamp so you can tell when it was posted.
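
To see what the sed cleanup does without touching the network, here is a rough sketch that runs most of the script's expressions over a made-up HTML fragment. The fragment and the helper names sample_html and clean_tweet_lines are assumptions for illustration, and the markup is only modeled on the class names the script greps for. The shared-content expression is left out, and the "\n" is dropped from the last replacement because not every sed interprets it there.

```shell
#!/bin/bash
# Hypothetical sample page: one tweet with a link, followed by its timestamp.
sample_html() {
    cat <<'EOF'
<div class="status">
  <span class="entry-content">Hello <a href="http://example.com" class="web">example.com</a> world</span>
  <a class="entry-date" href="http://twitter.com/someuser/status/1"><span class="published timestamp" data="{time:'Sun Aug 07 10:00:00 +0000 2011'}">about 2 hours ago</span></a>
</div>
EOF
}

# Keep only the tweet and timestamp lines, then strip the markup around them.
clean_tweet_lines() {
    grep -e 'entry-content' -e 'timestamp' |
    sed \
        -e 's,.*<span class="entry-content">,,g' \
        -e 's,</span>,,g' \
        -e 's,.*<span class="published timestamp" data="{time:,,g' \
        -e "s/'//g" \
        -e 's/}">/ --> /g' \
        -e 's/<a.[^>]*>\(.[^<]*\)<\/a>/\1/g' \
        -e 's/<\/a>/ in local time/g'
}

sample_html | clean_tweet_lines
```

For this sample the output should be two lines: "Hello example.com world" and "Sun Aug 07 10:00:00 +0000 2011 --> about 2 hours ago in local time". The link is reduced to its text, and the timestamp line keeps both the absolute time and the relative one.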

You can read it at my personal webpage as well.
