Using twitter-pandas to find friends who don't follow you back

Over the past couple of months we've been gradually working on twitter-pandas, a pandas DataFrame-based interface to Twitter data (powered by tweepy behind the scenes). I've previously posted about the first, limited release.

The initial release focused on replicating the tweepy API as closely as we could, as a first building block toward a more concise and usable library. As with git-pandas, we drive development by being users ourselves: one by one, we pick off useful types of analysis that can be done with Twitter data, do them with twitter-pandas, and add functionality, clarity and stability to the code along the way.

In twitter-pandas, the first such analysis is simple: "which of the people I follow don't follow me back?".
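Conceptually, the question is a set difference: the accounts you follow (your "friends") minus the accounts that follow you. Here is a minimal sketch in plain Python, assuming you already have both lists of screen names in hand. The function name and data are hypothetical, and this is not how twitter-pandas computes the answer internally; it only shows the logic.

```python
def not_following_back(friends, followers):
    """Return the accounts you follow that do not follow you, sorted."""
    # Set difference: names present in `friends` but absent from `followers`.
    return sorted(set(friends) - set(followers))


# Hypothetical example data.
friends = ["alice", "bob", "carol", "dave"]
followers = ["bob", "dave", "erin"]

print(not_following_back(friends, followers))  # → ['alice', 'carol']
```

Note that "erin" follows you but you don't follow her back; that is the opposite question and is simply ignored here.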

To answer this question in twitter-pandas, we only need to hit one method:

from twitterpandas import TwitterPandas
from keys import TWITTER_OAUTH_SECRET, TWITTER_OAUTH_TOKEN, TWITTER_CONSUMER_SECRET, TWITTER_CONSUMER_KEY

if __name__ == '__main__':
    # create a twitter pandas client object
    tp = TwitterPandas(
        TWITTER_OAUTH_TOKEN,
        TWITTER_OAUTH_SECRET,
        TWITTER_CONSUMER_KEY,
        TWITTER_CONSUMER_SECRET
    )

    # get our own user id
    user_id = tp.api_id

    # use it to find all of our own friends (people we follow)
    df = tp.friends_friendships(id_=user_id, rich=True)
    total_friends = df.shape[0]

    # filter the df down to only those who don't follow us back
    df = df[df['target_follows_source'] == False]

    # print out the info:
    print('A total of %d of those who I follow on twitter, don\'t follow me back.' % (df.shape[0], ))
    print('...that\'s about %4.2f%% of them.\n' % ((float(df.shape[0]) / total_friends) * 100, ))
    print(df['target_user_screen_name'].values.tolist())

This prints something like:

A total of 109 of those who I follow on twitter, don't follow me back.
...that's about 59.89% of them.
['user1', ... , 'user2']
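The filter-and-count step in the script can be mirrored without pandas. The sketch below assumes each friend is a dict carrying the same two fields the script reads from the dataframe (`target_user_screen_name` and `target_follows_source`); the records and the `summarize` helper are invented for illustration. As a sanity check on the output above, 109 non-followers at 59.89% corresponds to 182 friends in total (109 / 182 ≈ 0.5989).

```python
def summarize(friendships):
    """Return (non-followers, percentage) for a list of friendship records."""
    total = len(friendships)
    # Keep only friends whose follows-back flag is falsy (mirrors the == False filter).
    missing = [f["target_user_screen_name"] for f in friendships
               if not f["target_follows_source"]]
    # Guard against an empty friend list to avoid dividing by zero.
    pct = 100.0 * len(missing) / total if total else 0.0
    return missing, pct


# Hypothetical records; field names mirror the dataframe columns used above.
records = [
    {"target_user_screen_name": "alice", "target_follows_source": False},
    {"target_user_screen_name": "bob", "target_follows_source": True},
    {"target_user_screen_name": "carol", "target_follows_source": False},
    {"target_user_screen_name": "dave", "target_follows_source": True},
]

names, pct = summarize(records)
print("A total of %d of those who I follow on twitter don't follow me back." % len(names))
print("...that's about %4.2f%% of them." % pct)  # → ...that's about 50.00% of them.
print(names)  # → ['alice', 'carol']
```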

Twitter's API limits the number of requests you can issue in a 15-minute window; connections can time out, requests can fail, and plenty of other problems can arise in the minutes or hours this may take to run (depending on how many people you follow). Unlike a lower-level library such as tweepy (which we use under the hood), twitter-pandas handles all of this for you.
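To make the two problems concrete, here is a small sketch of the kind of bookkeeping a library like this has to do. It is not twitter-pandas's actual implementation. `estimate_minutes` gives a rough lower bound on how long a rate-limited crawl waits, and `with_retries` retries a failing call with exponential backoff. Both helpers are hypothetical, and the 180-requests-per-window figure in the usage is a placeholder, not a documented Twitter limit.

```python
import math
import time


def estimate_minutes(n_requests, requests_per_window, window_minutes=15):
    """Rough lower bound on minutes spent waiting out rate-limit windows."""
    # Each window allows `requests_per_window` calls; a remainder still needs one more window.
    windows = math.ceil(n_requests / requests_per_window)
    # The final window does not need to be waited out, hence windows - 1.
    return max(windows - 1, 0) * window_minutes


def with_retries(call, attempts=5, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call `call()`, retrying on `retry_on` errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            # Out of attempts: re-raise the last error to the caller.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between tries.
            sleep(base_delay * 2 ** attempt)


# Usage: a placeholder limit of 180 requests per 15-minute window.
print(estimate_minutes(1000, 180))  # → 75

# Usage: a hypothetical call that fails twice, then succeeds.
delays = []
outcomes = iter([ConnectionError("timeout"), ConnectionError("reset"), "ok"])


def flaky():
    result = next(outcomes)
    if isinstance(result, Exception):
        raise result
    return result


print(with_retries(flaky, sleep=delays.append))  # → ok
print(delays)  # → [1.0, 2.0]
```

Passing `sleep` in as a parameter keeps the example fast and testable; in real use the default `time.sleep` would actually pause between attempts.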

The goal is to allow data scientists, researchers, analysts and others to get their data, in the format they want it, simply.

This example uses the current master branch, which is not yet released to PyPI but will be part of version 0.0.2 of twitter-pandas. To install from master and try it out before the release, use:

pip install git+https://github.com/wdm0006/twitter-pandas.git

And if you're interested in contributing: there are a few of us working on this, and there's a ton left to do. Find us at:

https://github.com/wdm0006/twitter-pandas

Will

Will has a background in Mechanical Engineering from Auburn, but mostly just writes software now. He was the first employee at Predikto, where he is currently building out the premier platform for predictive maintenance in heavy industry as Chief Scientist. When not working on that, he is generally working on something related to Python, data science or cycling.
