27/03/2021

[Python] Access Twitter tweets with Tweepy

To programmatically access Twitter tweets, you can use Python's Tweepy library.


You must log in to Twitter and then apply for developer access. Expect some delay: Twitter has to approve your request, and may email you asking for additional information. For academic or testing purposes, approval generally takes about a week.


Once you're set up, it is simply a matter of:

 import tweepy

 # written against Tweepy 3.x; Tweepy 4 renamed API.search to API.search_tweets and removed wait_on_rate_limit_notify
 # TWITTER_KEY, TWITTER_SECRET, YOUR_HASHTAGS, TWEET_POLL_LIMIT, TWEET_LIMIT and since_id are values you supply
 twitter_auth = tweepy.AppAuthHandler(TWITTER_KEY, TWITTER_SECRET)
 # Twitter limits the frequency of polls, so we need to slow down automatically. You have about 900 requests/15 minutes
 twitter_api = tweepy.API(twitter_auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

 # default is 2 weeks of data; the optional input variable since_id is the ID of the last tweet you already have, so polling starts after it
 # we are preparing the request here, but not yet executing it
 if since_id is None:
  cur = tweepy.Cursor(twitter_api.search, count=TWEET_POLL_LIMIT, include_entities=False, result_type="recent", q=YOUR_HASHTAGS)
 else:
  cur = tweepy.Cursor(twitter_api.search, count=TWEET_POLL_LIMIT, include_entities=False, result_type="recent", q=YOUR_HASHTAGS, since_id=since_id)

 # we retrieve the tweets here, and can optionally limit the total we retrieve (TWEET_LIMIT == 0 means no limit)
 # if we do not limit ourselves and go over our quota, we are stalled until the next window is available to us
 # the config we did on the twitter_api object handles this automatically (wait_on_rate_limit) and notifies us as well (wait_on_rate_limit_notify)
 if TWEET_LIMIT == 0:
  tweets = cur.items()
 else:
  tweets = cur.items(TWEET_LIMIT)

 for tweet in tweets:
  pass  # do something with each tweet
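
To make the next poll pick up where this one left off, you can remember the highest tweet ID you saw and pass it back as since_id. A minimal sketch, assuming you have each tweet's raw JSON dict (tweet._json); next_since_id is a hypothetical helper name, not part of Tweepy, and the sample data is made up:

```python
def next_since_id(tweet_dicts, previous_since_id=None):
    """Return the highest tweet ID seen so far, to use as since_id on the next poll.

    tweet_dicts: iterable of raw tweet JSON dicts (e.g. tweet._json for each tweet).
    previous_since_id: the since_id used for this poll, or None on the first poll.
    """
    newest = previous_since_id
    for tweet in tweet_dicts:
        if newest is None or tweet["id"] > newest:
            newest = tweet["id"]
    return newest


# Made-up sample standing in for one poll's results
sample = [{"id": 105}, {"id": 101}, {"id": 103}]
print(next_since_id(sample, previous_since_id=100))  # prints 105
```

If a poll returns nothing, the helper simply hands back the previous value, so it is safe to store the result after every run.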

 

Some things to note:

Your query rate will likely be limited. Currently the free tier has a quota of 900 requests per 15 minutes. If you go over that quota, your app will stall and can only resume once the quota resets.
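
As a rough back-of-the-envelope check on that quota, the sketch below estimates how much fits into one window. It assumes the 900 requests/15 minutes figure quoted above and up to 100 tweets per search request; both numbers are assumptions you should adjust to whatever your own access level allows, and the function names are made up for illustration:

```python
def tweets_per_window(requests_per_window=900, tweets_per_request=100):
    """Upper bound on tweets retrievable in one rate-limit window."""
    return requests_per_window * tweets_per_request


def seconds_between_requests(requests_per_window=900, window_seconds=15 * 60):
    """Average spacing between requests that never exhausts the quota."""
    return window_seconds / requests_per_window


print(tweets_per_window())         # 90000
print(seconds_between_requests())  # 1.0
```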

 

The output of the poll is a set of tweets, each backed by plain JSON (available as tweet._json). Some notable fields:

  • id: unique ID of this tweet
  • created_at: a string representation of the datetime when this tweet was created
  • text: the body of the tweet
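
To illustrate, here is a small sketch that pulls just those three fields out of a tweet's raw JSON dict. summarize_tweet is a hypothetical helper and the sample dict is made up; with Tweepy you would pass tweet._json instead:

```python
def summarize_tweet(tweet_json):
    """Keep only the id, created_at and text fields of a raw tweet dict."""
    return {
        "id": tweet_json["id"],
        "created_at": tweet_json["created_at"],
        "text": tweet_json["text"],
    }


# Made-up sample; a real tweet dict carries many more fields
sample_tweet = {
    "id": 123,
    "created_at": "Sat Mar 27 12:00:00 +0000 2021",
    "text": "hello #python",
    "lang": "en",
}
print(summarize_tweet(sample_tweet))
```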

 

Strangely, the created_at value is NOT an epoch but a human-readable string. To convert it to a date string (where date is the raw created_at value) you can:

 from datetime import datetime
 parsed = datetime.strftime(datetime.strptime(date, '%a %b %d %H:%M:%S +0000 %Y'), '%Y-%m-%d %H:%M:%S')


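And to convert that to an epoch you can:

import ciso8601
from datetime import timezone
# parsed carries no timezone; created_at is in UTC (+0000), so set it explicitly or timestamp() assumes local time
epoch = int(ciso8601.parse_datetime(parsed).replace(tzinfo=timezone.utc).timestamp())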
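
If you would rather skip the intermediate string and the extra dependency, the two conversions can be collapsed into one step with the standard library alone. This is a sketch, not the approach used above: it relies on strptime's %z directive to read the +0000 offset, and created_at_to_epoch is a made-up helper name:

```python
from datetime import datetime


def created_at_to_epoch(created_at):
    """Convert a Twitter created_at string straight to a Unix epoch (seconds)."""
    # %z parses the "+0000" offset, so the result is timezone-aware
    parsed = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    return int(parsed.timestamp())


print(created_at_to_epoch("Sat Mar 27 12:00:00 +0000 2021"))  # 1616846400
```

Because the parsed datetime is timezone-aware, the result does not depend on the timezone of the machine running the code.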


To check whether a tweet is a retweet, you can test whether its text starts with RT @:

tweet._json["text"].startswith("RT @")
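
The text prefix is a heuristic. In the v1.1 JSON, retweets also carry a retweeted_status object, so a slightly more defensive check can look for that key first and fall back to the prefix. A minimal sketch on raw tweet dicts; is_retweet is a hypothetical helper and the sample dicts are made up:

```python
def is_retweet(tweet_json):
    """Return True if the raw tweet dict looks like a retweet."""
    # Retweets embed the original tweet under "retweeted_status"
    if "retweeted_status" in tweet_json:
        return True
    # Fallback: the classic "RT @user: ..." text prefix
    return tweet_json.get("text", "").startswith("RT @")


print(is_retweet({"text": "RT @someone: hello"}))                # True
print(is_retweet({"text": "hello", "retweeted_status": {}}))     # True
print(is_retweet({"text": "just a normal tweet"}))               # False
```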

