Twint: Twitter Scraping Without Twitter’s API
A hands-on guide to scraping anybody’s tweets without Twitter’s API, using the Twitter Intelligence Tool (Twint)
What is Twint?
Twint is an advanced Twitter scraping tool written in Python. We can use it to scrape Tweets from any Twitter profile without having to use the Twitter API.
Twint utilizes Twitter’s search operators to let you scrape Tweets from specific users, scrape Tweets relating to certain topics, hashtags & trends, or sort out sensitive information from Tweets like e-mail and phone numbers. I find this very useful, and you can get creative with it too.
Above all, Twint has these major benefits:
- The Twitter API restricts you to a user’s most recent 3,200 Tweets, but Twint can fetch almost all of them.
- Setup is really quick, as there is no hassle of configuring the Twitter API.
- Can be used anonymously without Twitter sign-up.
- It’s free!! No pricing limitations.
- Provides easy-to-use options to store scraped tweets in different formats: CSV, JSON, SQLite, and Elasticsearch (a short storage sketch follows this list).
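As an example of the last point, here is a minimal sketch of writing results to CSV using Twint’s Store_csv and Output options (the file name tweets.csv is just an illustrative choice; Store_json works the same way for JSON):
import twint
c = twint.Config()
c.Username = "narendramodi"
c.Limit = 20
c.Store_csv = True          # or c.Store_json = True for JSON output
c.Output = "tweets.csv"     # illustrative output file name
twint.run.Search(c)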
Installation
You can install Twint using the pip command or directly from git.
Using pip
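Assuming pip3 points at your Python 3 installation:
pip3 install twint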
Directly from Git
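Alternatively, install straight from the GitHub repository (the command below follows the pattern in Twint’s README; adjust it if the project layout has changed):
pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint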
Code Walkthrough
If you try the following code in a notebook, you may run into the error “RuntimeError: This event loop is already running”. To resolve it, run the code below once in a cell; it enables concurrent actions within the notebook.
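A common fix is the nest_asyncio package (assumed to be installed, e.g. via pip3 install nest_asyncio); it patches the notebook’s already-running event loop so Twint’s asynchronous calls can run inside it:
import nest_asyncio
nest_asyncio.apply()   # allow nested event loops inside the notebook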
1. Scrape the tweets of a user
If we have the Twitter username of a profile, we can directly scrape that user’s tweets.
import twint
# Configure
c = twint.Config()
c.Limit = 1
c.Username = "narendramodi"
# Run
twint.run.Search(c)
First we configure Twint; after defining the parameters, we run the search.
Limit: number of Tweets to pull (Twint fetches Tweets in increments of 20).
2. Scrape the tweets between specific dates
c = twint.Config()
c.Lang = "en"
c.Username = "narendramodi"
c.Hide_output = True
c.Since = '2020-10-12'
c.Until = '2021-01-20'
# Run
twint.run.Search(c)
Since: filter Tweets from this date onward.
Until: filter Tweets up to this date.
3. Scrape tweets for specific search strings
We can specify different search terms to filter over tweets.
# Configure
c = twint.Config()
c.Lang = "en"
c.Hide_output = True
c.Username = "narendramodi"
c.Search = "India bjp"   # Search takes a plain query string
c.Limit = 1
# Run
twint.run.Search(c)
4. Scrape tweets containing media (images, videos, or both)
# Configure
c = twint.Config()
c.Username = "narendramodi"
c.Limit = 1
#c.Images= True
#c.Videos = True
c.Media = True
# Run
twint.run.Search(c)
Images: display only Tweets containing images.
Videos: display only Tweets containing videos.
Media: display Tweets containing either images or videos.
5. Scrape popular tweets of a user
# Configure
c = twint.Config()
c.Username = "narendramodi"
c.Limit = 1
c.Popular_tweets = True
# Run
twint.run.Search(c)
Popular_tweets: scrape popular Tweets instead of the most recent ones (default = False).
6. Filter tweets based on min likes, min retweets, and min replies
c = twint.Config()
c.Username = "narendramodi"
c.Limit = 1
c.Min_likes = 5000
c.Min_replies = 1000
c.Min_retweets = 100
twint.run.Search(c)
Min_likes: filter Tweets by a minimum number of likes.
Min_retweets: filter Tweets by a minimum number of retweets.
Min_replies: filter Tweets by a minimum number of replies.
7. Scrape tweets containing specific hashtags
Hashtags can be included directly in the search term.
c = twint.Config()
c.Search = '#blacklivesmatter'
c.Limit = 20
twint.run.Search(c)
8. Store as Pandas DataFrame
We can store the entire scraped dataset in a Pandas DataFrame.
c = twint.Config()
c.Limit = 1
c.Username = 'narendramodi'
c.Pandas = True
twint.run.Search(c)
Tweets_df = twint.storage.panda.Tweets_df
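Once the run finishes, a quick sanity check on the resulting DataFrame might look like this (the date, tweet, and link columns referenced here are part of Twint’s Pandas output):
print(Tweets_df.shape)              # number of scraped tweets and columns
print(Tweets_df.columns.tolist())   # available features
Tweets_df[['date', 'tweet', 'link']].head()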
9. Display the tweet
We can display tweets using HTML and requests.
from IPython.display import HTML, display
import requests

def show_tweet(link):
    '''Display the contents of a tweet via Twitter's oEmbed endpoint.'''
    url = 'https://publish.twitter.com/oembed?url=%s' % link
    response = requests.get(url)
    html = response.json()["html"]
    display(HTML(html))

# Pick a random tweet link from the scraped DataFrame and render it
sample_tweet_link = Tweets_df.sample(1)['link'].values[0]
display(sample_tweet_link)
show_tweet(sample_tweet_link)
Conclusion
Twint is a great package to build social media monitoring applications without being blocked by the Twitter API and its rate limits.
After scraping the tweets we can carry out an effective NLP analysis, since the DataFrame contains lots of features and we can also derive new ones from them, as sketched below.
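For instance, a minimal sketch of deriving two new features, assuming the tweet and hashtags columns from Twint’s Pandas output:
# Illustrative feature engineering on the scraped DataFrame
Tweets_df['tweet_length'] = Tweets_df['tweet'].str.len()    # characters per tweet
Tweets_df['n_hashtags'] = Tweets_df['hashtags'].apply(len)  # hashtags per tweet
print(Tweets_df[['tweet', 'tweet_length', 'n_hashtags']].head())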
You can find my complete solution in my GitHub repository, and if you have any suggestions, please contact me via LinkedIn.