Twint: Twitter Scraping Without Twitter’s API

A hands-on guide to scraping anyone's tweets without Twitter's API, using the Twitter Intelligence Tool (Twint)

What is Twint?

Twint is an advanced Twitter scraping tool written in Python. It lets us scrape any user's tweets from their profile without having to use the Twitter API.

Twint utilizes Twitter's search operators to let you scrape Tweets from specific users, scrape Tweets relating to certain topics, hashtags, and trends, or pick out sensitive information from Tweets, such as e-mail addresses and phone numbers. I find this very useful, and you can get creative with it too.

Above all, Twint has these major benefits:

  • The Twitter API restricts you to a user's last 3,200 Tweets, while Twint can fetch almost all of them.
  • Set-up is really quick, as there is no hassle of configuring the Twitter API.
  • Can be used anonymously, without signing up for Twitter.
  • It's free! No pricing limitations.
  • Provides easy-to-use options to store scraped tweets in different formats: CSV, JSON, SQLite, and Elasticsearch.
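As a sketch of that last point, Twint's `Store_csv` and `Output` config options write results straight to a CSV file (a `Store_json` option works the same way for JSON):

```python
import twint

# Configure
c = twint.Config()
c.Username = "narendramodi"
c.Limit = 20
c.Store_csv = True        # write results as CSV (Store_json is the JSON equivalent)
c.Output = "tweets.csv"   # output file name

# Run
twint.run.Search(c)
```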

Installation

You can install Twint using the pip command or directly from git.

Using pip
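The package is published on PyPI, so a single command suffices:

```shell
pip3 install twint
```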

Directly from Git
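Per the project's README, you can also install the latest code straight from the Git repository:

```shell
git clone https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt
```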

Code Walkthrough

If you try the following code in a notebook, you may run into the error “RuntimeError: This event loop is already running”. To resolve it, run the code below once in a cell; it enables concurrent actions within the notebook.
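The standard fix is the nest_asyncio package, which patches asyncio so that the notebook's already-running event loop can be re-entered:

```python
import nest_asyncio

# Patch the running event loop so twint can re-enter it from within the notebook
nest_asyncio.apply()
```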

1. Scrape the tweets of a user

If we have the Twitter username of a profile, we can directly scrape that user's tweets.

import twint

# Configure
c = twint.Config()
c.Limit = 1
c.Username = "narendramodi"

# Run
twint.run.Search(c)

First of all, we configure Twint. After defining all the parameters, we run the search.

Limit : Number of tweets to pull (fetched in increments of 20).

2. Scrape the tweets between specific dates

c = twint.Config()
c.Lang = "en"
c.Username = "narendramodi"
c.Hide_output = True
c.Since = '2020-10-12'
c.Until = '2021-01-20'
# Run
twint.run.Search(c)

Since : Filter tweets from this date.

Until : Filter tweets up to this date.

3. Scrape tweets for specific search strings

We can specify search terms to filter the tweets.

# Configure
c = twint.Config()
c.Lang = "en"
c.Hide_output = True
c.Username = "narendramodi"
c.Search = "India bjp"   # Search takes a string; separate multiple terms with spaces
c.Limit = 1
# Run
twint.run.Search(c)

4. Scrape tweets containing media (images, videos, or both)

# Configure
c = twint.Config()
c.Username = "narendramodi"
c.Limit = 1
# c.Images = True
# c.Videos = True
c.Media = True

# Run
twint.run.Search(c)

Images : Display only tweets with images.

Videos : Display only tweets with videos.

Media : Display only tweets that contain images or videos.

5. Scrape a user's popular tweets

# Configure
c = twint.Config()
c.Username = "narendramodi"
c.Limit = 1
c.Popular_tweets = True

# Run
twint.run.Search(c)

Popular_tweets : Scrape popular tweets instead of the most recent ones (default = False).

6. Filter tweets based on min likes, min retweets, and min replies

c = twint.Config()
c.Username = "narendramodi"
c.Limit = 1
c.Min_likes = 5000
c.Min_replies = 1000
c.Min_retweets = 100

twint.run.Search(c)

Min_likes : Filter tweets by minimum number of likes.

Min_retweets : Filter tweets by minimum number of retweets.

Min_replies : Filter tweets by minimum number of replies.

7. Scrape tweets containing specific hashtags

We can include hashtags directly in the search term.

c = twint.Config()
c.Search = '#blacklivesmatter'
c.Limit = 20
twint.run.Search(c)

8. Store as Pandas DataFrame

We can store the entire scraped dataset in a pandas DataFrame.

c = twint.Config()
c.Limit = 1
c.Username = 'narendramodi'
c.Pandas = True

twint.run.Search(c)

Tweets_df = twint.storage.panda.Tweets_df

9. Display the tweet

We can render a tweet in the notebook using requests and IPython's HTML display, via Twitter's oEmbed endpoint.

from IPython.display import HTML, display
import requests

def show_tweet(link):
    '''Display the contents of a tweet.'''
    url = 'https://publish.twitter.com/oembed?url=%s' % link
    response = requests.get(url)
    html = response.json()["html"]
    display(HTML(html))

sample_tweet_link = Tweets_df.sample(1)['link'].values[0]
display(sample_tweet_link)
show_tweet(sample_tweet_link)

Conclusion

Twint is a great package to build social media monitoring applications without being blocked by the Twitter API and its rate limits.

After scraping the tweets, we can run effective NLP analyses, since the DataFrame contains many features and we can also derive new ones.
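As a tiny illustration of deriving new features (using a toy stand-in for Tweets_df, since the real frame comes from a live scrape), each feature is a one-liner in pandas:

```python
import pandas as pd

# Toy stand-in for twint's Tweets_df; the real frame has many more columns.
df = pd.DataFrame({
    "tweet": [
        "Proud of our scientists! #vaccine",
        "Good morning everyone",
        "India will lead. #India #growth",
    ]
})

# Derive simple new features from the raw tweet text.
df["n_words"] = df["tweet"].str.split().str.len()   # word count
df["n_hashtags"] = df["tweet"].str.count(r"#\w+")   # hashtag count

print(df[["n_words", "n_hashtags"]])
```

From here it is a short step to tokenization, sentiment scoring, or topic modelling over the same column.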

You can find my complete solution in my GitHub repository, and if you have any suggestions, please contact me via LinkedIn.

Interested in Machine Learning, Deep Learning, OpenCV…