Thursday, August 28, 2014

Getting Twitter data using Python.

Developers often need sample Twitter data to test their apps. Twitter data is commonly used for trend detection, sentiment analysis, and similar tasks.

Instead of using some old file with a few tweets or registering for a service that gives you the data, why not get the data yourself? It is very easy with a Python script that uses Tweepy, a Python library that supports the Twitter API.

First of all, install Python. I am on MacOS, so Python was already installed for me. Please refer to this guide, which goes into more detail about the setup.

Next you need an editor to write your Python script(s). Personally, I use TextWrangler, as suggested by Dr. Chuck.

Install Tweepy with pip. HomeBrew is only used here to look pip up; pip itself is installed with easy_install.
  $ brew search pip
  $ sudo easy_install pip
  $ sudo pip install tweepy
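
To confirm that the install worked before going further, a quick check along these lines may help. It is only a sketch: is_installed is a helper name made up for this post, and it assumes Python 3, where the standard library provides importlib.util.find_spec.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be imported by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("tweepy"):
        print("tweepy is installed")
    else:
        print("tweepy is missing; re-run the pip install step")
```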

Register with Dev Twitter to get your tokens, etc.
Go to https://dev.twitter.com/ and sign in to Twitter (create an account if you don't already have one).
    Click the profile icon (top left) -> My Applications -> Create New App.
    Provide the necessary data and it will create an application.
    Go to the application -> click on the API Keys tab.
    This will show you the necessary keys to authenticate your application using OAuth.
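
Since these keys are secrets, you may prefer not to paste them into the script itself. One possible approach, sketched below, is to read them from environment variables. The variable names and the load_credentials helper are hypothetical, chosen for this post; neither Twitter nor Tweepy defines them.

```python
import os

# Hypothetical variable names chosen for this sketch; Twitter does not define them.
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four OAuth values from env, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

The first two values would go to OAuthHandler and the last two to set_access_token, in place of the hard-coded placeholders.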

Now you are ready to write a script that queries for the phrase "Big Data" and stores the first 100 results in a CSV file along with the date of each tweet.

#!/usr/bin/env python3
# Updated to Python 3 syntax (print function, text-mode CSV file).
import csv

import tweepy

# Replace the placeholders with the keys from your application's API Keys tab
auth = tweepy.auth.OAuthHandler('XXX', 'XXX')
auth.set_access_token('XX-XXX', 'XXX')

api = tweepy.API(auth)

query = 'Big Data'
max_tweets = 100

# Query once for up to 100 English tweets that contain "Big Data" and store them in a list.
# Note: Tweepy 4.0 and later renamed api.search to api.search_tweets.
searched_tweets = list(tweepy.Cursor(api.search, q=query, lang="en").items(max_tweets))
# Print the entire list of tweet objects
print(searched_tweets)

# Open/create a file to append data. Use utf-8 since tweets might have special characters.
with open('result.csv', 'a', newline='', encoding='utf-8') as csvFile:
    csvWriter = csv.writer(csvFile)
    for tweet in searched_tweets:
        # Write one row per tweet: creation date, then text
        csvWriter.writerow([tweet.created_at, tweet.text])
        print(tweet.created_at, tweet.text)
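
After the script has run, it can be useful to read result.csv back and check what was written. The snippet below is a minimal sketch of that, assuming Python 3 and the two-column layout (date, text) used above; read_rows is a helper name introduced here for illustration.

```python
import csv


def read_rows(path):
    """Return every row of the CSV file at path as a list of lists."""
    with open(path, newline='', encoding='utf-8') as csv_file:
        return list(csv.reader(csv_file))


if __name__ == "__main__":
    rows = read_rows('result.csv')
    print(len(rows), "rows found")
    # Show the first few rows: creation date, then tweet text
    for row in rows[:5]:
        print(row)
```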

You can examine the entire tweet object that is returned and pull more data if you like by iterating through searched_tweets and reading each element's attributes.
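
As a rough illustration of pulling more fields, the sketch below builds a wider row from one tweet. The attribute names user.screen_name and retweet_count are assumptions based on the usual shape of a tweet object, so check them against the object you printed. The stand-in sample object and the tweet_to_row helper are made up for this post so the snippet runs without calling Twitter.

```python
from types import SimpleNamespace


def tweet_to_row(tweet):
    """Pick a few attributes off one tweet object and return them as a list."""
    return [
        tweet.created_at,
        tweet.user.screen_name,  # assumed attribute: author's handle
        tweet.retweet_count,     # assumed attribute: number of retweets
        tweet.text,
    ]


# Stand-in for a real tweet object; all values here are made up.
sample = SimpleNamespace(
    created_at="2014-08-28 12:00:00",
    text="Big Data is everywhere",
    retweet_count=3,
    user=SimpleNamespace(screen_name="example_user"),
)

print(tweet_to_row(sample))
```

With real data you would call tweet_to_row on each element of searched_tweets and pass the result to the CSV writer.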

Please refer to this blog post if you want to see a Java version of the same concept.

References:
http://www.pythonlearn.com/install.php
http://sachithdhanushka.blogspot.com/2014/02/mining-twitter-data-using-python.html
http://stackoverflow.com/questions/22469713/managing-tweepy-api-search
