Search Twitter for Tweets

This example searches Twitter for tweets based on specific criteria or keywords. It enables users to access the vast amount of data available on Twitter and retrieve relevant tweets that match their search queries. By leveraging Data Pipeline, users can easily tap into Twitter's real-time data and extract valuable insights from the tweets.

Businesses and organizations can use this example to monitor conversations and trends on Twitter related to their brand, products, or industry. By searching for tweets with specific keywords or hashtags, they can gain valuable insights into customer sentiments, identify emerging trends, and track the success of marketing campaigns. This information can be used to inform decision-making, enhance brand reputation, and improve customer engagement.

 

Java Code Listing

package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import org.apache.log4j.Logger;

import com.northconcepts.datapipeline.core.DataEndpoint;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.MultiWriter;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.excel.ExcelDocument;
import com.northconcepts.datapipeline.excel.ExcelWriter;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.BasicFieldTransformer;
import com.northconcepts.datapipeline.transform.CopyField;
import com.northconcepts.datapipeline.transform.MoveFieldAfter;
import com.northconcepts.datapipeline.transform.TransformingReader;
import com.northconcepts.datapipeline.twitter.ApiLimitPolicy;
import com.northconcepts.datapipeline.twitter.TwitterSearchReader;

public class SearchTwitterForTweets {
    
    public static final Logger log = DataEndpoint.log;
    
    private static final String QUERY = "#Java";
    private static final int MAX_RESULTS = 500;
    private static final String CONSUMER_KEY = "*********";  // Your Twitter API Consumer Key
    private static final String CONSUMER_SECRET = "***********";  // Your Twitter API Consumer Secret

    public static void main(String[] args) {
        // Read from Twitter
        DataReader reader = new TwitterSearchReader(CONSUMER_KEY, CONSUMER_SECRET, QUERY, MAX_RESULTS)
            .setApiLimitPolicy(ApiLimitPolicy.STOP);
        
        // Split CreatedAt into CreatedAtDate and CreatedAtTime
        reader = new TransformingReader(reader)
            .add(new CopyField("CreatedAt", "CreatedAtDate", false))
            .add(new CopyField("CreatedAt", "CreatedAtTime", false))
            .add(new BasicFieldTransformer("CreatedAtDate").dateTimeToDate())
            .add(new BasicFieldTransformer("CreatedAtTime").dateTimeToTime())
            .add(new MoveFieldAfter("CreatedAtDate", "CreatedAt"))
            .add(new MoveFieldAfter("CreatedAtTime", "CreatedAtDate"));
      
        // Write to console
        DataWriter writer1 = new StreamWriter(System.out);

        // Write to Excel
        ExcelDocument document = new ExcelDocument();
        DataWriter writer2 = new ExcelWriter(document).setSheetName("search");

        // Write to both console and Excel
        MultiWriter writer = new MultiWriter(writer1, writer2);

        // Run the job, writing to both the console and the Excel document
        Job.run(reader, writer);

        // Save Excel file
        document.save(new File("example/data/output/twitter-search.xlsx"));
    }

}

 

Code Walkthrough

  1. First, TwitterSearchReader is created to obtain tweets from Twitter.
  2. setApiLimitPolicy() sets the policy that determines what happens when the Twitter API call limit is reached. ApiLimitPolicy.STOP ends the reader normally (without throwing an exception) when the limit is reached.
  3. TransformingReader is created to apply transformations to records as they are being obtained from the API.
  4. CopyField duplicates the field named in its first parameter under the name given in its second parameter. Here CreatedAt is copied twice, to CreatedAtDate and CreatedAtTime.
  5. dateTimeToDate() converts CreatedAtDate from a datetime value to a date-only value, and dateTimeToTime() converts CreatedAtTime to a time-only value.
  6. MoveFieldAfter moves the field named in its first parameter so that it immediately follows the field named in its second parameter. In this example, the resulting field order is CreatedAt, CreatedAtDate, CreatedAtTime.
  7. StreamWriter is created to write obtained tweets to the console.
  8. ExcelWriter is created from an ExcelDocument object to write the tweets to an Excel file. The setSheetName() method replaces the default sheet name with the value passed to it (search in this example).
  9. MultiWriter is created to write obtained tweets to the console and an Excel file.
  10. Data is transferred from the TransformingReader to the MultiWriter via the Job.run() method. See how to compile and run data pipeline jobs.
  11. The output Excel file is saved using the ExcelDocument.save() method.
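
The "stop normally" behavior described in step 2 can be pictured with a small standalone sketch. It uses only the Java standard library and is not how Data Pipeline implements ApiLimitPolicy.STOP; the class name StopOnLimitSketch, the readAll() helper, and the callBudget parameter are hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not part of the Data Pipeline API.
class StopOnLimitSketch {

    // Reads up to maxResults items, ending quietly once the call budget is spent.
    static List<String> readAll(Iterator<String> source, int maxResults, int callBudget) {
        List<String> results = new ArrayList<>();
        int calls = 0;
        while (results.size() < maxResults && source.hasNext()) {
            if (calls >= callBudget) {
                break; // "stop" behavior: end the read normally, no exception
            }
            results.add(source.next());
            calls++;
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList("t1", "t2", "t3", "t4", "t5");
        // Budget of 3 calls: the read ends after three items without failing.
        System.out.println(readAll(tweets.iterator(), 500, 3)); // prints [t1, t2, t3]
    }
}
```

The point of the sketch is that the caller receives whatever was read before the limit was hit, so downstream writers still run to completion.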

 

TwitterSearchReader

Uses the Twitter Search API to obtain records by searching Twitter for the most recent tweets matching the search criteria. It extends the TwitterReader class, and its constructor takes a consumer key, a consumer secret, a search query, and the maximum number of results to obtain.

 

TransformingReader

A proxy that applies transformations to records passing through. It extends ProxyReader and is created with a DataReader object. Transformer objects are added with the add() method to apply transformations to the fields of each record.

 

CopyField

Creates a duplicate of a field under the specified target name. It extends Transformer, and its constructor takes the source field name, the target field name, and an optional boolean value that determines whether the target field should be overwritten if it already exists.

 

BasicFieldTransformer

Applies transformations to fields of records passing through. It extends the FieldTransformer class, and its constructor takes the names of one or more target fields to transform. It includes several methods that transform field values; for example, dateTimeToDate() converts a datetime field to a date-only value, and dateTimeToTime() converts it to a time-only value.
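
As a rough picture of what splitting a datetime into its date and time parts means, the sketch below uses the standard java.time classes. It is an illustration under stated assumptions, not the library's implementation: the console output suggests Data Pipeline stores datetimes as java.util.Date, and the class SplitDateTimeSketch with its datePart() and timePart() helpers is hypothetical.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical illustration only; not part of the Data Pipeline API.
class SplitDateTimeSketch {

    // Keeps the calendar date and drops the time of day.
    static LocalDate datePart(LocalDateTime value) {
        return value.toLocalDate();
    }

    // Keeps the time of day and drops the calendar date.
    static LocalTime timePart(LocalDateTime value) {
        return value.toLocalTime();
    }

    public static void main(String[] args) {
        LocalDateTime createdAt = LocalDateTime.of(2014, 9, 8, 17, 2, 48);
        System.out.println(datePart(createdAt)); // prints 2014-09-08
        System.out.println(timePart(createdAt)); // prints 17:02:48
    }
}
```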

 

MoveFieldAfter

A Transformer subclass that moves the field named in its first parameter so that it immediately follows the field named in its second parameter.
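
The reordering can be pictured on a plain list of field names. The sketch below is a standard-library illustration, not the library's code; FieldOrderSketch and moveAfter() are hypothetical names, and it assumes, purely for the example, that the two copied fields start out at the end of the record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the Data Pipeline API.
class FieldOrderSketch {

    // Returns a new list in which 'field' sits immediately after 'anchor'.
    static List<String> moveAfter(List<String> fields, String field, String anchor) {
        List<String> result = new ArrayList<>(fields);
        if (field.equals(anchor) || !result.contains(field) || !result.contains(anchor)) {
            return result; // nothing sensible to move; leave the order unchanged
        }
        result.remove(field);
        result.add(result.indexOf(anchor) + 1, field);
        return result;
    }

    public static void main(String[] args) {
        // Assumed starting order: copies appended after the existing fields.
        List<String> fields = Arrays.asList("Id", "CreatedAt", "Lang", "CreatedAtDate", "CreatedAtTime");
        fields = moveAfter(fields, "CreatedAtDate", "CreatedAt");
        fields = moveAfter(fields, "CreatedAtTime", "CreatedAtDate");
        System.out.println(fields); // prints [Id, CreatedAt, CreatedAtDate, CreatedAtTime, Lang]
    }
}
```

Applying the two moves in this order is what places CreatedAtDate and CreatedAtTime directly after CreatedAt.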

 

ExcelDocument

The in-memory abstraction for an Excel workbook. It is not thread-safe and will throw an exception if used by multiple ExcelReaders and/or ExcelWriters concurrently. Since its data is stored in memory, reading and re-reading from it multiple times is very cheap (think millions of reads per second). The save() method writes the Excel file to disk.

 

ExcelWriter

Writes records to a Microsoft Excel document. The setSheetName() method assigns a name to the sheet being written. If you omit this call, a default sheet name (sheet1, sheet2, and so on) is assigned automatically.

 

MultiWriter

Writes records to multiple DataWriters. Its constructor takes zero or more DataWriter objects, and each write operation is performed on every one of them.
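
The fan-out idea can be sketched with standard-library types alone. This is not MultiWriter's implementation; FanOutWriterSketch and fanOut() are hypothetical names, and java.util.function.Consumer stands in for DataWriter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not part of the Data Pipeline API.
class FanOutWriterSketch {

    // Combines several sinks into one; each record is passed to every sink in order.
    @SafeVarargs
    static <T> Consumer<T> fanOut(Consumer<T>... sinks) {
        List<Consumer<T>> targets = Arrays.asList(sinks);
        return record -> targets.forEach(sink -> sink.accept(record));
    }

    public static void main(String[] args) {
        List<String> sheet = new ArrayList<>();          // stands in for the Excel sheet
        Consumer<String> console = System.out::println;  // stands in for the console writer
        Consumer<String> rows = sheet::add;

        Consumer<String> writer = fanOut(console, rows);
        writer.accept("tweet 1");
        writer.accept("tweet 2");

        System.out.println(sheet.size() + " rows buffered"); // prints 2 rows buffered
    }
}
```

Because the combined writer is itself a single sink, the code that produces records does not need to know how many destinations there are.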

 

Console Output

Record {
    0:[Id]:LONG=[509084459472551936]:Long
    1:[Text]:STRING=[+ @Squarespace & @Shutterstock MT @NewYork_CM: #TY again @Freshbooks & @MailChimp for #CreativeMornings! https://t.co/aW7cdAMyYW...140]:String
    2:[Lang]:STRING=[en]:String
    3:[CreatedAt]:DATETIME=[Mon Sep 08 17:02:48 EDT 2014]:Date
    4:[FavoriteCount]:INT=[1]:Integer
    5:[RetweetCount]:INT=[0]:Integer
    6:[InReplyToScreenName]:STRING=[NewYork_CM]:String
    7:[UserScreenName]:STRING=[mitgc_cm]:String
    8:[UserDescription]:STRING=[#Socbiz strat + #Lean venture dev | @Plus_SocialGood Connector | #socent #impinv #sustdev | @StartingBloc #SocInn | NY exec prod...157]:String
    9:[UserCreatedAt]:DATETIME=[Mon Nov 28 21:09:22 EST 2011]:Date
    10:[UserTweets]:INT=[42485]:Integer
    11:[UserFavouritesCount]:INT=[13329]:Integer
    12:[UserFollowersCount]:INT=[6598]:Integer
    13:[UserFollowingCount]:INT=[6912]:Integer
    14:[UserLocation]:STRING=[New York, New York, USA]:String
    15:[UserLang]:STRING=[en]:String
    16:[UserTimeZone]:STRING=[London]:String
    17:[UserUtcOffset]:INT=[3600]:Integer
    18:[UserURL]:STRING=[http://t.co/D8ykYnFD3I]:String
    19:[GeoLocationLatitude]:DOUBLE=[40.7450718]:Double
    20:[GeoLocationLongitude]:DOUBLE=[-73.9964114]:Double
    21:[PlaceName]:STRING=[Manhattan]:String
    22:[PlaceFullName]:STRING=[Manhattan, NY]:String
    23:[PlaceType]:STRING=[city]:String
    24:[PlaceStreetAddress]:UNDEFINED=[null]
    25:[PlaceCountryCode]:STRING=[US]:String
    26:[PlaceCountry]:STRING=[United States]:String
    27:[PlaceBoundingBoxType]:STRING=[Polygon]:String
    28:[PlaceBoundingBoxCoordinates]:STRING=[[[40.683935,-74.026675],[40.683935,-73.910408],[40.877483,-73.910408],[40.877483,-74.026675]]]:String
    29:[UserMentionEntities]:STRING=[@Squarespace @Shutterstock @NewYork_CM @freshbooks @MailChimp @mitgc_cm]:String
    30:[HashtagEntities]:STRING=[#TY #CreativeMornings]:String
    31:[URLEntities]:STRING=[https://pbs.twimg.com/media/BwxyTHtIQAAKDUz.jpg]:String
    32:[MediaEntities]:UNDEFINED=[null]
}

This is sample output; it was obtained using a different search query.
