Search Twitter for Tweets
This example searches Twitter for tweets that match specific criteria or keywords. Using Data Pipeline, it retrieves the tweets matching a search query, reshapes a few fields, and writes the results to the console and an Excel file.
Businesses and organizations can use this example to monitor conversations and trends on Twitter related to their brand, products, or industry. By searching for tweets with specific keywords or hashtags, they can gain valuable insights into customer sentiments, identify emerging trends, and track the success of marketing campaigns. This information can be used to inform decision-making, enhance brand reputation, and improve customer engagement.
Java Code Listing
package com.northconcepts.datapipeline.examples.cookbook;

import java.io.File;

import org.apache.log4j.Logger;

import com.northconcepts.datapipeline.core.DataEndpoint;
import com.northconcepts.datapipeline.core.DataReader;
import com.northconcepts.datapipeline.core.DataWriter;
import com.northconcepts.datapipeline.core.MultiWriter;
import com.northconcepts.datapipeline.core.StreamWriter;
import com.northconcepts.datapipeline.excel.ExcelDocument;
import com.northconcepts.datapipeline.excel.ExcelWriter;
import com.northconcepts.datapipeline.job.Job;
import com.northconcepts.datapipeline.transform.BasicFieldTransformer;
import com.northconcepts.datapipeline.transform.CopyField;
import com.northconcepts.datapipeline.transform.MoveFieldAfter;
import com.northconcepts.datapipeline.transform.TransformingReader;
import com.northconcepts.datapipeline.twitter.ApiLimitPolicy;
import com.northconcepts.datapipeline.twitter.TwitterSearchReader;

public class SearchTwitterForTweets {

    public static final Logger log = DataEndpoint.log;

    private static final String QUERY = "#Java";
    private static final int MAX_RESULTS = 500;

    private static final String CONSUMER_KEY = "*********"; // Your Twitter API Consumer Key
    private static final String CONSUMER_SECRET = "***********"; // Your Twitter API Consumer Secret

    public static void main(String[] args) {
        // Read from Twitter
        DataReader reader = new TwitterSearchReader(CONSUMER_KEY, CONSUMER_SECRET, QUERY, MAX_RESULTS)
                .setApiLimitPolicy(ApiLimitPolicy.STOP);

        // Split CreatedAt into CreatedAtDate and CreatedAtTime
        reader = new TransformingReader(reader)
                .add(new CopyField("CreatedAt", "CreatedAtDate", false))
                .add(new CopyField("CreatedAt", "CreatedAtTime", false))
                .add(new BasicFieldTransformer("CreatedAtDate").dateTimeToDate())
                .add(new BasicFieldTransformer("CreatedAtTime").dateTimeToTime())
                .add(new MoveFieldAfter("CreatedAtDate", "CreatedAt"))
                .add(new MoveFieldAfter("CreatedAtTime", "CreatedAtDate"));

        // Write to console
        DataWriter writer1 = new StreamWriter(System.out);

        // Write to Excel
        ExcelDocument document = new ExcelDocument();
        DataWriter writer2 = new ExcelWriter(document).setSheetName("search");

        // Write to both console and Excel
        MultiWriter writer = new MultiWriter(writer1, writer2);

        // Run job, writing to
        Job.run(reader, writer);

        // Save Excel file
        document.save(new File("example/data/output/twitter-search.xlsx"));
    }
}
Code Walkthrough
- First, TwitterSearchReader is created to obtain tweets from Twitter. setApiLimitPolicy() sets the policy that specifies the action to take when the Twitter API call limits are reached. ApiLimitPolicy.STOP stops the reader normally (without an exception) if the limit is reached.
- TransformingReader is created to apply transformations to records as they are obtained from the API. add(new CopyField()) creates a duplicate of the field specified in the first parameter, with the name specified in the second parameter. dateTimeToDate() transforms the specified field from a DateTime to a Date object, and dateTimeToTime() does the same for the time portion.
- MoveFieldAfter moves the field specified in the first parameter after the field specified in the second parameter. For this example, the fields end up in the order CreatedAt, CreatedAtDate, CreatedAtTime.
- StreamWriter is created to write the obtained tweets to the console.
- ExcelWriter is created from an ExcelDocument object to write the obtained tweets to an Excel file. The setSheetName() method changes the default sheet name to the value passed as its parameter (i.e. search for this example).
- MultiWriter is created to write the obtained tweets to both the console and the Excel file.
- Data is transferred from TransformingReader to MultiWriter via the Job.run() method. See how to compile and run data pipeline jobs.
- The output Excel file is saved using the ExcelDocument.save() method.
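The STOP behavior mentioned in the walkthrough can be pictured with a small stand-alone sketch. It does not call the Data Pipeline or Twitter APIs: LimitPolicy, readAll and callBudget are hypothetical names invented here, the FAIL option is included only for contrast, and the "limit" is just a counter. It illustrates the idea of ending a read normally once a limit is hit rather than how the library actually implements it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class StopOnLimitSketch {

    /** Hypothetical policies; only STOP mirrors the behavior described in the walkthrough. */
    public enum LimitPolicy { STOP, FAIL }

    /**
     * Reads items until the source is exhausted or the call budget is used up.
     * With STOP the loop ends quietly; with FAIL it throws.
     */
    public static List<String> readAll(Iterator<String> source, int callBudget, LimitPolicy policy) {
        List<String> results = new ArrayList<>();
        int callsMade = 0;
        while (source.hasNext()) {
            if (callsMade >= callBudget) {
                if (policy == LimitPolicy.STOP) {
                    break; // end normally, keeping what was read so far
                }
                throw new IllegalStateException("call limit reached after " + callsMade + " calls");
            }
            results.add(source.next());
            callsMade++;
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> tweets = List.of("tweet 1", "tweet 2", "tweet 3", "tweet 4");
        // Budget of 2 calls: prints [tweet 1, tweet 2]
        System.out.println(readAll(tweets.iterator(), 2, LimitPolicy.STOP));
    }
}
```

With a stop-style policy the caller still receives the records read before the limit, which suits a batch job like this one that writes whatever it managed to collect.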
TwitterSearchReader
Uses the Twitter Search API to obtain records by searching Twitter for the most recent tweets matching the search criteria. It extends the TwitterReader class, and its constructor takes the consumer key, the consumer secret, the search query, and the maximum number of results to obtain.
TransformingReader
A proxy that applies transformations to records passing through. It extends ProxyReader and is created with a DataReader object. You can add a Transformer using the add() method to apply transformations to the fields of a record.
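As a rough picture of the "add steps, then apply them to each record passing through" idea, here is a minimal plain-Java sketch. It is not the library's TransformingReader: the class name TransformChainSketch is hypothetical, a record is modeled as a Map, and a transformer is modeled as a Consumer that edits the map in place.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class TransformChainSketch {

    // Hypothetical stand-in for a transformer: a step that mutates one record in place.
    private final List<Consumer<Map<String, Object>>> steps = new ArrayList<>();

    /** Registers a step and returns this object so calls can be chained. */
    public TransformChainSketch add(Consumer<Map<String, Object>> step) {
        steps.add(step);
        return this;
    }

    /** Applies every registered step, in the order added, to a single record. */
    public Map<String, Object> apply(Map<String, Object> record) {
        for (Consumer<Map<String, Object>> step : steps) {
            step.accept(record);
        }
        return record;
    }

    public static void main(String[] args) {
        TransformChainSketch chain = new TransformChainSketch()
                .add(r -> r.put("Lang", String.valueOf(r.get("Lang")).toUpperCase()))
                .add(r -> r.put("TextLength", String.valueOf(r.get("Text")).length()));

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("Text", "hello");
        record.put("Lang", "en");

        // Prints {Text=hello, Lang=EN, TextLength=5}
        System.out.println(chain.apply(record));
    }
}
```

Returning this from add() is what allows the fluent chaining style seen in the listing.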
CopyField
Creates a duplicate field with the specified target name. It extends Transformer, and its constructor takes the source field name, the target field name, and an optional boolean value that determines whether an existing target field should be overwritten.
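The copy-with-overwrite-flag idea can be sketched in plain Java on a Map-based record. This is an illustration only: copyField is a hypothetical helper, and what happens when the target already exists and the flag is false (here, the existing value is simply left alone) is an assumption, not confirmed library behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CopyFieldSketch {

    /**
     * Copies record[source] into record[target].
     * Assumption: if target already exists, it is replaced only when overwriteTarget is true.
     */
    public static void copyField(Map<String, Object> record, String source, String target,
                                 boolean overwriteTarget) {
        if (!record.containsKey(source)) {
            throw new IllegalArgumentException("no such field: " + source);
        }
        if (record.containsKey(target) && !overwriteTarget) {
            return; // leave the existing target untouched
        }
        record.put(target, record.get(source));
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("CreatedAt", "2014-09-08T17:02:48");

        copyField(record, "CreatedAt", "CreatedAtDate", false);
        copyField(record, "CreatedAt", "CreatedAtTime", false);

        // Prints [CreatedAt, CreatedAtDate, CreatedAtTime]
        System.out.println(record.keySet());
    }
}
```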
BasicFieldTransformer
A proxy that applies transformations to fields of a record passing through. It extends the FieldTransformer class, and its constructor takes the names of one or more target fields to transform. It includes several methods that apply a transformation to those fields; for example, the dateTimeToDate() method transforms a field from a datetime to a date.
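To make the datetime-to-date and datetime-to-time split concrete, here is a small sketch using the standard java.time classes. It does not use BasicFieldTransformer, and the library may represent these values differently; dateOf and timeOf are hypothetical helper names.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

public class SplitDateTimeSketch {

    /** Keeps only the calendar date of a timestamp. */
    public static LocalDate dateOf(LocalDateTime createdAt) {
        return createdAt.toLocalDate();
    }

    /** Keeps only the time of day of a timestamp. */
    public static LocalTime timeOf(LocalDateTime createdAt) {
        return createdAt.toLocalTime();
    }

    public static void main(String[] args) {
        LocalDateTime createdAt = LocalDateTime.of(2014, 9, 8, 17, 2, 48);

        System.out.println("CreatedAtDate=" + dateOf(createdAt)); // CreatedAtDate=2014-09-08
        System.out.println("CreatedAtTime=" + timeOf(createdAt)); // CreatedAtTime=17:02:48
    }
}
```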
MoveFieldAfter
A Transformer subclass that moves the field specified in the first parameter after the field specified in the second parameter.
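The reordering can be sketched on a plain list of field names. This is only an illustration of the "move X after Y" operation: moveAfter is a hypothetical helper, and the field list is made up to resemble the example's record.

```java
import java.util.ArrayList;
import java.util.List;

public class MoveFieldAfterSketch {

    /** Moves fieldToMove so that it sits immediately after anchor. */
    public static void moveAfter(List<String> fields, String fieldToMove, String anchor) {
        if (!fields.contains(fieldToMove) || !fields.contains(anchor)) {
            throw new IllegalArgumentException("both fields must exist");
        }
        if (fieldToMove.equals(anchor)) {
            return; // nothing to do
        }
        fields.remove(fieldToMove);
        // Look up the anchor after the removal, since its index may have shifted.
        fields.add(fields.indexOf(anchor) + 1, fieldToMove);
    }

    public static void main(String[] args) {
        List<String> fields = new ArrayList<>(
                List.of("Id", "CreatedAt", "Lang", "CreatedAtDate", "CreatedAtTime"));

        moveAfter(fields, "CreatedAtDate", "CreatedAt");
        moveAfter(fields, "CreatedAtTime", "CreatedAtDate");

        // Prints [Id, CreatedAt, CreatedAtDate, CreatedAtTime, Lang]
        System.out.println(fields);
    }
}
```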
ExcelDocument
The in-memory abstraction for an Excel workbook. It is not thread-safe and will throw an exception if used in multiple ExcelReaders and/or ExcelWriters concurrently. Because its data is stored in memory, reading and re-reading from it multiple times is very cheap (think millions of reads per second). The save() method writes the Excel file to disk.
ExcelWriter
Writes records to a Microsoft Excel document. The ExcelWriter.setSheetName() method assigns a name to the sheet being written. If you omit this call, a default sheet name (i.e. sheet1, sheet2, and so on) is assigned automatically.
MultiWriter
Writes records to multiple DataWriters. Its constructor takes zero or more DataWriter objects, and each write operation is performed on every one of them.
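The fan-out pattern behind a writer like this can be sketched with plain Java. FanOutWriterSketch is a hypothetical class, not the library's MultiWriter: each target is modeled as a Consumer of strings, and every write is forwarded to all of them in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class FanOutWriterSketch {

    private final List<Consumer<String>> targets;

    /** Accepts zero or more targets; each one receives every record written. */
    @SafeVarargs
    public FanOutWriterSketch(Consumer<String>... targets) {
        this.targets = List.of(targets);
    }

    /** Forwards the record to every target, in the order they were supplied. */
    public void write(String record) {
        for (Consumer<String> target : targets) {
            target.accept(record);
        }
    }

    public static void main(String[] args) {
        List<String> captured = new ArrayList<>();

        // One target prints to the console, the other collects records in memory.
        FanOutWriterSketch writer = new FanOutWriterSketch(System.out::println, captured::add);
        writer.write("record 1");
        writer.write("record 2");

        System.out.println(captured.size() + " records captured");
    }
}
```

In the example above the in-memory list plays the role that the Excel document plays in the listing: a second destination that receives the same records as the console.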
Console Output
Record {
    0:[Id]:LONG=[509084459472551936]:Long
    1:[Text]:STRING=[+ @Squarespace & @Shutterstock MT @NewYork_CM: #TY again @Freshbooks & @MailChimp for #CreativeMornings! https://t.co/aW7cdAMyYW...140]:String
    2:[Lang]:STRING=[en]:String
    3:[CreatedAt]:DATETIME=[Mon Sep 08 17:02:48 EDT 2014]:Date
    4:[FavoriteCount]:INT=[1]:Integer
    5:[RetweetCount]:INT=[0]:Integer
    6:[InReplyToScreenName]:STRING=[NewYork_CM]:String
    7:[UserScreenName]:STRING=[mitgc_cm]:String
    8:[UserDescription]:STRING=[#Socbiz strat + #Lean venture dev | @Plus_SocialGood Connector | #socent #impinv #sustdev | @StartingBloc #SocInn | NY exec prod...157]:String
    9:[UserCreatedAt]:DATETIME=[Mon Nov 28 21:09:22 EST 2011]:Date
    10:[UserTweets]:INT=[42485]:Integer
    11:[UserFavouritesCount]:INT=[13329]:Integer
    12:[UserFollowersCount]:INT=[6598]:Integer
    13:[UserFollowingCount]:INT=[6912]:Integer
    14:[UserLocation]:STRING=[New York, New York, USA]:String
    15:[UserLang]:STRING=[en]:String
    16:[UserTimeZone]:STRING=[London]:String
    17:[UserUtcOffset]:INT=[3600]:Integer
    18:[UserURL]:STRING=[http://t.co/D8ykYnFD3I]:String
    19:[GeoLocationLatitude]:DOUBLE=[40.7450718]:Double
    20:[GeoLocationLongitude]:DOUBLE=[-73.9964114]:Double
    21:[PlaceName]:STRING=[Manhattan]:String
    22:[PlaceFullName]:STRING=[Manhattan, NY]:String
    23:[PlaceType]:STRING=[city]:String
    24:[PlaceStreetAddress]:UNDEFINED=[null]
    25:[PlaceCountryCode]:STRING=[US]:String
    26:[PlaceCountry]:STRING=[United States]:String
    27:[PlaceBoundingBoxType]:STRING=[Polygon]:String
    28:[PlaceBoundingBoxCoordinates]:STRING=[[[40.683935,-74.026675],[40.683935,-73.910408],[40.877483,-73.910408],[40.877483,-74.026675]]]:String
    29:[UserMentionEntities]:STRING=[@Squarespace @Shutterstock @NewYork_CM @freshbooks @MailChimp @mitgc_cm]:String
    30:[HashtagEntities]:STRING=[#TY #CreativeMornings]:String
    31:[URLEntities]:STRING=[https://pbs.twimg.com/media/BwxyTHtIQAAKDUz.jpg]:String
    32:[MediaEntities]:UNDEFINED=[null]
}
This is example output; it was obtained using different search queries.