Updated: July 2021
If you have ever tried to export emails to Excel for analysis, you know it is not exactly straightforward. Maybe you need to find the top companies contacting you and your sales team. Maybe you need to perform text or sentiment analysis on the contents of your messages. Or maybe you’re creating visualizations to better understand who’s emailing you. This east guide will show you how you can use Data Pipeline to search and read emails from Gmail or G Suite, process them any way you like, and store them in Excel.
1. Create an Application Specific Password in Gmail
Before we start, create a separate password for this new app to access your emails. This way you won’t have to use your main password, which can be difficult if you have 2-factor authentication turned on. It also lets you remove access by deleting the password when you’re done exporting the data without affecting anything else.
- Go to My Account.
- Select Signing in to Google.
- Select App passwords.
- Choose Other (Custom name).
- Give it a name like Export emails to CSV.
- Retrieve your new password.
2. Read the latest 10 emails from your inbox
This first step will read the most recent emails from your inbox and write them to System.out
. One benefit of starting here is it gives you an idea of how much data you can get for each message.
All of the examples that follow will rely on the following constants.
1 2 3 |
private static final String HOST = "imap.gmail.com"; private static final String USER = "user@example.com"; private static final String PASSWD = "password"; |
Set USER
to your email address and PASSWD
to the new password you generated above.
1 2 3 4 5 6 7 8 9 10 11 |
@Test public void readLatest10Emails() throws Throwable { DataReader reader = new EmailReader(MailStore.IMAP_OVER_SSL, HOST, USER, PASSWD) .setDirection(EmailReadDirection.BACKWARD); reader = new LimitReader(reader, 10); DataWriter writer = StreamWriter.newSystemOutWriter(); Job.run(reader, writer); } |
You should always use a LimitReader
with some sane value since email boxes can be huge and you don’t want to spend all day on your first test. However, you can always remove the LimitReader line if you’d like to get all emails.
3. Read the latest 10 sent emails
By default, EmailReader
pulls emails from your Inbox folder. You can change this by setting the folder name to anything you like. In this case GmailFolders.SENT_MAIL
to read your latest sent messages only.
1 2 3 4 5 6 7 8 9 10 |
@Test public void readLatest10SentEmails() throws Throwable { DataReader reader = new EmailReader(MailStore.IMAP_OVER_SSL, HOST, USER, PASSWD) .setDirection(EmailReadDirection.BACKWARD) .setFolderName(GmailFolders.SENT_MAIL); reader = new LimitReader(reader, 10); DataWriter writer = StreamWriter.newSystemOutWriter(); Job.run(reader, writer); } |
4. Search for emails by subject
Reading from your Inbox and Sent folders are useful, but what if you need to find all emails with “Trial License” in their subject?
First, you’ll need to set the search term to a SubjectTerm
, with your query, and then you’ll need to set the folder name to All Mail to pull matching messages from all folders.
1 2 3 4 5 6 7 8 9 |
@Test public void searchEmailsBySubject() throws Throwable { DataReader reader = new EmailReader(MailStore.IMAP_OVER_SSL, HOST, USER, PASSWD) .setSearchTerm(new SubjectTerm("Trial License")) .setFolderName(GmailFolders.ALL_MAIL); reader = new LimitReader(reader, 10); DataWriter writer = StreamWriter.newSystemOutWriter(); Job.run(reader, writer); } |
Since EmailReader
uses the javax.mail classes behind the scenes, it will try to run the search efficiently on your email server if it supports it. This way you’re not waiting for all messages to be sent to the app for filtering.
5. Search for emails by date and subject
Once you start searching, you’ll likely want to add dates into the mix. You can use the AndTerm
to combine searches together. In this case, you’re looking for messages with “Trial License” in the subject sent on or after midnight July 1st, 2017.
GE in ReceivedDateTerm.GE
means greater than or equal (>=
). You’ll find all the other operators you need for all your comparisons there.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
@Test public void searchEmailsByDateAndSubject() throws Throwable { DataReader reader = new EmailReader(MailStore.IMAP_OVER_SSL, HOST, USER, PASSWD) .setSearchTerm(new AndTerm( new ReceivedDateTerm(ReceivedDateTerm.GE, DATE_TIME_FORMAT.parse("2017-07-01 00:00:00")), new SubjectTerm("Trial License"))) .setFolderName(GmailFolders.ALL_MAIL) .setDirection(EmailReadDirection.BACKWARD); reader = new LimitReader(reader, 25); DataWriter writer = StreamWriter.newSystemOutWriter(); Job.run(reader, writer); } |
We’ll also add one more constant to make parsing the date easier.
1 |
public static final SimpleDateFormat DATE_TIME_FORMAT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); |
6. Export emails to Excel
This last example is the main code you’ll need to export your Gmail emails to Excel.
Like before, it searches for Trial Licenses from July onward and limits the results to 100 emails. Next, it selects just the fields you want to save – from, to, sentDate, subject, and content – out of all the eligible fields.
The content
field is flattened since emails can have multiple contents (plain text and HTML). The content field is then limited to the first 32767 characters – the maximum size of a cell in Excel.
Finally, the Excel document is written by the pipeline and saved to disk.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
@Test public void exportEmailsToExcel() throws Throwable { DataReader reader = new EmailReader(MailStore.IMAP_OVER_SSL, HOST, USER, PASSWD) .setSearchTerm(new AndTerm( new ReceivedDateTerm(ReceivedDateTerm.GE, DATE_TIME_FORMAT.parse("2017-07-01 00:00:00")), new SubjectTerm("Trial License"))) .setFolderName(GmailFolders.ALL_MAIL) .setDirection(EmailReadDirection.BACKWARD); reader = new LimitReader(reader, 100); reader = new TransformingReader(reader) .add(new SelectFields("from", "to", "sentDate", "subject", "content")) .add(new BasicFieldTransformer("content").flattenToString("").left(32767)); ExcelDocument document = new ExcelDocument(); DataWriter writer = new ExcelWriter(document).setSheetName("License Trials"); Job.run(reader, writer); document.save(new File("trials.xlsx")); } |
Happy coding!