Data Pipeline

Spring Batch vs Data Pipeline – ETL Job Example

Posted On 4 Oct 2016
By Dele Taylor
In Batch, Data Pipeline, Java, Spring Framework

Updated: July 2021

Most Spring Batch ETL job examples need a surprising amount of code for what is a routine task. In this post, I will show you how to accomplish the same task, summarizing a million stock trades to find the open, close, high, and low prices for each symbol, using our Data Pipeline framework.
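
To make the task itself concrete, here is a minimal plain-Java sketch of the open/high/low/close summary. It does not use Spring Batch or the Data Pipeline API; the `Trade` and `Ohlc` classes, their field names, and the use of a numeric timestamp to decide which trade is first and last are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OhlcSummary {

    /** One trade. Field names are hypothetical, not taken from the post. */
    static final class Trade {
        final String symbol;
        final long timestamp;
        final double price;

        Trade(String symbol, long timestamp, double price) {
            this.symbol = symbol;
            this.timestamp = timestamp;
            this.price = price;
        }
    }

    /** Running open/high/low/close for a single symbol. */
    static final class Ohlc {
        double open, high, low, close;
        long firstTime, lastTime;

        Ohlc(Trade t) {
            open = high = low = close = t.price;
            firstTime = lastTime = t.timestamp;
        }

        void add(Trade t) {
            // The earliest trade sets the open, the latest sets the close.
            if (t.timestamp < firstTime) { firstTime = t.timestamp; open = t.price; }
            if (t.timestamp >= lastTime) { lastTime = t.timestamp; close = t.price; }
            high = Math.max(high, t.price);
            low = Math.min(low, t.price);
        }
    }

    /** Single pass over the trades, one accumulator per symbol. */
    static Map<String, Ohlc> summarize(Iterable<Trade> trades) {
        Map<String, Ohlc> bySymbol = new LinkedHashMap<>();
        for (Trade t : trades) {
            Ohlc current = bySymbol.get(t.symbol);
            if (current == null) {
                bySymbol.put(t.symbol, new Ohlc(t));
            } else {
                current.add(t);
            }
        }
        return bySymbol;
    }

    public static void main(String[] args) {
        List<Trade> trades = Arrays.asList(
                new Trade("AAA", 2L, 12.0),
                new Trade("AAA", 1L, 10.0),
                new Trade("BBB", 1L, 5.0));
        summarize(trades).forEach((symbol, o) ->
                System.out.println(symbol + " open=" + o.open + " high=" + o.high
                        + " low=" + o.low + " close=" + o.close));
    }
}
```

Because the summary keeps only one small accumulator per symbol, memory use grows with the number of symbols rather than the number of trades, which is what makes a million-row input practical to stream.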

25 Conferences Data Scientists Should Attend in 2022 and 2023

Posted On 19 Jul 2016
By The DataPipeline Team
In Data Science

Updated: June 2022

Being a data scientist means dedication to continuous learning. One great way to keep learning, improve your network, and get exposed to different views is to attend conferences.

Since 2020, organizers have favored virtual conferences over in-person ones. That trend continued through 2021 and 2022, although some conferences have been held in person again since the second half of 2021.

Data science conferences are one of the best ways to learn, develop new skills, meet people, discuss ideas, and discover how others are applying AI, analytics, and machine learning in their work.

Here are several conferences for data scientists you should consider attending.

How to speed up JDBC inserts?

Posted On 17 May 2016
By Dele Taylor
In Database, Java

Updated: May 2023

When trying to assess how knowledgeable a developer is in general and in JDBC in particular, here’s a question I like to ask: how would you speed up inserts when using JDBC?

Here are some options to consider if you ever need to insert data quickly into an SQL database.
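
One widely used option is to combine a single `PreparedStatement` with JDBC batching inside one transaction, so rows are sent to the database in groups rather than one round trip each. The sketch below shows that technique only; it may or may not match the options covered in the full post. The `trades` table, its columns, and the `insertAll` helper are hypothetical, and a suitable batch size depends on the driver and database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {

    /**
     * Inserts rows in batches of batchSize inside one transaction.
     * Each row is {symbol (String), price (Double)}. Returns the number of rows sent.
     */
    static int insertAll(Connection conn, List<Object[]> rows, int batchSize) throws SQLException {
        // Hypothetical table and column names, for illustration only.
        String sql = "INSERT INTO trades (symbol, price) VALUES (?, ?)";
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // avoid one commit per row
        int sent = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (Object[] row : rows) {
                ps.setString(1, (String) row[0]);
                ps.setDouble(2, (Double) row[1]);
                ps.addBatch(); // queue the row on the client side
                if (++pending == batchSize) {
                    ps.executeBatch(); // send the queued rows together
                    sent += pending;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the final partial batch
                ps.executeBatch();
                sent += pending;
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return sent;
    }
}
```

How much this helps varies by driver: some only turn a batch into a true multi-row insert when a driver-specific connection option is enabled, so it is worth measuring against your own database.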

Data Pipeline v4.1 Adds MongoDB Support

We’re excited to introduce Data Pipeline version 4.1, the second update on our 2016 roadmap.

This release features MongoDB integration, expression language additions, and improved transformations and joins. We’ve also thrown in a ton of examples for all the new 4.1 and 4.0 features. Enjoy.

Data Pipeline 3.1.4 Now Available

Data Pipeline v3.1.4 is now available for download. This release includes support for MySQL upserts, lower JSON and XML memory usage, bug fixes, and more.

How To Aggregate Twitter Searches Without A Database

Posted On 11 Aug 2015
By Dele Taylor
In Data Pipeline, Java, Twitter

One feature of Data Pipeline is its ability to aggregate data without a database. This feature allows you to apply SQL “group by” operations to JSON, CSV, XML, Java beans, and other formats on the fly, in real time. This quick tutorial will show you how to use the GroupByReader class to aggregate Twitter search results.

Data Pipeline 3.1 Now Available

Data Pipeline 3.1 is now available for download. This is a milestone release that adds native support for hierarchical data (nested records and multidimensional arrays).

How to read data in parallel using AsyncMultiReader

Posted On 26 Jun 2015
By Dele Taylor
In Data Pipeline, Exceptions, Multithreading

Data Pipeline now includes a new AsyncMultiReader endpoint that lets you read from multiple DataReaders in parallel. Here’s how it works.
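
As a rough illustration of the general idea, and not of how AsyncMultiReader is implemented or called, the plain-Java sketch below reads several sources on their own threads and merges their records through a shared queue. The `ParallelReadSketch` class and `readAll` method are hypothetical names, and plain `Iterator`s stand in for DataReaders.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class ParallelReadSketch {

    /** Marker a reader thread enqueues when its source is exhausted. */
    private static final Object DONE = new Object();

    /**
     * Drains every source on its own thread and returns the merged records.
     * Order is preserved within a source but not across sources.
     * Records must be non-null because the queue rejects nulls.
     */
    static <T> List<T> readAll(List<Iterator<T>> sources) throws InterruptedException {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        for (Iterator<T> source : sources) {
            Thread reader = new Thread(() -> {
                try {
                    while (source.hasNext()) {
                        queue.add(source.next());
                    }
                } finally {
                    queue.add(DONE); // always signal completion, even on failure
                }
            });
            reader.start();
        }

        List<T> merged = new ArrayList<>();
        int remaining = sources.size();
        while (remaining > 0) {
            Object item = queue.take(); // blocks until some reader produces something
            if (item == DONE) {
                remaining--;
            } else {
                @SuppressWarnings("unchecked")
                T record = (T) item;
                merged.add(record);
            }
        }
        return merged;
    }
}
```

This version uses an unbounded queue for simplicity. A production reader would more likely bound the queue so that fast sources cannot outrun a slow consumer, and would report a reader thread's exception to the caller instead of only ending that source early.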

How to convert XML to Excel (2023)

Posted On 22 Jun 2015
By The DataPipeline Team
In Data Pipeline, Excel, Java, News, XML

Data Pipeline makes it easy to read, transform, and write XML and Excel files. This post shows you how to load data from an on-disk XML file, apply transformations on the fly, and save the result to an Excel file.

How to create multiple sheets in a single Excel file

Posted On 30 May 2015
By Dele Taylor
In Data Pipeline, Excel, Java, News

Data Pipeline lets you read, write, and convert Excel files using a very simple API. This post will show you how to create Excel files containing more than one worksheet (tab).
