Welcome to the DataPipeline 7.0 release. Since our last update, the DataPipeline team has been hard at work adding more declarative components, new integrations, new transformations, and generally making the framework easier to use. Our goal is to make simple use-cases easy and complex ones less difficult to implement.
What’s New in DataPipeline 6.0?
We’re pleased to announce the release of DataPipeline version 6.0. This release includes our new DataPipeline Foundations add-on that brings decisioning, source-target data mapping, and other cool features to your software.
11 Java Data Integration Libraries (2023)
Updated: May 2023
With data being produced from many sources in a variety of formats, businesses must have a sane way to gain useful insight. Data integration is the process of transforming data from one or more sources into a form that can be loaded into a target system or used for analysis and business intelligence.
Data integration libraries take some programming burden from the shoulders of developers by abstracting data processing and transformation tasks and allowing the developer to focus on tasks that are directly related to the application logic.
Data Pipeline 4.4 Now Available
Today we’re pleased to announce the release of Data Pipeline version 4.4. This update includes integration with Amazon S3, new features to better handle real-time data and aggregation, and new XML and JSON readers to speed up your development.
25 Machine Learning and Artificial Intelligence Conferences
Machine learning and artificial intelligence in general are two of today’s hottest skills. AI and ML conferences provide a place for you to improve your skills, discuss trends, and exchange ideas with other data scientists, developers, and entrepreneurs. Whether you’re new to the world of machine learning, trying to stay up-to-date, or just looking to network, there’s a conference happening for you. This article lists over 50 conferences taking place around the world for you to consider attending.
Online data prep and code generator for Data Pipeline
We’re building a new tool to help you work faster with Data Pipeline.
This new tool is a web app that lets you interactively transform, filter, and prepare data on-the-fly. It also lets you generate Data Pipeline code based on the actions you perform.
How to Convert Tabular Data to Trees Using Aggregation
We recently received an email from a Java developer asking how to convert records in a table (like you get in a relational database, CSV, or Excel file) to a composite tree structure. Normally, we’d point to one of Data Pipeline’s XML or JSON data writers, but for good reasons those options didn’t apply here. The developer emailing us needed the hierarchical structures in object form for use in his API calls.
Since we didn’t have a general-purpose table-to-tree mapper, we built one. We looked at several options, but ultimately decided to add a new operator to the GroupByReader. This not only answered the immediate mapping question, but also allowed him to use the new operator with sliding window aggregation if the need ever arose.
The rest of this blog will walk you through the implementation in case you ever need to add your own custom aggregate operator to Data Pipeline.
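To give a rough feel for the table-to-tree idea before diving in, here is a minimal sketch in plain Java. It does not use Data Pipeline or the GroupByReader operator described above; the TableToTree class, the Node type, and the buildTree method are hypothetical names made up for this illustration, and the sample rows are invented. It assumes each record is a simple map of field names to values and nests the records one level per grouping field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: plain Java, not the Data Pipeline API.
public class TableToTree {

    /** A tree node: a label, child nodes keyed by label, and the rows that ended at this node. */
    static final class Node {
        final String label;
        final Map<String, Node> children = new LinkedHashMap<>();
        final List<Map<String, String>> rows = new ArrayList<>();

        Node(String label) {
            this.label = label;
        }
    }

    /** Groups flat rows into a tree, one level per grouping field, in the order given. */
    static Node buildTree(List<Map<String, String>> rows, String... groupFields) {
        Node root = new Node("root");
        for (Map<String, String> row : rows) {
            Node current = root;
            for (String field : groupFields) {
                // A missing field becomes the label "null" rather than failing.
                String key = String.valueOf(row.get(field));
                current = current.children.computeIfAbsent(key, Node::new);
            }
            // The row is stored on the deepest node reached.
            current.rows.add(row);
        }
        return root;
    }

    /** Prints each node indented by depth, with the number of rows stored directly on it. */
    static void print(Node node, int depth) {
        System.out.println("  ".repeat(depth) + node.label + " (" + node.rows.size() + " rows)");
        for (Node child : node.children.values()) {
            print(child, depth + 1);
        }
    }

    public static void main(String[] args) {
        // Invented sample data standing in for records read from a table.
        List<Map<String, String>> rows = List.of(
            Map.of("country", "Canada", "city", "Toronto", "customer", "Alice"),
            Map.of("country", "Canada", "city", "Toronto", "customer", "Bob"),
            Map.of("country", "Canada", "city", "Ottawa", "customer", "Carol"),
            Map.of("country", "Brazil", "city", "Recife", "customer", "Dan"));

        print(buildTree(rows, "country", "city"), 0);
    }
}
```

Keeping the children in a LinkedHashMap preserves the order in which groups first appear, which tends to make the resulting tree easier to compare against the source table.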
18 ETL Tools for Java Developers (Updated 2023)
Updated: May 2023
ETL is a process for performing data extraction, transformation and loading. The process extracts data from a variety of sources and formats, transforms it into a standard structure, and loads it into a database, file, web service, or other system for analysis, visualization, machine learning, etc.
ETL tools come in a wide variety of shapes. Some run on your desktop or on-premises servers, while others run as SaaS in the cloud. Some are code-based, built on standard programming languages that many developers already know. Others are built on a custom DSL (domain specific language) in an attempt to be more intentional and require less code. Others still are completely graphical, only offering programming interfaces for complex transformations.
What follows is a list of ETL tools for developers already familiar with Java and the JVM (Java Virtual Machine) to clean, validate, filter, and prepare your data for use.
Scala and Data Pipeline – Phone Bill Calculation Example
Earlier this year a friend sent me a video showing how he implemented a phone bill calculation challenge using Scala. I took a stab at it using Java + Data Pipeline and below is what I came up with.
How about you? How would you code this using your favourite language or framework?
How to Export Emails from Gmail to Excel with Data Pipeline
Updated: July 2021
If you have ever tried to export emails to Excel for analysis, you know it is not exactly straightforward. Maybe you need to find the top companies contacting you and your sales team. Maybe you need to perform text or sentiment analysis on the contents of your messages. Or maybe you’re creating visualizations to better understand who’s emailing you. This easy guide will show you how you can use Data Pipeline to search and read emails from Gmail or G Suite, process them any way you like, and store them in Excel.