Multi-threaded Processing
Data Pipeline runs each job in a single thread by default. The Job class uses its thread to pull data from readers and push it to writers one record at a time.
Each record flowing through the pipeline goes from step to step until it's either written out or discarded. Only after a record has finished traveling through the pipeline is the next record read in to start the journey again. This idea, called one-piece flow (or single-piece flow) processing, is a principle of lean manufacturing.
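To make one-piece flow concrete, here is a minimal plain-Java sketch of a single-threaded loop that carries one record at a time from a reader to a writer. It does not use the Data Pipeline API: the class OnePieceFlowSketch, its run method, and the keep and transform steps are all hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical illustration only; this is NOT the Data Pipeline Job class.
public class OnePieceFlowSketch {

    // Moves records from reader to writer on the calling thread,
    // one record at a time, and returns a trace of what happened.
    public static List<String> run(Iterator<String> reader,
                                   Predicate<String> keep,
                                   Function<String, String> transform,
                                   List<String> writer) {
        List<String> trace = new ArrayList<>();
        while (reader.hasNext()) {
            // Read exactly one record; nothing else is in flight.
            String record = reader.next();
            trace.add("read " + record);

            // A step may discard the record, which ends its journey early.
            if (!keep.test(record)) {
                trace.add("discard " + record);
                continue;
            }

            // Otherwise the record moves through the remaining steps
            // and is written out before the next record is read.
            String result = transform.apply(record);
            writer.add(result);
            trace.add("write " + result);
        }
        return trace;
    }

    public static void main(String[] args) {
        List<String> written = new ArrayList<>();
        List<String> trace = run(List.of("a", "", "b").iterator(),
                s -> !s.isEmpty(),        // discard empty records
                String::toUpperCase,      // a single transformation step
                written);
        trace.forEach(System.out::println);
    }
}
```

The trace shows each record being fully written or discarded before the next one is read, which is why a slow reader, step, or writer holds up every other part of a single-threaded job.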
You've already seen how individual jobs can be made to run synchronously (blocking the main thread) or asynchronously (in their own thread). This section shows you how to increase your record flow by running parts of a job in parallel, in their own threads, regardless of how the job itself is executed.