
Spring Book – Chapter 21 – Spring Batch

Spring Batch is the first lightweight, comprehensive, Java-based framework for batch processing. Because it is built on top of the Spring Framework, it offers all the advantages of a productive, POJO-based development approach.

  • Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. Spring Batch builds upon the productivity, POJO-based development approach, and general ease of use capabilities people have come to know from the Spring Framework, while making it easy for developers to access and leverage more advanced enterprise services when necessary.
  • – Spring Batch documentation (http://static.springsource.org/spring-batch/)

This chapter covers Spring Batch comprehensively and aims to let you use it in your enterprise application with less effort. The initial sections introduce the core concepts on which Spring Batch is built, and later sections delve into those concepts in detail, with appropriate code snippets.

Batch and Offline Processing

You have probably heard the terms batch processing and offline processing for quite some time now. Every time such a requirement comes up in your application, you sit down in front of the computer and code a solution from scratch. With Spring Batch, you now have a framework that lets you do this in a much better and cleaner manner. Before going through Spring Batch in detail, I would like to spend some time explaining the usual business cases that Spring Batch addresses. Primary among these are batch and offline processing.

Batch Processing

Batch applications need to process high volumes of business-critical transactional data efficiently. A typical batch program does the following:

  • Reads a large amount of data from a database, file or queue, as the case may be.
  • Processes the data efficiently, according to the business requirement.
  • Writes the modified/processed data back to a database, file or queue, as the case may be.

A batch is a group of similar or identical items. The pseudocode for a typical batch is shown in Figure 23-1 below.


Figure 23-1. Pseudocode for a typical Batch
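The read, process and write steps listed above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the class name TypicalBatch and the method run are hypothetical, an in-memory list stands in for the database, file or queue, and none of this is the Spring Batch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical names throughout; plain Java, not the Spring Batch API.
class TypicalBatch {

    // Reads every item from the source, processes it, and collects the results
    // that a real job would write to a database, file or queue.
    static <I, O> List<O> run(Iterable<I> source, Function<I, O> processor) {
        List<O> written = new ArrayList<>();
        for (I item : source) {                 // read
            O result = processor.apply(item);   // process
            written.add(result);                // write
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> out = run(List.of("alice", "bob"), (String s) -> s.toUpperCase());
        System.out.println(out); // prints [ALICE, BOB]
    }
}
```

A loop like this is where most hand-written batch programs start; it has no notion of transactions, restarts or skipped records.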

Offline Processing

Most modern applications require the capability to process client requests offline. Offline processing differs from online/real-time processing in the following aspects:

  • It handles long-running processes, which often run outside usual office hours.
  • It is non-interactive and therefore needs logic capable of handling errors and taking the necessary action, such as a restart in some cases.
  • It handles processes with large amounts of data that do not fit into a single transaction.
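To make the restart requirement concrete, here is a minimal sketch of a job that remembers how far it got, so that a rerun after a failure resumes at the failing item instead of starting over. The class RestartableJob, its checkpoint field and the rule that an empty string counts as a bad record are all assumptions made for illustration; this is plain Java, not the Spring Batch API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch in plain Java, not the Spring Batch API.
class RestartableJob {
    private int checkpoint = 0;                        // index of the next unprocessed item
    private final List<String> output = new ArrayList<>();

    // Processes items from the checkpoint onward. An exception leaves the
    // checkpoint at the failing item, so a later call resumes there.
    void run(List<String> items) {
        while (checkpoint < items.size()) {
            String item = items.get(checkpoint);
            if (item.isEmpty()) {                      // assumed rule: empty means bad record
                throw new IllegalStateException("bad item at index " + checkpoint);
            }
            output.add(item.toUpperCase());
            checkpoint++;
        }
    }

    int checkpoint() { return checkpoint; }

    List<String> output() { return output; }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>(List.of("a", "", "c"));
        RestartableJob job = new RestartableJob();
        try {
            job.run(items);
        } catch (IllegalStateException e) {
            items.set(1, "b");   // the operator repairs the bad record
            job.run(items);      // restart: resumes at index 1, "a" is not reprocessed
        }
        System.out.println(job.output()); // prints [A, B, C]
    }
}
```

In a real offline job the checkpoint would have to be stored somewhere durable, since the process itself may not survive the failure.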

Some common examples of batch and offline processing are given below, so that you can understand the use of Spring Batch and appreciate what it delivers out of the box.

  • Large-scale output jobs that need to run in a timely manner. For example, a sales report spanning a whole month or even a whole year.
  • Import/export handling of data. For example, ETL (Extract-Transform-Load) jobs, data synchronization jobs, etc.
  • Various close-of-business jobs. For example, a sales report spanning a day, business-level reporting, etc.
  • The lack of standard, reusable batch architecture has resulted in the proliferation of many one-off, in-house solutions developed within client enterprise IT functions.
  • – Spring Batch documentation (http://static.springsource.org/spring-batch/)

Why a framework?

So why would you need a framework for implementing batch jobs? Why can’t we just use a “for loop”? We need a framework that not only runs a piece of code in a loop but also provides other features, capabilities and business scenarios, as listed below (and as detailed in the Spring Batch documentation):

  • Commit batch processes periodically: the capability of committing the processed data at intervals, for various business reasons
  • Staged, enterprise message-driven processing
  • Concurrent batch processing: parallel processing of a job
  • Massively parallel batch processing
  • Sequential processing of dependent steps with extensions to workflow-driven batches
  • Manual or scheduled restart after failure
  • Whole-batch transaction: for cases with a small batch size or existing stored procedures/scripts
  • Partial processing: skip records, e.g. on rollback
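Two of these capabilities, periodic commits and skipping records, already show how quickly a plain "for loop" grows. The sketch below is a hand-rolled illustration in plain Java, not the Spring Batch API: the class ChunkedJob, the commitInterval parameter and the in-memory "commit" list are hypothetical stand-ins for a real transaction boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch in plain Java, not the Spring Batch API.
class ChunkedJob {
    final List<List<Integer>> commits = new ArrayList<>(); // each entry is one committed chunk
    final List<String> skipped = new ArrayList<>();        // records that could not be parsed

    // Parses each record as an integer, commits every commitInterval good records,
    // and skips records that fail to parse instead of aborting the whole job.
    void run(List<String> records, int commitInterval) {
        List<Integer> chunk = new ArrayList<>();
        for (String record : records) {
            try {
                chunk.add(Integer.parseInt(record));
            } catch (NumberFormatException e) {
                skipped.add(record);          // partial processing: skip the bad record
                continue;
            }
            if (chunk.size() == commitInterval) {
                commits.add(chunk);           // periodic commit
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            commits.add(chunk);               // commit the final, partially filled chunk
        }
    }

    public static void main(String[] args) {
        ChunkedJob job = new ChunkedJob();
        job.run(List.of("1", "2", "x", "3", "4", "5"), 2);
        System.out.println(job.commits); // prints [[1, 2], [3, 4], [5]]
        System.out.println(job.skipped); // prints [x]
    }
}
```

Even this small version has to decide what counts as a skippable error and what happens to a half-filled chunk. Adding restart, parallelism and dependent steps on top is the kind of work a framework is meant to take over.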
