Batch Processing
Computerized batch processing is a method of running software programs called jobs in batches automatically. While users are required to submit the jobs, no other interaction by the user is required to process the batch. Batches may be run automatically at scheduled times as well as contingent on the availability of computer resources.
The term "batch processing" originates in the traditional classification of methods of production as job production (one-off production), batch production (production of a "batch" of multiple items at once, one stage at a time), and flow production (mass production, all stages in process at once).
As computers became faster, the setup and takedown time became a larger percentage of available computer time. Programs called monitors, the forerunners of operating systems, were developed which could process a series, or "batch", of programs, often from magnetic tape prepared offline. The monitor would be loaded into the computer and run the first job of the batch. At the end of the job it would regain control and load and run the next job until the batch was complete. Often the output of the batch would be written to magnetic tape and printed or punched offline. Examples of monitors were IBM's Fortran Monitor System, SOS (Share Operating System), and finally IBSYS for IBM's 709x systems in 1960.[1][2]
The first general purpose time sharing system, Compatible Time-Sharing System (CTSS), was compatible with batch processing. This facilitated transitioning from batch processing to interactive computing.[5]
From the late 1960s onwards, interactive computing such as via text-based computer terminal interfaces (as in Unix shells or read-eval-print loops), and later graphical user interfaces became common. Non-interactive computation, both one-off jobs such as compilation, and processing of multiple items in batches, became retrospectively referred to as batch processing, and the term batch job (in early use often "batch of jobs") became common. Early use is particularly found at the University of Michigan, around the Michigan Terminal System (MTS).[6]
Non-interactive computation remains pervasive in computing, both for general data processing and for system "housekeeping" tasks (using system software); this is true of UNIX-based computers, Microsoft Windows, macOS (which is built on a Unix foundation), and even smartphones. A high-level program (executing multiple programs, with some additional "glue" logic) is today most often called a script, and written in scripting languages, particularly shell scripts for system tasks; in IBM PC DOS and MS-DOS this is instead known as a batch file. A running script, particularly one executed from an interactive login session, is often known as a job, but that term is used very ambiguously.
"There is no direct counterpart to z/OS batch processing in PC or UNIX systems. Batch jobs are typically executed at a scheduled time or on an as-needed basis. Perhaps the closest comparison is with processes run by an AT or CRON command in UNIX, although the differences are significant."[7]
Batch applications are still critical in most organizations in large part because many common business processes are amenable to batch processing. While online systems can also function when manual intervention is not desired, they are not typically optimized to perform high-volume, repetitive tasks. Therefore, even new systems usually contain one or more batch applications for updating information at the end of the day, generating reports, printing documents, and other non-interactive tasks that must complete reliably within certain business deadlines.
Some applications are amenable to flow processing, namely those that only need data from a single input at once (not totals, for instance): start the next step for each input as it completes the previous step. In this case flow processing lowers latency for individual inputs, allowing them to be completed without waiting for the entire batch to finish. However, many applications require data from all records, notably computations such as totals. In this case the entire batch must be completed before one has a usable result: partial results are not usable.
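As a rough sketch of the difference (the numbers and the 10% adjustment are invented for illustration), per-record processing can emit each result as soon as its input is finished, while a total cannot be produced until the whole batch has been read:

```python
# Illustrative only: contrasts per-record (flow) processing with a
# whole-batch computation such as a total.

records = [120, 35, 990, 47]  # made-up input values

def process_record(value):
    return value * 1.1  # e.g. apply a 10% adjustment to a single record

# Flow-style: each record can be finished independently, so results
# become available one at a time, without waiting for the rest.
for value in records:
    print("done:", process_record(value))  # usable immediately

# Batch-style: a total needs every record, so no usable result exists
# until the entire batch has been processed.
total = sum(process_record(value) for value in records)
print("batch total:", total)
```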
Modern batch applications make use of modern batch frameworks such as Jem The Bee, Spring Batch[8] or implementations of JSR 352[9] written for Java, and other frameworks for other programming languages, to provide the fault tolerance and scalability required for high-volume processing. In order to ensure high-speed processing, batch applications are often integrated with grid computing solutions to partition a batch job over a large number of processors, although there are significant programming challenges in doing so. High volume batch processing places particularly heavy demands on system and application architectures as well. Architectures that feature strong input/output performance and vertical scalability, including modern mainframe computers, tend to provide better batch performance than alternatives.
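The frameworks named above provide much more (restart, checkpointing, transaction management), but the core idea of partitioning a batch across workers can be sketched in plain Python. This is not Spring Batch or JSR 352 code; the record set, chunk size, and worker count are arbitrary:

```python
# Minimal sketch of partitioning one large batch over several worker
# processes, then combining the partial results.
from multiprocessing import Pool

def process_chunk(chunk):
    # Placeholder work: a real job might validate, transform, and load
    # each record in the chunk.
    return sum(record * 2 for record in chunk)

if __name__ == "__main__":
    records = list(range(1_000_000))          # made-up record set
    chunk_size = 100_000
    chunks = [records[i:i + chunk_size]
              for i in range(0, len(records), chunk_size)]

    with Pool(processes=4) as pool:           # 4 workers, chosen arbitrarily
        partial_results = pool.map(process_chunk, chunks)

    print("combined result:", sum(partial_results))
```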
A bank's end-of-day (EOD) jobs require the concept of cutover, where transactions and data are cut off for a particular day's batch activity ("deposits after 3 PM will be processed the next day").
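A minimal sketch of such a cutover rule, using the 3 PM cutoff from the example above (the function name and dates are hypothetical):

```python
# Assigns a transaction to a business date using a 3 PM cutover:
# anything received after the cutoff belongs to the next day's batch.
from datetime import datetime, time, timedelta

CUTOVER = time(15, 0)  # 3 PM, as in the example above

def business_date(received: datetime):
    if received.time() > CUTOVER:
        return (received + timedelta(days=1)).date()
    return received.date()

print(business_date(datetime(2024, 3, 1, 14, 30)))  # 2024-03-01 (same day)
print(business_date(datetime(2024, 3, 1, 16, 5)))   # 2024-03-02 (next day's batch)
```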
As requirements for online systems uptime expanded to support globalization, the Internet, and other business needs, the batch window shrank[12][13] and increasing emphasis was placed on techniques that would require online data to be available for a maximum amount of time.
The IBM mainframe z/OS operating system or platform has arguably the most highly refined and evolved set of batch processing facilities owing to its origins, long history, and continuing evolution. Today such systems commonly support hundreds or even thousands of concurrent online and batch tasks within a single operating system image. Technologies that aid concurrent batch and online processing include Job Control Language (JCL), scripting languages such as REXX, Job Entry Subsystem (JES2 and JES3), Workload Manager (WLM), Automatic Restart Manager (ARM), Resource Recovery Services (RRS), IBM Db2 data sharing, Parallel Sysplex, unique performance optimizations such as HiperDispatch, I/O channel architecture, and several others.
The Unix programs cron, at, and batch (today batch is a variant of at) allow for complex scheduling of jobs. Windows has a job scheduler. Most high-performance computing clusters use batch processing to maximize cluster usage.[14]
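As a loose analogy for what the Unix batch command does (run queued work only when the machine is otherwise idle), the toy sketch below checks the load average before starting each queued command. It is not how atd is actually implemented, and the job list is invented:

```python
# Toy analogy for the Unix "batch" command: run queued commands only
# when the one-minute load average drops below a threshold.
# (Unix-only: os.getloadavg is not available on Windows.)
import os
import subprocess
import time

LOAD_THRESHOLD = 1.5   # a commonly used default load limit
queued_jobs = [["echo", "nightly report"], ["echo", "cleanup"]]  # made-up jobs

for job in queued_jobs:
    while os.getloadavg()[0] >= LOAD_THRESHOLD:
        time.sleep(60)          # wait for the system to become idle
    subprocess.run(job, check=True)
```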
Batch processing is the method computers use to periodically complete high-volume, repetitive data jobs. Certain data processing tasks, such as backups, filtering, and sorting, can be compute intensive and inefficient to run on individual data transactions. Instead, data systems process such tasks in batches, often in off-peak times when computing resources are more commonly available, such as at the end of the day or overnight. For example, consider an ecommerce system that receives orders throughout the day. Instead of processing every order as it occurs, the system might collect all orders at the end of each day and share them in one batch with the order fulfillment team.
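A minimal sketch of that ecommerce example (the order data and function names are made up): orders accumulate during the day and are handed off as a single batch at its end.

```python
# Sketch of the ecommerce example above: orders accumulate during the
# day and are handed to fulfillment as one end-of-day batch.
orders = []

def receive_order(order_id, items):
    # Called whenever a customer places an order; nothing is processed yet.
    orders.append({"order_id": order_id, "items": items})

def end_of_day_batch():
    # Runs once, e.g. overnight: hand every accumulated order to fulfillment.
    batch = list(orders)
    orders.clear()
    print(f"sending {len(batch)} orders to fulfillment")
    return batch

receive_order(1, ["book"])
receive_order(2, ["lamp", "cable"])
end_of_day_batch()
```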
Organizations use batch processing because it requires minimal human interaction and makes repetitive tasks more efficient to run. You can set up batches of jobs composed of millions of records to be worked through together when compute power is most readily available, putting less stress on your systems. Modern batch processing also needs little supervision or management: if there is an issue, the system automatically notifies the relevant team, and managers can take a hands-off approach, trusting their batch processing software to do its job. Examples of these benefits in practice follow.
Financial services organizations, from agile fintech firms to legacy enterprises, have been using batch processing in areas such as high-performance computing for risk management, end-of-day transaction processing, and fraud surveillance. They use batch processing to minimize human error, increase speed and accuracy, and reduce costs through automation.
Enterprises that deliver software as a service (SaaS) applications often run into issues when it comes to scalability. Using batch processing, you can scale to meet customer demand while automating job scheduling. Creating containerized application environments to scale high-volume processing is a project that can take months or even years to complete, but batch processing systems help you achieve the same result in a much shorter timeframe.
While batch processing applications vary depending on the type of task that needs to be done, the basics of any batch job remain the same. The user runs a batch job by specifying details such as the program to run, when it should run (the batch window), and how much data it will process (the batch size).
During the batch window, the batch processing system uses the batch size information to allocate the resources needed to run the batch job efficiently. Modern systems can run hundreds of thousands of batch jobs on premises or in the cloud.
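The exact fields differ from system to system; purely as a hypothetical illustration (none of these field names come from a real scheduler), a job definition might bundle the program to run, the batch window, and the expected batch size so the system can plan resource allocation:

```python
# Hypothetical batch job definition; real systems (JCL, Spring Batch,
# cloud schedulers) each use their own format and field names.
from dataclasses import dataclass

@dataclass
class BatchJobSpec:
    name: str               # identifies the job
    program: str            # the program or script to execute
    input_path: str         # where the records come from
    output_path: str        # where results go
    window_start: str       # start of the batch window, e.g. "22:00"
    window_end: str         # end of the batch window, e.g. "06:00"
    expected_records: int   # batch size, used to plan resource allocation

nightly_billing = BatchJobSpec(
    name="nightly-billing",
    program="bill_customers.py",
    input_path="/data/orders/today",
    output_path="/data/invoices",
    window_start="22:00",
    window_end="06:00",
    expected_records=2_000_000,
)
print(nightly_billing)
```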
Batch job tasks can run sequentially or simultaneously. The sequence can differ depending on whether an earlier task completed successfully. Examples of dependencies include a customer placing an order in an online store or paying a bill, and a dependency can also be set up to initiate a job processing cycle.
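A minimal sketch of dependency handling (the job names and their ordering are invented): each job runs only once the jobs it depends on have completed successfully.

```python
# Sketch of dependency-ordered execution: a job runs only after the
# jobs it depends on have finished successfully.
def run(name):
    print(f"running {name}")
    return True   # pretend every job succeeds

# Hypothetical jobs: orders must be loaded before payments settle, etc.
dependencies = {
    "load-orders": [],
    "settle-payments": ["load-orders"],
    "generate-statements": ["settle-payments"],
}

completed = set()
pending = list(dependencies)
while pending:
    ready = [j for j in pending if all(d in completed for d in dependencies[j])]
    if not ready:
        break                      # remaining jobs are blocked by a failure or a cycle
    for job in ready:
        if run(job):
            completed.add(job)
        pending.remove(job)
```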
Monitors in batch processes look for abnormalities, such as a job taking longer than expected to complete. In that case, the monitor stops the next job from beginning and informs the relevant staff of the exception.
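A minimal sketch of that kind of monitor (the expected duration and the overrunning step are invented): if a step exceeds its expected runtime, the following job is held and staff are notified.

```python
# Sketch of a runtime monitor: if a job step exceeds its expected
# duration, hold the next job and notify the relevant staff.
import time

EXPECTED_SECONDS = 2.0   # made-up expectation for this job step

def notify_operators(message):
    print("ALERT:", message)   # stand-in for paging or ticketing

def run_step():
    time.sleep(3)              # pretend the step overruns its window

start = time.monotonic()
run_step()
elapsed = time.monotonic() - start

if elapsed > EXPECTED_SECONDS:
    notify_operators(f"step overran: {elapsed:.1f}s > {EXPECTED_SECONDS}s; holding next job")
else:
    print("step completed within its window; starting next job")
```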
Whereas batch systems process large volumes of data and requests in sequential order, stream processing continually analyzes data as it flows through a system or between devices. Stream processing monitors data in real time and continually passes it on through the network; because it is always monitoring, it typically requires more processing power than batch processing.
When the size of the data being streamed is not known or is unbounded, stream processing can be preferable to batch processing. As a result, stream processing is commonly used for business functions such as cybersecurity, Internet of Things (IoT), personalized marketing services, and log monitoring.