Introduction to Enterprise Job Scheduling:
A Crash Course in Cutting IT Ops Costs

by Elizabeth Ferrarini

Since 2001, IT in every industry has come under intense pressure to make organizations perform more efficiently while still contributing to the bottom line. Nowhere is this more apparent than in financial services, where batch job scheduling has become a critical component of IT success.

The U.S. Securities and Exchange Commission has asked that all stock trades be cleared on what's called trade day plus one, or T+1, by June 2005. This requirement will force a switch from Wall Street's traditional batch processing systems to a real-time processing network that never crashes, according to a Computerworld article. The article adds that while upgrading to comply with T+1 will cost about $8 billion, the financial services industry will see savings of about $2.7 billion a year. In addition, the industry will have lower costs, lower error rates, and higher productivity while gaining the ability to handle greater transaction volume.

Based on the ROI figures for the financial services industry, it becomes clear that there are significant monetary benefits to be gained from implementing an automated job scheduling solution. It also becomes clear that, beyond the direct addition to the bottom line through cost savings, there is a benefit in freeing up resources so that they can be used more productively (e.g., talented staff can be put to better use on more important IT projects).

Taken together, these points suggest that automating job scheduling can offer significant benefits to enterprises of every size. Because not all job schedulers are created equal or yield the same benefits, you need to understand how the different types of schedulers work and what attributes to look for in an automated job scheduler. This article provides a crash course on the subject.

How Job Schedulers Work

Job scheduling is one of the most important components of a production computing environment. Job schedulers do many things. They initiate and help manage long, complex jobs, such as payroll runs and inventory reports. They also launch and monitor applications.

Most computer environments use some kind of job scheduler. In large distributed computing environments, however, some job schedulers have not scaled to meet the challenges of enterprise computing. Mainframe schedulers enjoy a reputation for power and robustness, but can be limited to working on mainframes. Unix schedulers, on the other hand, have a reputation for being severely limited in function, but have cross-platform abilities that mainframe schedulers lack.

When beginning to manage batch workloads in open systems environments, most companies launch their first jobs using manual methods. This technique is understandable and appropriate. However, this technique quickly breaks down when the number of machines and batch jobs increases.

For example, Unix and NT systems provide job launchers. These native tools allow users to launch jobs at specific times on specific dates. They provide a basis for scheduling, yet on their own do not deliver a solution for complex scheduling requirements. They rely on operators manually submitting jobs from a workstation. This technique is costly, potentially unreliable, and error prone.

In distributed systems, the job launchers in Unix and NT systems provide simple job launching capability. They offer the ability to start a batch job at a specific time, based upon an adequate set of time and date matching criteria. They perform simple job scheduling tasks such as kicking off a backup every Saturday.
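
The time and date matching described above can be sketched in a few lines. This is only an illustration: the `matches` function and the dictionary-based schedule entry are hypothetical and do not reproduce the syntax of any real Unix or NT launcher.

```python
from datetime import datetime

def matches(entry, now):
    """Return True if `now` satisfies every field set in `entry`.

    `entry` is a hypothetical schedule record; a missing or None field
    acts as a wildcard that matches any value.
    """
    fields = {
        "weekday": now.weekday(),  # Monday is 0, Saturday is 5
        "hour": now.hour,
        "minute": now.minute,
    }
    return all(entry.get(name) in (None, value) for name, value in fields.items())

# "Kick off a backup every Saturday", here assumed to start at 02:00
saturday_backup = {"weekday": 5, "hour": 2, "minute": 0}

print(matches(saturday_backup, datetime(2003, 6, 7, 2, 0)))  # a Saturday
print(matches(saturday_backup, datetime(2003, 6, 8, 2, 0)))  # a Sunday
```

Note that the only input is the clock. Nothing in the entry can refer to the outcome of another job.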

The biggest weakness of these native tools is their inability to monitor and to correlate the execution of one job with the results of another. If a backup job fails, these tools don't know they should suspend the jobs that update the tape catalogs or delete yesterday's old files. If the backup finishes early, these tools can't move up jobs that are to be executed upon completion of the backup.
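
To make the missing capability concrete, here is a minimal sketch of a job stream in which dependent jobs are held when their prerequisite fails. The function `run_stream`, the job names, and the status strings are all assumptions made for illustration; they are not taken from any particular scheduler.

```python
def run_stream(jobs, depends_on):
    """Run jobs in listed order, holding any job whose prerequisite did not succeed.

    jobs: dict mapping job name -> callable that returns True on success.
    depends_on: dict mapping job name -> name of the job it waits for.
    Assumes every job is listed after its prerequisite.
    """
    status = {}
    for name, action in jobs.items():
        upstream = depends_on.get(name)
        if upstream is not None and status.get(upstream) != "ok":
            status[name] = "held"  # prerequisite failed or was itself held
            continue
        status[name] = "ok" if action() else "failed"
    return status

jobs = {
    "backup": lambda: False,              # simulate a failed backup
    "update_tape_catalog": lambda: True,
    "delete_old_files": lambda: True,
    "payroll_report": lambda: True,       # unrelated to the backup
}
depends_on = {"update_tape_catalog": "backup", "delete_old_files": "backup"}

print(run_stream(jobs, depends_on))
```

Because each dependent job starts as soon as its prerequisite reports success rather than at a fixed clock time, the same structure also covers the case where the backup finishes early.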

Also, these native tools can start jobs only on the basis of time. This limitation makes it difficult to create a job that runs when a file disappears or when a system resource reaches a certain threshold.
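
The two non-time conditions mentioned here, a file disappearing and a resource crossing a threshold, can be expressed as simple checks. The sketch below uses only the Python standard library; the helper names (`file_gone`, `disk_usage_above`, `due_jobs`) and the lock-file scenario are hypothetical.

```python
import os
import shutil
import tempfile

def file_gone(path):
    """Trigger condition: the file no longer exists."""
    return lambda: not os.path.exists(path)

def disk_usage_above(path, fraction):
    """Trigger condition: used share of the filesystem at `path` exceeds `fraction`."""
    def check():
        usage = shutil.disk_usage(path)
        return usage.total > 0 and usage.used / usage.total > fraction
    return check

def due_jobs(triggers):
    """Return the names of jobs whose trigger condition currently holds."""
    return [name for name, condition in triggers.items() if condition()]

with tempfile.TemporaryDirectory() as workdir:
    lock = os.path.join(workdir, "transfer.lock")  # hypothetical lock file
    open(lock, "w").close()
    triggers = {"load_data": file_gone(lock)}
    print(due_jobs(triggers))  # lock still present
    os.remove(lock)
    print(due_jobs(triggers))  # lock gone
```

A scheduler with event triggers would evaluate such conditions itself, where a purely time-based launcher leaves that work to hand-written scripts.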

Job-launching configuration files are difficult to maintain. Even minor changes to a job's start time are time consuming and error prone, and there are no layered tools to make job creation easier. Remember, these are simple job-launching tools designed for low-volume environments. They lack the critical features required for complex, large systems.

To make up for this deficiency, many systems administrators create their own job management system. They use these native tools to initiate a job controller and create scripts that detect failure conditions, initiate other jobs, and provide some degree of checkpoint and restart capabilities.

While these solutions often work adequately for small job streams, they rarely scale to handle job loads of complex network environments. They also lack sophisticated user interfaces and reporting tools that allow users to keep audit trails of job streams.

More importantly, home-grown job schedulers quickly turn into full-time programming commitments. As dependence on the tool increases, more and more features get added. The result is usually a varied mix of scripts, programs, and Unix utilities that only a few people actually understand, which makes the whole arrangement fragile and prone to problems.

Mainframe job scheduling is the complete opposite of Unix job scheduling. Mainframe tools provide robust scheduling capabilities that handle huge, complex job streams with ease. Mainframe schedulers group jobs into collections, treating each collection as a single entity whose execution, success, or failure can be tracked and used to trigger other jobs or collections of jobs. Users start jobs and job collections using time triggers or other criteria, such as the creation of a file, the mounting of a tape, or the shutdown of a database. The job scheduler is aware of almost all activity within the system and can respond accordingly.
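
The idea of treating a collection as a single entity can be sketched as a reduction over the states of its member jobs. The state names, the function names, and the sample collections below are assumptions chosen for illustration, not the terminology of any mainframe product.

```python
def collection_status(member_status):
    """Reduce member job states to one state for the whole collection."""
    states = set(member_status.values())
    if "failed" in states:
        return "failed"
    if states <= {"succeeded"}:
        return "succeeded"
    return "running"

def next_to_start(collections, triggers):
    """Return the targets whose triggering collection has succeeded.

    triggers: list of (source collection, target job or collection) pairs.
    """
    return [target for source, target in triggers
            if collection_status(collections[source]) == "succeeded"]

collections = {
    "nightly_accounting": {"post_ledger": "succeeded", "reconcile": "succeeded"},
    "backups": {"db_dump": "succeeded", "tape_copy": "running"},
}
triggers = [
    ("nightly_accounting", "month_end_reports"),
    ("backups", "catalog_update"),
]

print(next_to_start(collections, triggers))
```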

Using screen-oriented user interfaces, system operators can track the status of jobs, noting which are running long and which are completing. Using this interface, operators can suspend jobs, delay execution, restart jobs, and track schedule slippage. The scheduler can alert an operator if a job exceeds a maximum run time or fails to start because its execution criteria were not met.
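
A maximum run time check of the kind described can be sketched as follows. The function `overdue_jobs`, the job names, and the limits are hypothetical; a real scheduler would raise an operator alert rather than return a list.

```python
from datetime import datetime, timedelta

def overdue_jobs(running, limits, now):
    """Return names of running jobs that have exceeded their maximum run time.

    running: job name -> start time.
    limits: job name -> maximum run time as a timedelta.
    Jobs without a configured limit are never reported.
    """
    return sorted(
        name for name, started in running.items()
        if name in limits and now - started > limits[name]
    )

now = datetime(2003, 6, 7, 4, 0)
running = {
    "payroll": datetime(2003, 6, 7, 1, 0),            # running for 3 hours
    "inventory_report": datetime(2003, 6, 7, 3, 30),  # running for 30 minutes
    "adhoc_extract": datetime(2003, 6, 6, 20, 0),     # no limit configured
}
limits = {
    "payroll": timedelta(hours=2),
    "inventory_report": timedelta(hours=1),
}

print(overdue_jobs(running, limits, now))
```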

Mainframe schedulers also offer good reporting tools. They create execution logs and report job failure and success. Analyzing these reports over a period of time lets users see trends, from accounting job streams that take longer and longer to backup jobs that begin to press against the limits of backup windows.
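
As a rough illustration of such trend analysis, the sketch below estimates how many more runs a job has before it outgrows its window. It assumes run durations have already been extracted from the execution logs, and it uses a deliberately simple estimate (average growth per run between the first and last entry); the function name and sample figures are invented.

```python
import math

def runs_until_window_reached(durations_min, window_min):
    """Estimate how many more runs fit before a job reaches its time window.

    durations_min: logged run times in minutes, oldest first.
    Returns 0 if the latest run already fills the window, and None if
    there is too little data or the run time is not growing.
    """
    if len(durations_min) < 2:
        return None
    latest = durations_min[-1]
    if latest >= window_min:
        return 0
    growth = (latest - durations_min[0]) / (len(durations_min) - 1)
    if growth <= 0:
        return None
    return math.ceil((window_min - latest) / growth)

# Five nightly backups against a 240-minute backup window
backup_log = [200, 206, 211, 218, 224]
print(runs_until_window_reached(backup_log, 240))
```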

What to Look For in an Automated Job Scheduler

With the increase in jobs in all businesses and the need to have these jobs run more quickly, it makes sense and pays dividends to automate job scheduling. Automating job scheduling yields several tangible benefits:

· Reduces personnel costs while freeing up those human resources for more important and more profitable projects
· Launches jobs on time, thus improving efficiency and reducing the potential for human error
· Optimizes resources allowing more work to be accomplished. A properly functioning job scheduling solution also allows new resources to be added or existing resources to be reconfigured with minimal impact on IT operations.

Whether the platform is NT/2000, Unix, mainframe, or something else, there are specific capabilities a good automated job scheduler should have.

A good scheduler supports non-temporal job triggers such as file creation or system alerts. Users must be able to suspend a job stream, slip a schedule to another time of day, and cancel a single instance of a job without affecting its overall schedule. There should be no limit to the number of jobs that can be created, and the system should be as easy to use with 10,000 jobs as it is with 10.
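
Cancelling a single instance without touching the overall schedule amounts to keeping one-off exceptions separate from the recurring rule. The `DailyJob` class below is a hypothetical sketch of that separation, assuming a job that otherwise runs every day.

```python
from datetime import date

class DailyJob:
    """A job that runs every day except on individually cancelled dates."""

    def __init__(self, name):
        self.name = name
        self.skipped = set()  # one-off cancellations, kept apart from the rule

    def cancel_instance(self, day):
        """Cancel the run on `day` only; the daily rule itself is unchanged."""
        self.skipped.add(day)

    def runs_on(self, day):
        return day not in self.skipped

nightly = DailyJob("nightly_backup")
nightly.cancel_instance(date(2003, 7, 4))

for day in (date(2003, 7, 3), date(2003, 7, 4), date(2003, 7, 5)):
    print(day, nightly.runs_on(day))
```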

And the job scheduler should be not only a technical asset, but a business asset. It should reduce costs, increase productivity, and maximize efficiency so that IT can fulfill its mission of adding value to the business.

Several job scheduling architectures have emerged for heterogeneous, distributed environments: collaborative; master and agent; and variations of master and agent, which include master, submaster, and agent as well as console, master, and agent. Because master and agent and its variations have so much in common, it is enough to compare the collaborative architecture with the master and agent architecture.

Master and Agent Architecture

The traditional architecture for job scheduling solutions is the master and agent architecture. Schedulers using this model generally evolved from mainframe concepts. This architecture involves putting a full implementation of the job scheduler on one server, the master, and putting agents on a series of other servers.

In the master and agent configuration, jobs are set up, scheduled and administered from the master server. The actual work is done on the agents. The agents communicate with the master throughout the job run as the master passes parameters and other critical data to the agent. Jobs might be partitioned among agents. As the job is passed from server to server, communications must be maintained between agents and master. This makes network availability critical to successful completion of jobs.

On the one hand, the central administration of master and agent allows tight control over jobs. This benefit comes at the cost of a central, top-down, rather inflexible tree structure. On the other hand, the most significant limitation of master and agent systems is the requirement for the master and agents to remain in sync. When the network or central server is interrupted, how long will it take to reconstruct your activity? The well-known volatility of distributed networks is an important factor when evaluating schedulers based on master and agent architecture.

A second area of concern is performance. In master and agent environments, communication continually flows between the master and each of the agents. As the workload increases, so does the network traffic. As the traffic increases, the potential for overload expands.

Another aspect to consider is scalability. A master can only support a limited number of agents, and this depends on the number of jobs to be run. Creating a new master or instance creates a new and separate administration. The more instances you create, the more management you need. When you create a new instance, you need to recreate all jobs. The process can take days, weeks, or even months. The process itself can lead to errors and failures at any point along the way. While the new instances can be managed by the same administrator, within reason, the inability to administer the entire job scheduling environment from a single point increases complexity, and the likelihood of confusion and errors.

This lack of scalability can affect your overall costs drastically. When you create a new master, you need to add new hardware at the master and agent levels. In a large enterprise, this could quickly grow to a $1 million problem.

Collaborative Architecture

Designed for distributed environments, the collaborative architecture leverages the combined computing power of networks. In collaborative architecture environments, a full copy of the job scheduler runs on every server on the network. With this technique, once a server is given the parameters for a job, it can run the job independently.

Each server runs jobs independently of all others. Communication occurs for coordination and updates. It effectively uses network resources to combine mainframe-like robustness with distributed flexibility.

Administration in collaborative environments is flexible. You can manage your job scheduling from either a central point or at the local level.

Since the collaborative architecture was designed for distributed environments, it has many benefits. With a full working copy of the software on every server, network downtime has a diminished effect. Jobs continue to run even during network outages. The same applies to individual servers. If one server crashes, all other servers in the network continue their jobs. Any interdependent jobs are held until the crashed server resumes activity.
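
The behavior during a server crash can be sketched as a simple planning step. The function `plan_during_outage`, the server names, and the three status labels are assumptions for illustration; they show the described behavior, not the internals of any collaborative scheduler.

```python
def plan_during_outage(jobs, up_servers):
    """Decide what happens to each job while some servers are unavailable.

    jobs: job name -> (server that runs it, set of servers it needs results from).
    up_servers: set of servers that are currently running.
    """
    plan = {}
    for name, (home, needs) in jobs.items():
        if home not in up_servers:
            plan[name] = "down"   # its own server has crashed
        elif needs <= up_servers:
            plan[name] = "run"    # runs locally, nothing it needs is missing
        else:
            plan[name] = "held"   # waits for the crashed server to resume
    return plan

jobs = {
    "billing": ("srv_a", set()),
    "inventory": ("srv_b", set()),
    "consolidate": ("srv_c", {"srv_b"}),  # needs results from srv_b
}

print(plan_during_outage(jobs, up_servers={"srv_a", "srv_c"}))
```

Once `srv_b` returns, calling the function again with all three servers up releases the held job.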

Since jobs can run locally, network communications and overhead decrease. This decrease translates into improved network and system performance.

In a collaborative environment, scaling is limited only by the size of your network. Some job schedulers might be able to handle 500 servers each running 1,000 jobs, for a total of 500,000 jobs. Replicating jobs is straightforward. Based on logical views of jobs and the environment, even the most complex jobs can be replicated in minutes.

Another distinct advantage is more efficient use of hardware resources. Typically, in a collaborative architecture, your total job scheduling overhead is about one percent of central processing unit resources on each server in the network. In master and agent configurations, you need a dedicated server for the job scheduler itself plus a backup server in case the master fails. This hardware is in addition to the resources used on each server. Because of the limits on scalability, each time you expand to a new master configuration, you need to add hardware and software for the job scheduling server.

CIO - Measure Your ROI!

The pressure on IT to produce the promised savings and efficiencies from the new technologies it implements will only increase, and in an era of fiscal belt tightening it increases even more. Automated job scheduling can alleviate some of this pressure while adding value to the business. The time has come to measure that value in terms of return on investment.

Copyright (c) 2000-2003, nextslm.org. All Rights Reserved. Legal Statement.