Service Level Management: The Big Picture

by Rick Sturm

Service Level Management (SLM): To most people in the industry, the term refers to a very narrow, finite space: service level agreements, metrics, data capture, reporting, and so on. The products that IT departments and service providers purchase for SLM are predominantly performance management and reporting tools, plus tools to measure availability. Although those are not the only tools, they are the ones that dominate the thoughts of most service providers and reflect their thinking about SLM.

However, SLM is much broader, encompassing everything involved in delivering a service at acceptable levels. When the subject is addressed in this way, most of us will be quick to concur. Yet even then, we will still tend to focus our thinking primarily on the issue of performance management.

Infrastructure Behind the Scenes

It is true that performance management is an important part of SLM. It is the part that ensures that expected levels of service are provided, or even exceeded. Performance management is the piece of the puzzle that provides for troubleshooting performance problems, as well as for continuous improvement of service levels.

Similarly, fault management is a key component. Through fault management, problems are detected and addressed. Some of those problems affect availability, while others degrade performance. Either way, they affect the level of service being provided. Unfortunately, both fault management and performance management are reactive in nature.

(And yes, I do realize that some products monitor performance in real-time and make extrapolations, issuing warnings when there is a chance of violating a service level guarantee. However, I would argue that even that function is, in essence, reactive.)
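
To make the aside above concrete, here is a minimal sketch of the kind of extrapolation such a product might perform. It is not taken from any particular tool: it assumes a simple least-squares trend line, and the names `projected_value`, `breach_warning`, `samples`, `threshold`, and `horizon` are illustrative only.

```python
from statistics import mean


def projected_value(samples, horizon):
    """Fit a least-squares line to (time, value) samples and extrapolate
    it `horizon` time units past the most recent sample.

    `samples` must contain at least one (time, value) pair.
    """
    times = [t for t, _ in samples]
    values = [v for _, v in samples]
    t_bar, v_bar = mean(times), mean(values)
    spread = sum((t - t_bar) ** 2 for t in times)
    if spread == 0:
        # A single sample (or identical timestamps) gives no trend to fit.
        return values[-1]
    slope = sum((t - t_bar) * (v - v_bar) for t, v in samples) / spread
    intercept = v_bar - slope * t_bar
    return intercept + slope * (times[-1] + horizon)


def breach_warning(samples, threshold, horizon):
    """Return True if the extrapolated value would exceed the threshold
    (assumed here to be the guaranteed maximum, e.g. response time in ms)."""
    return projected_value(samples, horizon) > threshold
```

Note that the warning is still derived entirely from measurements that have already been taken, which is why I would call even this kind of function reactive.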

Fault and performance tend to grab the headlines in our minds. They represent the issues that we are concerned with on a daily basis. “What else is there?” you may ask. The answer, in terms of infrastructure, is simply, “everything.”

If you are going to provide any kind of service, you must first “provision” it. That is, you must create the system of hardware and software that is necessary for you to be able to provide the service.

If the equipment and software selected are not appropriate for the service, there is no hope of being able to deliver adequate levels of service. There must be facilities to house the equipment. The service must be configured to support the customer or internal user. If the configuration changes are not made, then the user is not going to be served.

Don’t Wait for a Disaster

However, the list of areas frequently neglected does not stop there. In the wake of the tragedy of September 11, 2001, it is clear that backup and recovery are more important than ever before.

Thorough, tested disaster recovery plans, including an alternate site available in the event of a disaster, have long been viewed by executives as expensive luxuries. Executives look at the probability of an actual disaster occurring and compare that with the cost of preparing for one. Far too often, they decide that the danger represents an acceptable degree of risk.

A building may not be obliterated, but thousands of small “disasters” occur every day. Some merely require restoring a corrupted database. In other cases, there may be actual damage to facilities or equipment. The notorious fiber-seeking backhoe may have just severed the only connectivity between your facility and your users. What are you going to do? And there is always the bizarre event. Some employee with a sick sense of humor starts an anthrax scare by sending a “contaminated” letter to a co-worker. The police are called, and your building is evacuated for two days of testing.

I once worked in a high-rise building in the downtown area of a major city. That building had to be evacuated when it was discovered that a small leak from a natural gas pipe under the street had filled a sub-basement with an explosive concentration of gas.

Tornados, floods, earthquakes, power blackouts … the list of possible causes of disruption of service is seemingly endless. If you aren’t well prepared to deal with the consequences of these disruptions, you are not prepared to deliver the service.
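
The risk comparison described earlier in this section can be sketched as simple arithmetic. This is a hedged illustration, not a method from any standard: the function names, the `loss_reduction` parameter, and every figure in the usage note are hypothetical, chosen only to show how counting frequent small incidents changes the result.

```python
def annual_expected_loss(incidents):
    """Sum frequency-weighted losses over (events_per_year, cost_per_event)
    pairs. A rare catastrophe has a frequency well below 1."""
    return sum(freq * cost for freq, cost in incidents)


def recovery_plan_pays_off(incidents, plan_cost_per_year, loss_reduction):
    """Return True if the loss avoided per year exceeds the yearly cost of
    the plan. `loss_reduction` is the assumed fraction (0.0 to 1.0) of the
    expected loss that the plan prevents."""
    avoided = annual_expected_loss(incidents) * loss_reduction
    return avoided > plan_cost_per_year
```

With made-up numbers, a one-in-a-hundred-years catastrophe costing 5,000,000 is written as `(0.01, 5_000_000)`, and a monthly small incident costing 20,000 as `(12, 20_000)`. Judged on the catastrophe alone, a plan costing 100,000 a year that halves losses does not pay off; once the small incidents are counted as well, it does.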

Security in an Insecure World

Another neglected area of SLM is security. The most likely single cause of a disruption or degradation in service is the well-intentioned but inept employee. You say that you have only the finest, error-free employees? Fine. What about the employees who are not well intentioned? Those people who, in the face of layoffs or out of pure malevolence, deliberately set out to damage your facility or disrupt the service? It happens every day and usually goes unreported.

Then there are the hordes of “crackers” who try to bring your service to its knees. It won’t happen to you? Don’t bet on it! The choice of targets often seems random. Being small in size or profile does not provide assurance of safety. “Denial of Service” attacks happen all too frequently and, like other security issues, usually go unreported. Then, of course, there are the viruses. These are becoming increasingly sophisticated and destructive. Not only can viruses disrupt your service; they can also destroy your firm’s relationship with users by causing sensitive information to be released.

As with backup and recovery, if you aren’t deadly serious about security, then you are not prepared to deliver the service you are guaranteeing.

If you’re a user contemplating a contract with a service provider (even an in-house service provider, such as IT), in the course of your due diligence, you must assess how adequate their security measures and disaster recovery plans are. Of course, if you are like most users, you won’t bother with due diligence and will instead rely on the representations made by the sales rep for the service provider. (If this is the case, I want to be your service provider, and I also have some Florida swampland to sell you. I am assured that it is totally free of snakes, mosquitoes and alligators.)

--

Rick Sturm is the founder and president of Enterprise Management Associates, the first technology analyst firm to specialize exclusively in management software and services.

Copyright (c) 2004-2005, nextslm.org. All Rights Reserved. Legal Statement.