Coldfusion Feature - Wait-time Analysis Method

A new best practice for application and database performance management

Until recently, tuning IT application performance has been largely a guessing game. This is both surprising and unacceptable considering the relentless focus IT organizations put on cost-efficiency and productivity.

The traditional approaches to database and application tuning that involve collecting large volumes of statistics and making trial-and-error changes are still in widespread use. Today, most server management and monitoring tools deliver "server-oriented" statistics that don't translate to concrete end-user benefits.

The landscape is changing, however. The current thinking of leading consultants, DBAs, and training organizations is focusing on performance tuning practices that are tied directly to end-user service levels and improvements in operating efficiency.

Wait-Time analysis is a new approach to application and database performance improvement that allows users to make tuning decisions based on the optimal service impact. Using the principles of Wait-Time analysis described here, DBAs, developers, and application owners can align their efforts with the service levels desired by their IT customers. Wait-Time analysis lets IT find the root cause of the most important problem impacting customers and identify which critical resource will resolve it.

What Is Wait-Time Analysis?

Measure Time

If you were trying to shorten your commute to work, what would you measure? Would you count the number of tire rotations? Would you measure the car's temperature? Would these statistics have any meaning in the context of your goal? All that really matters is what impacts the time for your trip. All the other statistics are distractions that don't help your mission. Wait-Time analysis gets to the root of the problem to achieve the end business result. Although this seems obvious, common IT practices suggest that other practices hold the answer. Rather than immediately focusing on the time to complete requested services, IT tools barrage the user with detailed statistics that count the number of many different operations. So while the DBA should really be looking at how long it took for the database to return the results of a query, typical tools display the number of input/output (I/O) operations and locks encountered.

Get the Details

Under the trial-and-error approach, what level of detail do you need to actively improve your commute time? If the only statistic you have is that the trip took 40 minutes, you can compare one day to the next, but there's not enough data to help improve the situation. What you need is detailed insight into how long you spent at each stoplight, which stretches of road have the most stop-and-go traffic, and how long you waited there. This detail is essential to making the exercise useful.

The same concept applies to IT performance systems. When Wait-Time is measured, a "black box" approach is typically taken, where the user sees how long a server took to respond to a request. However, no indication is given as to which of the thousands of steps performed by the server were actually responsible for the delay. As will be shown here, it's important not just to measure Wait-Time but to break it down into sufficient detail so that you can take action.

Wait-Time analysis for IT applications is the singular focus of measuring and improving the service time to the IT customers. By identifying exactly what contributes to longer service time, IT professionals can focus not on the thousands of available statistics, but on the most important bottlenecks that have direct and quantifiable impact on the IT customer.

Wait-Time Analysis for Service Level Management

Because Wait-Time analysis measures the collective time delays causing end users to wait for an information request, it's the measurement technique most closely matched to end-user service levels. For organizations focused on Service Level Management (SLM) techniques, or those bound by Service Level Agreements (SLAs), Wait-Time analysis techniques allow the IT department to measure the performance that is most relevant to achieving the stated service level goals. Service level management typically identifies technical metrics that define whether performance is adequate, and Wait-Time data is the basis for evaluating those metrics.

The Problem with Conventional Statistics

There are so many management tools gathering thousands of statistics from IT systems. Don't these provide the same answer as Wait-Time methods? Why are they not effective?

Traditional approaches to database tuning and performance analysis introduce the same errors identified in the driving example above.

1. Event Counters versus Wait-Time Methods

Typical tools count the number of events, but don't measure time. These statistics are numerous and easy to capture, so they tend to flood management dashboards. But are they useful?

Broad management dashboards have sophisticated displays of monitored data, but counting events or calculating ratios doesn't indicate or predict better performance for database customers. In fact, this approach can have the effect of covering up, rather than exposing, the real service level bottlenecks.

The example is an excerpt from a long summary of counted statistics. Clearly there's much detail and technical accuracy. But where would you go to begin your diagnosis? Do these raw numbers reveal a performance problem? Is the value for "physical writes direct" in the table too high or too low? There's no indication of impact on the end-user service level to make that judgment.

A Wait-Time view, on the other hand, ranks individual SQL requests by Wait-Time. The statement with the highest Wait-Time is at the top of the list. Its relative impact on overall user service is reflected in the length of the bar, which measures how much time users experience waiting on this request. Without counting how many times an operation occurred, this is a much more meaningful measure of end-user service.

2. System-Wide Averages

Typically statistics are gathered across an entire system, rather than on a basis that applies to an individual user request. When averaging performance across all requests, it becomes impossible to tell which requests are the most critical resource drains and which resources are impacting service levels.

Vendor-supplied database tools, for example, typically display data across the entire database without breaking it down into specific user requests. As a result, there's no indication which end-user functions were impacted.

3. Silos versus End-to-End Analysis

Another key problem with typical IT monitoring tools is the creation of individual information "silos" that localize statistics for a single type of system, but don't expose an end-user's view of performance.

Because of the differing technical skill sets, separate groups manage databases, application servers, and Web infrastructure. Each group has a primary focus: to optimize the performance of its box. And typically they use the most common and convenient statistics to measure and improve performance. For an application server, this often means watching memory utilization, thread counts, and CPU utilization. For a database, this is a count of the number of sessions, number of reads, or number of processes.
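
The contrast between counting events and measuring Wait-Time per request can be made concrete with a small sketch. The Python below is an illustration only, not the article's tool or any vendor's API: the `WaitSample` record, the request labels, the step names, and all the timings are hypothetical values invented for the example. It counts events per request, ranks requests by total Wait-Time, and then breaks one request's Wait-Time down by step.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WaitSample:
    """One observed wait (hypothetical record layout, assumed for this sketch)."""
    request: str    # label of the SQL request the user issued
    step: str       # what the request was waiting on
    seconds: float  # how long the user waited in this step


# Made-up samples for illustration; not measurements from any real system.
SAMPLES = [
    WaitSample("Q1 order lookup", "disk read", 0.25),
    WaitSample("Q1 order lookup", "lock wait", 4.0),
    WaitSample("Q2 nightly report", "disk read", 1.5),
    WaitSample("Q2 nightly report", "cpu", 0.5),
    WaitSample("Q3 login check", "cpu", 0.125),
    WaitSample("Q3 login check", "disk read", 0.125),
    WaitSample("Q3 login check", "disk read", 0.125),
]


def event_counts(samples):
    """Event-counter view: how many waits each request had, ignoring time."""
    counts = defaultdict(int)
    for s in samples:
        counts[s.request] += 1
    return dict(counts)


def rank_by_wait(samples):
    """Wait-Time view: total seconds waited per request, longest first."""
    totals = defaultdict(float)
    for s in samples:
        totals[s.request] += s.seconds
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def breakdown(samples, request):
    """Detail view: one request's Wait-Time split by step, longest first."""
    steps = defaultdict(float)
    for s in samples:
        if s.request == request:
            steps[s.step] += s.seconds
    return sorted(steps.items(), key=lambda kv: kv[1], reverse=True)


def print_report(samples):
    """Print a text bar per request; bar length is proportional to Wait-Time."""
    for request, seconds in rank_by_wait(samples):
        bar = "#" * round(seconds * 4)
        print(f"{request:<20} {seconds:6.3f}s {bar}")


if __name__ == "__main__":
    print_report(SAMPLES)
    worst_request, _ = rank_by_wait(SAMPLES)[0]
    print(f"Breakdown for {worst_request}:", breakdown(SAMPLES, worst_request))
```

In this invented data set, the request with the most events is not the one users wait on longest, which is the point the event-counter discussion makes. The per-step breakdown then shows which wait dominates the top-ranked request, the kind of detail a single system-wide average would hide.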