What Perfmon counters should I monitor and what each of them mean?

| 6 Comments

Off Late, I am not able to update this blog as frequently as I would have liked to as I was some of personal work and lot of professional work.

However most customers ask me this question on what performance counters do we need to monitor in order to troubleshoot performance issues with SQL Server and what each Performance counter mean.

So I thought I would jot down the Perfmon counters which we need to monitor for troubleshooting SQL Server Performance issues.

In this post, I will talk about all the SYSTEM related performance counters and next post I will talk about the SQL Server specific counters.

As SQL Server is an application designed on Windows server, most of the SQL Server performance issue manifest themselves as CPU bottleneck,IO bottleneck, Memory bottleneck or Network Bottleneck.

Hence it is very important to first start diagnosing the System Performance counters to identify resource bottleneck (CPU,IO,Memory or Network) and later on drill into SQL Specific issue

PROCESSOR

Processor(*)% Processor Time

Processor Time is the percentage of elapsed time that the processor spends to execute a non-Idle thread. It is calculated by measuring the duration of the idle thread is active in the sample interval, and subtracting that time from interval duration. (Each processor has an idle thread that consumes cycles when no other threads are ready to run). This counter is the primary indicator of processor activity, and displays the average percentage of busy time observed during the sample interval. It is calculated by monitoring the time that the service is inactive, and subtracting that value from 100%.

We need to check for utilization greater than 60% on each individual processor. If so, determine if it is high user mode CPU (% User Time) or high privileged mode (% Privileged Time). If high privileged mode CPU is suspected, then see the Privileged Mode CPU Analysis. If a user-mode processor bottleneck is suspected, then further check Process% Processor Time counter to identify whether sqlserver.exe is major consumer of the %User Time of the CPU.

If sqlserver.exe is found to be major consumer of CPU, identify TOP CPU queries on the SQL Server with the help of Performance Dashboard Reports; check SQL Compilations & Recompilations on the server to check for excessive recompilations on the server; check for excessive parallelism on the SQL Server.

SystemProcessor Queue Length

Processor Queue Length is the number of threads in the processor queue.  Unlike the disk counters, this counter shows ready threads only, not threads that are running.  There is a single queue for processor time even on computers with multiple processors. Therefore, if a computer has multiple processors, you need to divide this value by the number of processors servicing the workload. A sustained processor queue of less than 10 threads per processor is normally acceptable, dependent of the workload

If there are more tasks ready to run than there are processors, threads queue up. The processor queue is the collection of threads that are ready but not able to be executed by the processor because another active thread is currently executing. A sustained or recurring queue of more threads than number of processors is a good indication of a processor bottleneck.

You can use this counter in conjunction with the Processor% Processor Time counter to determine if your application can benefit from more CPUs. There is a single queue for processor time, even on multiprocessor computers

If the CPU is very busy (90 percent and higher utilization) and the PQL average is consistently higher than the number of processors, then you may have a processor bottleneck that could benefit from additional CPUs. Or, you could reduce the number of threads and queue more at the application level. This will cause less context switching, and less context switching is good for reducing CPU load. The common reason for a high PQL with low CPU utilization is that requests for processor time arrive randomly and threads demand irregular amounts of time from the processor. This means that the processor is not a bottleneck but that it is your threading logic that needs to be improved

Processor(*)% Privileged Time

This counter indicates the percentage of time a thread runs in privileged mode. When your application calls operating system functions (for example to perform file or network I/O or to allocate memory), these operating system functions are executed in privileged mode.

High privileged mode CPU indicates that computer is spending too much time in system I/O versus real (user mode) works. % Privileged Time is the percentage of elapsed time that the process threads spent executing code in privileged mode.  When a Windows system service in called, the service will often run in privileged mode to gain access to system-private data. Such data is protected from access by threads executing in user mode. Calls to the system can be explicit or implicit, such as page faults or interrupts. Unlike some early operating systems, Windows uses process boundaries for subsystem protection in addition to the traditional protection of user and privileged modes. Some work done by Windows on behalf of the application might appear in other subsystem processes in addition to the privileged time in the process

If privileged mode CPU is consuming more than 30% of total CPU. If so, then the CPU consumption is likely caused by another bottleneck such as network, memory, or disk I/O.

SystemContext Switches/sec

As a general rule, context switching rates of less than 5,000 per second per processor are not worth worrying about. If context switching rates exceed 15,000 per second per processor, then there is a constraint.

Context switching happens when a higher priority thread preempts a lower priority thread that is currently running or when a high priority thread blocks. High levels of context switching can occur when many threads share the same priority level. This often indicates that there are too many threads competing for the processors on the system. If you do not see much processor utilization and you see very low levels of context switching, it could indicate that threads are blocked

MEMORY

MemoryAvailable Mbytes

Available MBytes is the amount of physical memory available to processes running on the computer, in Megabytes, rather than bytes as reported in MemoryAvailable Bytes. The Virtual Memory Manager continually adjusts the space used in physical memory and on disk to maintain a minimum number of available bytes for the operating system and processes. When available bytes are plentiful, the Virtual Memory Manager lets the working sets of processes grow, or keeps them stable by removing an old page for each new page added. When available bytes are few, the Virtual Memory Manager must trim the working sets of processes to maintain the minimum required.

If Available Memory on the server is less than 10% of the total Physical Memory on the server it is critical warning and indicates Memory Pressure on the server.

If the Available Memory on the server is very low, check ProcessPrivate Bytes and ProcessWorking Set to identify the User Process consuming the majority of the Memory.

If sqlserver.exe is found to be consuming large portion of the memory. Check Max Server Memory setting of Sql Server and ensure that sql server memory is capped leaving sufficient memory for OS and other application to run on the server.

If Max Server memory is limited and sqlserver.exe is found to be consuming more memory. Use sys.dm_os_memory_clerks to identify memory clerks performing multi page allocation and consuming memory outside Buffer Pool.
If there is no memory clerks found to be consuming large amount of memory outside Buffer Pool, use sys.dm_os_loaded_modules to identify the external modules loaded in sql address space and consuming memory.

MemoryPages/sec

Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of MemoryPages Input/sec and MemoryPages Output/sec. It is counted in numbers of pages, so it can be compared to other counts of pages, such as MemoryPage Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) non-cached mapped memory files

This counter should always be below 1000, therefore this analysis checks for values above 1000. Use this analysis in correlation with Available Memory Analysis and Memory Leak Analysis. All are throwing alerts at the same time, then this may indicate the system is running out of memory and the suspected processes involved and follow analysis steps mentioned above.

Process(*)Private Bytes

Private Bytes is the current size, in bytes, of memory that this process has allocated that cannot be shared with other processes. Use this counter and check for a 10MB’s per hour and 5MB’s per hour increasing trends. Use this analysis in correlation with the Available Memory analysis

A process consuming large portions of memory is okay as long as the process returns the memory back to the system. An increasing trend over a long period of time could indicate a memory leak. This counter is used in correlation with Available Mbytes. If you suspect a memory leak condition, then install and use the Debug Diag tool

Process(*)Virtual Bytes

This analysis determines if any of the processes are consuming a large of virtual memory. 32-bit processes by default are only allowed to access up to 2GB’s of user mode memory. If the process becomes close to this maximum, then it could starve for memory. If a process is becoming close to its maximum, then consider moving the process to a 64-bit system. The /3GB switch could be used to give the user mode process a total of 3GB’s of addressable memory, but this takes away 1GB of memory from the kernel which could have worse effects on the entire system such as with Pool Paged Memory and Pool Non-Paged Memory.

Process(*)Working Set

Working Set is the current size, in bytes, of the Working Set of this process. The Working Set is the set of memory pages touched recently by the threads in the process. If free memory in the computer is above a threshold, pages are left in the Working Set of a process even if they are not in use. When free memory falls below a threshold, pages are trimmed from Working Sets. If they are needed they will then be soft-faulted back into the Working Set before leaving main memory.

Use in correlation with Available Memory Analysis.
For SQL Server if more than 50% of working set is trimmed into the page file, an error message is reported in the Error log. In order to prevent workset trimming issue for x64 bit version of Windows Server 2003 we need to assign Lock Pages In Memory Privileges to SQL Server Startup account and enable AWE

MemoryPool Nonpaged Bytes

Watch the value of MemoryPool Nonpaged Bytes for an increase of 10 percent or more from its value at system startup.

Significance: If there is an increase of 10 percent or more from its value at startup, a serious leak is potentially developing

 

ServerPool Nonpaged Failures

Regular nonzero values indicate a bottleneck.

Significance: This counter indicates the number of times allocations from the nonpaged pool have failed. It indicates that the computer’s physical memory is too small. The nonpaged pool contains pages from a process’s virtual address space that are not to be swapped out to the page file on disk, such as a process’ kernel object table. The availability of the nonpaged pool determines how many processes, threads, and other such objects can be created. When allocations from the nonpaged pool fail, this can be due to a memory leak in a process, particularly if processor usage has not increased accordingly.

MemorySystem Cache Resident Bytes

System Cache Resident Bytes is the size, in bytes, of the pageable operating system code in the file system cache. This value includes only current physical pages and does not include any virtual memory pages not currently resident. It does equal the System Cache value shown in Task Manager. As a result, this value may be smaller than the actual amount of virtual memory in use by the file system cache. This value is a component of Memory\System Code Resident Bytes which represents all pageable operating system code that is currently in physical memory. This counter displays the last observed value only; it is not an average.

Under load, a server might use the System Cache in order to cache I/O activity such as disk. Use in correlation with Process IO Data Operations/sec and Process IO Other Operations/sec Analyses.

Reference

File Cache Performance and Tuning

http://technet.microsoft.com/en-us/library/bb742613.aspx

MemoryFree System Page Table Entries

 

Free System Page Table Entries is the number of page table entries not currently in use by the system. Check this counter to determine if the system is running out of free system page table entries (PTEs) by checking if there is less than 10,000 free PTE’s. Lack of enough PTEs can result in system wide hangs. Also note that the /3GB switch will lower the amount of free PTEs significantly.

The Performance Monitor “MemoryFree System Page Table Entries” counter is inaccurate on installations of Windows Server 2003 without Service Pack 1. For more information about this counter, see Microsoft Knowledge Base article 894067

Fix for Win2003 SP1 systems with /3GB and low on PTE’s. If the system is low on PTE’s, running Windows 2003, and using /3GB switch, then consider using the /USERVA switch to give back some of the memory to the kernel. Note, this only works for Free System PTE issues

PHYSICAL DISK

PhysicalDisk(*)Avg. Disk sec/Read

Avg. Disk sec/Read is the average time, in seconds, of a read of data to the disk. This analysis determines if any of the physical disks are responding slowly.
If the response times are greater than 0.015 (15 milliseconds), then the disk subsystem is keeping up with demand, but does not have much overhead left.
If the response times are greater than 0.025 (25 milliseconds), then noticeable slowdowns and performance issues affecting users may be occurring.

PhysicalDisk(*)Avg. Disk sec/Write
Avg. Disk sec/Write is the average time, in seconds, of a write of data to the disk. This analysis determines if any of the physical disks are responding slowly.
If the response times are greater than 0.015 (15 milliseconds), then the disk subsystem is keeping up with demand, but does not have much overhead left.
If the response times are greater than 0.025 (25 milliseconds), then noticeable slowdowns and performance issues affecting users may be occurring.

PhysicalDiskAvg. Disk Queue Length

Threshold: Should not be higher than the number of spindles plus two.
Significance: This counter indicates the average number of both read and writes requests that were queued for the selected disk during the sample interval

Process(*)IO Data Operations/sec
This counter determines the rate at which the process is issuing read and writes I/O operations. This counter counts all I/O activity generated by the process to include file, network and device I/Os

If the Disk Response Time is poor identify the Processes incurring lot of I/O using the counter ProcessIO Data Operations/sec and ProcessIO Other Operations/sec

If sqlserver.exe is identified to major consumer of IO, Identify the top IO consuming queries using Performance Dashboard and tuning the queries by building appropriate Indexes

Process(*)IO Other Operations/sec
This counter determines the rate at which the process is issuing I/O operations that are neither read nor write operations (for example, a control function). This counter counts all I/O activity generated by the process to include file, network and device I/Os

NETWORK IO

To measure network I/O, you can use the following counters:

Network InterfaceBytes Total/sec

Threshold: Sustained values of more than 80 percent of network bandwidth.

Significance: This counter indicates the rate at which bytes are sent and received over each network adapter. This counter helps you know whether the traffic at your network adapter is saturated and if you need to add another network adapter. How quickly you can identify a problem depends on the type of network you have as well as whether you share bandwidth with other applications.

Network InterfaceBytes Received/sec

This counter indicates the rate at which bytes are received over each network adapter. You can calculate the rate of incoming data as a part of total bandwidth. This will help you know that you need to optimize on the incoming data from the client or that you need to add another network adapter to handle the incoming traffic.

Network InterfaceBytes Sent/sec

This counter indicates the rate at which bytes are sent over each network adapter. You can calculate the rate of incoming data as a part of total bandwidth. This will help you know that you need to optimize on the data being sent to the client or you need to add another network adapter to handle the outbound traffic.

ServerBytes Total/sec

This value should not be more than 50 percent of network capacity.

This counter indicates the number of bytes sent and received over the network. Higher values indicate network bandwidth as the bottleneck. If the sum of Bytes Total/sec for all servers is roughly equal to the maximum transfer rates of your network, you may need to segment the network.

Processor% Interrupt Time

This counter indicates the percentage of time the processor spends receiving and servicing hardware interrupts. This value is an indirect indicator of the activity of devices that generate interrupts, such as network adapters.

Network Interface(*)Output Queue Length

This counter checks to see how many threads are waiting on the network adapter. If there are a lot of threads waiting on the network adapter, then the system is most likely saturating the network I/O most likely due to network latency or network bandwidth.

Output Queue Length is the length of the output packet queue (in packets). If this is longer than two, there are delays and the bottleneck should be found and eliminated, if possible. Since the requests are queued by the Network Driver Interface Specification (NDIS) in this implementation, this will always be 0.

 

Hope this helps Smile

Parikshit Savjani
Premier Field Engineer,India

6 Comments

  1. Hi! I really love reading your articles.. Really help me a lot. It makes me realize how little have I learned..

    Keep on giving us some good stuff. :-)

    really an avid fan of yours.. :-)

  2. Hi there.

    Just a question about:
    “This value is a component of Memory\System Code Resident Bytes which represents all pageable operating system code that is currently in physical memory.”¨

    If System Cache Resident Bytes value is part of System Code Resident Bytes, how is it posible, that the second one is always smaller?

    Thanks

  3. That is very good information. Thanks in the name of ElPlatanal website.

  4. Good article.. Nice work!!!

  5. Excellent …If this would been in table format
    I would have print and posted on my desk

Leave a Reply

Required fields are marked *.