Recently we encountered a case where in a sql server which was running and hosting Biztalk instances was showing 100% CPU.

 

We started troubleshooting from the SQL end to identify who is major consumer of CPU and following were our observations and findings

 

·         We used the perfmon counter Process  %Processor time and confirmed that sqlservr.exe was the major consumer of the CPU.

·         Since the recommendation for Biztalk server is to have MAXDOP 1 we ruled out parallelism as a cause of HIGH CPU.

·         We checked the perfmon counter Sql Server: SQL Statistics , Batch Requests/sec , Sql Compilations/sec and SQL Recompilations/sec but we could not see any major compilations or recompilations/sec so even that was ruled as a cause of the High CPU

·         Finally we started looking at the queries which are major consumers of CPU. We used the following queries and queried the DMvs sys.dm_exec_requests and sys.dm_exec_query_stats to find the queries consuming high CPU

 

SELECT st.text, r.cpu_time  FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS st
ORDER BY cpu_time DESC

SELECT TOP 5 total_worker_time/execution_count AS [Avg CPU Time],

    SUBSTRING(st.text, (qs.statement_start_offset/2)+1,

        ((CASE qs.statement_end_offset

          WHEN -1 THEN DATALENGTH(st.text)

         ELSE qs.statement_end_offset

         END – qs.statement_start_offset)/2) + 1) AS statement_text

FROM sys.dm_exec_query_stats AS qs

CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY total_worker_time/execution_count DESC;

 

 

·         We found that following two stored procedures were one of the major consumers of CPU resources while all the other queries on the servers were getting blocked and waiting on SOS_SCHEDULER_YIELD

 

MSGBOXPERFCOUNTERS_GETSPOOLSIZE                            11.71%

[DBO].[BTS_DEQUEUEMESSAGES_BIZTALKSERVERISOLATEDHOST]     10.59%

 

 

·         We consulted our Biztalk team and found that the first stored proc is used for performance data collection which is fired every 5 secs while second stored procedure is used to dequeue messages coming from Biztalk

 

·         We further drilled the stored procedure [DBO].[BTS_DEQUEUEMESSAGES_BIZTALKSERVERISOLATEDHOST]  and found it was using a cursor to query table

 [BizTalkServerIsolatedHostQ] which was causing blocking on the server.

 

DECLARE btscurse CURSOR FAST_FORWARD FOR  

SELECT TOP 20 w.uidWorkID, w.uidInstanceID, w.uidMessageID, w.snPartRetrieval,

w.fOrderedDelivery, w.snPriority, w.dtLastTouched, w.nID, i.uidProcessID,

i.uidActivationID, i.nState   FROM [BizTalkServerIsolatedHostQ] AS w

WITH (UPDLOCK ROWLOCK READPAST)   INNER LOOP JOIN Instances AS i

WITH (UPDLOCK ROWLOCK READPAST) ON ( w.uidInstanceID = i.uidInstanceID AND uidAppOwnerID = N‘85459549-B70C-4525-B1E6-72D3C0125E52’) AND ( i.uidProcessID IS NULL OR i.uidProcessID = @uidProcessID ) 

 

 

·         We queried the table BizTalkServerIsolatedHostQ and found that there were 581683 messages which were undelivered which were unusual. SO it indicates some of the Orphaned records might be cached in the table which are zombie

 

·         On further research we found that we might hitting a bug mentioned in the KB articles KB 944426 & 936536 & 941690.

 

 

In order to resolve the issue  we used the following Action Plan

 

·         In order to minimize the SQL CPU usage due to the stored procedure MSGBOXPERFCOUNTERS_GETSPOOLSIZE , we set the following registry key to 30sec   

 

HKLMSoftwareMicrosoftBizTalk Server3.0MsgBox Performance CacheRefreshInterval to 30 seconds

 

IMPORTANT : This resolution contains information about modifying the registry. Before you modify the registry, make sure to back it up and make sure that you understand how to restore the registry if a problem occurs. For information about how to back up, restore, and edit the registry, click the following article number to view the article in the Microsoft Knowledge Base:

256986 Description of the Microsoft Windows Registry

 

·         We terminated the orphan cache and RFR and applied the Hotfix  KB 944426 & 936536 & 941690 to avoid the future occurrence of the issue.

 

 

 

After the above changes the CPU on the sql server was resumed to normal.

Hope this help!!  🙂

 

Reference

=========

 

http://msdn.microsoft.com/en-us/library/cc296892(BTS.10).aspx

http://msdn.microsoft.com/en-us/library/cc296811(v=BTS.10).aspx

 

 

Parikshit Savjani
Premier Field Engineer, Microsoft

 

Similar Posts

4 Comments

  1. Hi,

    You can TERMINATOR tool from microsoft to clear the orphaned queue.Also you should have the KB’s to be installed

    Thanks,
    Madhu

  2. In SQL Server 2000, can I find equivalent query that displays CPU time like this ?

    SELECT st.text, r.cpu_time FROM sys.dm_exec_requests r
    CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS st
    ORDER BY cpu_time DESC

  3. In SQL 2000 we have sysprocesses which has cpu column. So in sql 2000 we first need to use the following query

    select * from sysprocesses order by cpu desc

    get the spid and following use the following to get the query

    dbcc inputbuffer()

    However the limitation of inputbuffer is that you may not get the complete query as it allows only 255 bytes

  4. Hello all,
    we are in a very similar situation: 17 million records in BizTalkServerIsolatedHostQ, 0 records in BizTalkServerIsolatedHostQ_Suspended, SQL Server failing with 100% CPU usage caused by [BTS_DEQUEUEMESSAGES_BIZTALKSERVERISOLATEDHOST] stored procedure.
    SP1 already installed, including KB 944426.
    All the applications using SOAP ports moved to a new separated IsolatedHost.
    We ran Terminator tool but without any effects: records were not deleted and are still growing.
    How did you terminate the orphan cache and RFR?
    Thank you very much in advance.

Leave a Reply

Your email address will not be published. Required fields are marked *