CSS SQL Server Engineers

SQL Server: Clarifying The NUMA Configuration Information


The increased number of cores per socket is driving NUMA designs, and in SQL Server support we are seeing more NUMA machines and fewer pure SMP machines. For whatever reason, over the past two weeks I have fielded a bunch of questions around NUMA. The information is good for everyone to know, so I will share it here.

There are various levels of NUMA configuration that can make this entire discussion complicated. I will try to point some of these out here as well.

How is the Operating System Presenting NUMA Configuration To SQL Server?

Start with the Windows Task Manager | Process Tab.

Select a process, Right Mouse | Set Affinity -- the following dialog is presented showing you the Processor Groups (K-Group), Nodes and CPUs on the machine. 

[Screenshot: Task Manager Set Affinity dialog showing the Processor Groups (K-Group), Nodes and CPUs]

This is the layout presented to SQL Server.

Windows Resource Monitor | CPU Tab shows NUMA information as well.

[Screenshot: Resource Monitor CPU tab showing NUMA node information]

Reference: http://blogs.technet.com/b/yongrhee/archive/2011/01/04/how-to-pull-the-information-that-resource-monitor-resmon-exe-provides.aspx 

MSINFO32

Information presented in MSINFO does not contain NUMA associations.   Here is an example from my 2 socket system with only a single memory node.

[Screenshot: MSINFO32 output from the 2 socket, single memory node system]

Here is an example from the same 2 socket, single memory node system, but configured using bcdedit /groupsize to create 2 logical groups for testing. The MSINFO32 output looks the same; you can't tell NUMA information from it, so don't rely on it for NUMA configuration information.

[Screenshot: MSINFO32 output after bcdedit /groupsize, identical to the previous output]

Issue: The problem with all the previous Windows utilities is that they might not show the physical layout presented by the hardware. You may have to go to the BIOS or use the CPUID instruction(s) to determine the physical layout.

Windows does allow configuration of /groupsize for logical testing (http://msdn.microsoft.com/en-us/library/ff564483(v=VS.85).aspx) using BCDEDIT, or manual establishment in the registry (http://support.microsoft.com/kb/2506384). However, it is rare to see these in use on a production system.

API Reference(s)

  • GetNumaHighestNodeNumber
  • GetNumaNodeProcessorMask
  • GetNumaAvailableMemoryNode

SQL Server's View

When SQL Server starts up it outputs the NUMA information in the error log, detailing its view of NUMA, and provides DMV outputs to show the information as well.

Here is the output from my 2 Processor Group system. A NUMA node can't span a processor group, so I am assured of having 2 NUMA nodes, each with CPUs associated with it.

2011-11-09 12:38:01.38 Server      Node configuration: node 0: CPU mask: 0x000000000000000f:1 Active CPU mask: 0x000000000000000f:1.
2011-11-09 12:38:01.38 Server      Node configuration: node 1: CPU mask: 0x000000000000000f:0 Active CPU mask: 0x000000000000000f:0.

The mask for Node 0 shows 0xf, which is 4 CPUs associated with the node. This is correct: it is a 2 socket, 2 core, HT aware system (8 logical CPUs total), with 4 assigned to each node. The :1 is the processor group the node is associated with.

Notice that Node 0 is assigned to group 1 on my system. This is a bit different if you look at the low level details. SQL Server swaps its tracking of node 0 to play better with the overall system. Since the operating system usually allocates things on NUMA node 0 during boot, SQL Server tries to move its node 0 to a subsequent node to avoid contention points. Thus, in my case the groups appear to be swapped, but that was really just a node swap: instead of SQL Server node 0 being associated with group 0, it is associated with group 1.

SQL Server distinctly tracks memory nodes and scheduler nodes and these don't have to be the same.  (See SOFT NUMA for more details).

The memory nodes are aligned, and stay aligned, with the memory locality per CPU presented by the operating system.   You can see the memory nodes presented to SQL Server using the following DMV query.

select * from sys.dm_os_memory_nodes

You can see the scheduler alignments to the scheduling nodes using the following DMV query.

select * from sys.dm_os_schedulers

I point this out because the SQL Server SOFT NUMA configuration does not change the memory node tracking of SQL Server, only the scheduler node behaviors.
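To see both alignments side by side, a query along these lines (a sketch using only documented DMV columns) shows each scheduler with its scheduling node and the memory node that node maps to:

SELECT s.scheduler_id,
       s.cpu_id,
       s.status,            -- e.g. VISIBLE ONLINE / VISIBLE OFFLINE
       n.node_id,           -- scheduling node
       n.memory_node_id,    -- memory node the scheduling node maps to
       n.processor_group
FROM sys.dm_os_schedulers AS s
JOIN sys.dm_os_nodes AS n
    ON s.parent_node_id = n.node_id
WHERE s.status LIKE 'VISIBLE%';   -- visible (user) schedulers only

Under SOFT NUMA you would expect to see more scheduling nodes (node_id values) than distinct memory_node_id values.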

I have a non-NUMA system and SQL Server Still Shows NUMA - why?

Answer: This is expected. It is one of the most common questions I get, and here is why.

SQL Server (SQL OS) is designed to work with groups, nodes and CPUs. To create an ONLINE scheduler, SQL Server has to associate it with a scheduling node structure, so even on a system that is not NUMA, SQL Server will track the schedulers under a single node. It is just the design, but it can fool you a bit, and some of the wording choices add to the confusion as well.
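You can see this single node from the DMVs as well. On a non-NUMA system, a query like the following sketch returns one node hosting all of the schedulers:

SELECT node_id, node_state_desc, online_scheduler_count, memory_node_id
FROM sys.dm_os_nodes
WHERE node_state_desc NOT LIKE '%DAC%';  -- exclude the dedicated admin connection node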

Here is an example from SQL Server Management Studio (SSMS) on my single socket, dual core system. Notice that 'NumaNode0' is shown. It is just a poor choice of wording; it is exposing the same node 0 that SQL Server uses when you look at the DMVs that track the CPUs within the scheduling node. Everything is technically correct: we have a single (1) memory bank associated with all the CPUs on the system.

[Screenshot: SSMS Processors page showing NumaNode0 on the single socket, dual core system]

Here is the same SSMS view of my 2 socket, 4 core, 2 memory node system.

[Screenshot: SSMS Processors page on the 2 socket, 4 core, 2 memory node system]

Helpful Terms

Group (Processor Group)

Windows Server 2008 R2 added the ability to address more than 64 CPUs by introducing processor groups. (K-Group = Kernel Group)

· NUMA node can’t span a K-Group

· K-Group is limited to 64 max CPUs

 

NUMA Node

Hardware concept that associates CPUs with a specific set of memory resources.

CPU

A unit that can process instructions. SQL sees everything as a logical CPU (core, hyper-thread, …) for most operations.

Node Swap

SQL assumes hardware NODE=0 is more heavily used by the system, so it will often swap NODE=0 with NODE=1, allowing SQL to initialize on NODE=1.

-T8025 can be used to avoid the node swapping behavior 

 

Soft NUMA

A SQL specific configuration allowing nodes to be divided. This can be used to target lazywriter, connectivity and some very specific configurations. Soft NUMA is ONLY FOR Scheduling and Connectivity – Memory locality is not impacted.

Connection Affinity

SQL allows given ports to be bound to a specific NODE or NODE(s) depending on application need.

 

Round-Robin

New connections are assigned to the nodes that the PORT is bound to by round-robin assignment, and then to a scheduler by scheduler weight within each node.
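You can observe the node each connection was assigned to using the node_affinity column of sys.dm_exec_connections; this query is a sketch:

SELECT session_id, local_tcp_port, node_affinity
FROM sys.dm_exec_connections
WHERE net_transport = 'TCP';   -- port-bound connections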

ONLINE and OFFLINE Schedulers

A scheduler is ONLINE when its affinity mask is enabled. Let’s say you have a system with 4 CPUs and the affinity mask is 1. You would have 3 offline schedulers (2,3, and 4) and 1 online scheduler (1).

SQL Server creates and parks the offline schedulers, so when you dynamically change the affinity mask the schedulers can be brought online.

If a scheduler is online and you change the affinity mask to make it offline a few things happen.

1. The work currently assigned to the scheduler is allowed to complete

2. No new work is accepted on the scheduler

3. The affinity mask of the threads on the scheduler being taken offline is changed to the other viable schedulers for the instance. This is important because the work is not continuing on the original CPU the affinity was set for; it is sharing the other ONLINE schedulers. So if you have a large process on the original scheduler, it can impact time on the shared CPUs until it is complete.

Memory Divided Equally

Memory Per Node = Max Memory Setting / ONLINE NUMA Nodes

A NODE is considered ONLINE as long as at least one of the schedulers on the node is ONLINE. All memory for the max memory setting is divided among the ONLINE nodes.

What you don’t want is a situation where you OFFLINE schedulers or configure them strangely. For example, Node 1 has 4 schedulers and Node 2 has 2 schedulers: you would have 4 CPUs using ½ the memory and 2 CPUs using the other ½ of the memory.
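As a worked example (the numbers here are hypothetical): with max server memory set to 64000 MB and 2 ONLINE nodes, each node is targeted at 64000 / 2 = 32000 MB.

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 64000;   -- hypothetical value
RECONFIGURE;
-- With 2 ONLINE nodes, each node is now targeted at roughly 32000 MB,
-- regardless of how many schedulers are ONLINE within each node.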

Trace Flag 8002

The trace flag is used to treat the affinity mask as a group setting. Usually the affinity mask sets the threads on the scheduler to ONLY use the one CPU for the matching affinity bit. The trace flag tells SQL OS to treat the mask as a group (process affinity like): group the bits for the same node together and allow any scheduler ONLINE for that node to use any of the CPUs that match the bits.

Let’s say you had the following affinity for NODE 0 on the system.

0011 - Use CPU 1 and CPU 2

Without the trace flag you would get a scheduler for CPU 1 and a scheduler for CPU 2. The workers on scheduler 1 could only use CPU 1 and the workers on scheduler 2 could only use CPU 2.

With the trace flag you get the same scheduler layout, but the threads on scheduler 1 and scheduler 2 set their affinity mask to 11, so they can run on either CPU 1 or CPU 2. This allows you to configure an instance of SQL Server to use a specific set of CPUs without locking each scheduler to its respective CPU, allowing Windows to move the threads based on per-CPU resource usage.

Low End NUMA

On some lower end hardware the system used to report that each CPU has its own NUMA node. This was usually incorrect, and when we detected only a single CPU per NODE we would assume NO NUMA.

Trace flag 8021 disables this override.

 

ALTER SERVER

Added for SQL Server 2008 R2 to replace the sp_configure affinity mask settings. Once we support more than 64 CPUs, the sp_configure values are not large enough to hold the extended affinity mask.
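For example, the replacement syntax looks like this (the CPU and node numbers are illustrative):

-- Affinitize to specific CPUs (replaces sp_configure 'affinity mask')
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 0 TO 3;

-- Or affinitize by NUMA node
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY NUMANODE = 0, 1;

-- Return to the default (let SQL Server use all CPUs)
ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = AUTO;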


Helpful Query:


SELECT
    inf.affinity_type AS [AffinityType],
    nd.node_state_desc AS [NodeStateDesc],
    mnd.memory_node_id AS [ID],
    nd.processor_group AS [GroupID],
    nd.cpu_affinity_mask AS [CpuIds],
    nd.online_scheduler_mask AS [CpuAffinityMask]
FROM sys.dm_os_memory_nodes AS mnd
INNER JOIN sys.dm_os_sys_info AS inf ON 1 = 1
INNER JOIN (SELECT SUM(cpu_affinity_mask) AS cpu_affinity_mask,
                   SUM(online_scheduler_mask) AS online_scheduler_mask,
                   processor_group,
                   node_state_desc,
                   memory_node_id
            FROM sys.dm_os_nodes
            GROUP BY memory_node_id, node_state_desc, processor_group) AS nd
        ON nd.memory_node_id = mnd.memory_node_id
ORDER BY ID ASC



Distributed Replay for SQL Server 2012


I recently had a lengthy exchange on DReplay with Jonathan Kehayias (SQL MVP). From this exchange I filed several work items with the SQL Server development team to make DReplay easier to install and set up in the future.

Jonathan has started a series of blog posts on his experiences; rather than copy them here, I suggest you read his series to assist you with your DReplay activities as well.

http://sqlskills.com/blogs/jonathan/post/Installing-and-Configuring-SQL-Server-2012-Distributed-Replay.aspx 

Bob Dorr - Principal SQL Server Escalation Engineer

Exporting via HTML instead of MHTML


There was a question on Twitter about how to display a report in HTML instead of MHTML due to some browser issues. Given that it was MHTML, I’m assuming we are talking about exporting a report, since the default report view is already HTML. First off, if we look at our export options for a report, we see the following:

[Screenshot: report export options menu, with no HTML entry]

HTML isn’t an option. One of the reasons for this is that if you export with HTML and have items in your report such as images, they wouldn’t be included with the export. With MHTML, the binary of the image can be included and will be displayed properly. So, MHTML is the default export option to make sure we get everything and the report looks consistent with what is displayed for an on demand report.

That being said, you can change it.  This is done within the rsreportserver.config file.  Within this file we include the different renderers that Reporting Services will use.

<Render>
    <Extension Name="XML" Type="Microsoft.ReportingServices.Rendering.DataRenderer.XmlDataReport,Microsoft.ReportingServices.DataRendering"/>
    <Extension Name="NULL" Type="Microsoft.ReportingServices.Rendering.NullRenderer.NullReport,Microsoft.ReportingServices.NullRendering" Visible="false"/>
    <Extension Name="CSV" Type="Microsoft.ReportingServices.Rendering.DataRenderer.CsvReport,Microsoft.ReportingServices.DataRendering"/>
    <Extension Name="ATOM" Type="Microsoft.ReportingServices.Rendering.DataRenderer.AtomDataReport,Microsoft.ReportingServices.DataRendering" Visible="false"/>
    <Extension Name="PDF" Type="Microsoft.ReportingServices.Rendering.ImageRenderer.PDFRenderer,Microsoft.ReportingServices.ImageRendering"/>
    <Extension Name="RGDI" Type="Microsoft.ReportingServices.Rendering.ImageRenderer.RGDIRenderer,Microsoft.ReportingServices.ImageRendering" Visible="false"/>
    <Extension Name="HTML4.0" Type="Microsoft.ReportingServices.Rendering.HtmlRenderer.Html40RenderingExtension,Microsoft.ReportingServices.HtmlRendering" Visible="false"/>
    <Extension Name="MHTML" Type="Microsoft.ReportingServices.Rendering.HtmlRenderer.MHtmlRenderingExtension,Microsoft.ReportingServices.HtmlRendering"/>
    <Extension Name="EXCEL" Type="Microsoft.ReportingServices.Rendering.ExcelRenderer.ExcelRenderer,Microsoft.ReportingServices.ExcelRendering"/>
    <Extension Name="RPL" Type="Microsoft.ReportingServices.Rendering.RPLRendering.RPLRenderer,Microsoft.ReportingServices.RPLRendering" Visible="false" LogAllExecutionRequests="false"/>
    <Extension Name="IMAGE" Type="Microsoft.ReportingServices.Rendering.ImageRenderer.ImageRenderer,Microsoft.ReportingServices.ImageRendering"/>
    <Extension Name="WORD" Type="Microsoft.ReportingServices.Rendering.WordRenderer.WordDocumentRenderer,Microsoft.ReportingServices.WordRendering"/>
</Render>

One of the properties that you can see is the Visible property. Some of these are set to false by default. This lets Reporting Services know whether the renderer should appear as an export option. One of the items is the HTML4.0 renderer.

<Extension Name="HTML4.0" Type="Microsoft.ReportingServices.Rendering.HtmlRenderer.Html40RenderingExtension,Microsoft.ReportingServices.HtmlRendering" Visible="false"/>

By setting Visible="true" for this item, it then appears as an export option.

[Screenshot: export options menu now listing HTML 4.0]

This should work well for a subscription operation, but if you just hit export from the browser, the report will pop up in a new tab because HTML is an accepted format, so you won’t get the prompt to download. From a browser perspective, you can also get a clean HTML version of the page via URL access:

http://localhost/ReportServer?%2fHelloWorld&rs:Command=Render&rs:Format=HTML4.0&rc:toolbar=false

In this example, you don’t even need the Format parameter as HTML4.0 is the default renderer.  You can find more information about the available URL access parameters at this page:

Using URL Access Parameters
http://technet.microsoft.com/en-us/library/ms152835.aspx

 

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

A faster CHECKDB – Part I


Earlier this year I travelled to Japan and had the opportunity to visit some of our customers and prospective customers. One piece of feedback I received loud and clear was poor performance of DBCC CHECKDB, as customers have moved into the TB range of databases on a regular basis. I had certainly heard this feedback before from other CSS engineers, but not to the extent of the complaints I heard while in Japan. I had never really investigated these claims of performance issues because, I suppose, I fell into the same mindset I had seen for years of “CHECKDB takes as long as it takes”.

On that trip I met Cameron Gardiner from the SQL Customer Advisory Team (SQL CAT). Cameron is an expert with SAP systems and focuses his efforts at Microsoft on SAP running on SQL Server. Cameron spent time explaining what a problem the performance of CHECKDB had become for his largest SAP accounts using SQL Server.

So I went back to the US determined to investigate the cause and possible solutions for this. The result of this work is now available in the latest cumulative update for SQL Server 2008 R2 customers. You can read the basics of how to apply these two trace flags with this update in this article. A SQL Server 2008 version of these changes will also be available early next year.

I’m on vacation for the rest of the year, but when I come back, I’ll create another blog post (Part II) with the details behind these changes, how we did this, and how this enhancement might help you when using DBCC CHECKDB.

Bob Ward
Microsoft

 

 

 

 

Assigning SQL Server, SQL Agent to a Processor Group (OOM, Hang, Performance Counters Always Zero for Buffer Pool, …)


Suresh brought to my attention that we have been getting questions as to why SQL Server sometimes starts on group 1 and sometimes on group 2, unpredictably. Then Tejas brought up another issue, and since I worked on this way back before we released SQL 2008 R2, I went back to my notes to pull up some details that I thought might be helpful.

The answer is that you need a SQL Server that is group aware to use more than one group's worth of CPUs in Windows. If you have an older version, or a SKU that does not support enough CPUs to span groups, the default is for Windows to start the service on any group.

SQL Server 2008 R2 is group aware, so it uses the new APIs and establishes proper use of the entire system.


Legacy XPROC / COM Object / Linked Server using a CPU based scheme.  

Some designs of legacy components may not be group aware or safe. In the following example, if originally loaded on Group 2 the initialization would see 60 CPUs and create 60 partitions for a local memory manager, which might work perfectly with proper synchronization. However, if the original initialization of the partitions occurs on Group 1, it will only create 40 partitions, and access from CPUs 41 ... 60 on Group 2 may fail because those partitions don't exist. (SQL Server itself does not have any of these components.)


Great Legacy Reference: http://msdn.microsoft.com/en-us/windows/hardware/gg463349.aspx 

Windows allows you to configure node to group assignments

By configuring Windows to see the same number of CPUs and Nodes within each processor group the SQL Server, SQL Agent, … services can start on any processor group and they will see the same amount of resources.


Note: There is a hotfix for Windows 2008 R2 that automatically balances the processor assignments and corrects the issue for most systems: http://support.microsoft.com/kb/2510206

 

The following knowledge base article outlines how to assign nodes to the processor groups so you have control over the group assignments. How to manually configure K-Group assignment on multiprocessor machines: http://support.microsoft.com/kb/2506384

You can use Windows Task Manager to see and set the affinity for the process

Start with the Windows Task Manager | Process Tab.

Select a process, Right Mouse | Set Affinity -- the following dialog is presented showing you the Processor Groups (K-Group), Nodes and CPUs on the machine.

[Screenshot: Task Manager Set Affinity dialog showing the Processor Groups (K-Group), Nodes and CPUs]

 

The Problem

 

Even after all this work there is an issue using older versions of SQL Server (SQL 2008 pre-SP3 and SQL 2008 R2 pre-SP2): SQL Server must start up in the group that holds physical NUMA node = 0, with at least 1 CPU from physical node 0 assigned to the instance, or lazywriter activity is not started properly. This can lead to incorrect buffer pool sizing, performance counters not getting updated, and other stall and hang like behaviors (OOM, Latch Timeouts, …) of the SQL Server process.

 

As a general rule it is best to start these services in group 0 (which usually contains physical NUMA node = 0, but you must confirm this).

 

You can enforce this on the system by setting the ImageFileExecutionOptions NodeOptions value and by making sure the SQL Server affinity mask contains at least 1 CPU in physical NUMA node 0. However, this forces all instances of SQL Server to start on the same node, and all instances have to share physical NUMA node = 0.

 

Excerpt from the Windows 2008 R2 Release Notes

"Allocation of child processes among the ideal nodes of Non-Uniform Memory Access (NUMA) nodes is not efficient, which results in performance degradation, increased latency, and cache misses, depending on the affected processes.

To correct this, edit the registry to change the inheritance of ideal NUMA nodes so that generated child processes are assigned the same IdealNode as their parent process. This setting is not system wide, but per-process.

To change the NUMA inheritance
  1. Open Registry Editor (Regedit.exe).

  2. Add the following registry key:

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\<process_name>.exe

    where process_name is the name of the process you want to change the inheritance behavior for.

    Assign a DWORD value to this key with the name NodeOptions and data 0x00001000.

    Close Registry Editor and restart the computer."

Windows does not currently provide a per service, instance option for starting the service in a specific group.   To avoid starting multiple instances of SQL Server on the same processor group use SQL Server 2008 R2 or newer version and establish the proper processor affinity.

 

SQL Server 2005 / SQL Server 2008 (pre-SP3): Requires the SQL Server affinity mask to use a CPU in physical NUMA node = 0 as presented by the operating system. Use NodeOptions to start the SQL Server instance in the group that presents CPUs from physical NUMA node = 0, or apply the Windows SCM.exe QFE to assign startup to a specific group containing physical NUMA node = 0.

SQL Server 2008 SP3: No longer requires use of a CPU in physical NUMA node = 0. Requires the Windows SCM.exe QFE to assign startup to a specific group, or you can allow random group startup behavior.

SQL Server 2008 R2: No longer requires a CPU in physical NUMA node = 0. Is group aware, so it does not need the NodeOptions setting or the SCM.exe QFE.

 



Bob Dorr - Principal SQL Server Escalation Engineer

How It Works: sys.dm_io_pending_io_requests


I have been working on an issue where the DMV was returning io_pending_ms_ticks incorrectly. The following output is an example showing a ~164 hour delay, which is unlikely to occur without any other side effects being noticed.

select * from sys.dm_io_pending_io_requests

[Screenshot: DMV output where rows with io_pending = 0 show io_pending_ms_ticks of roughly 164 hours]
In the process of this investigation I uncovered several details that I found helpful to share.

Full Text and File Streams
There is a small set of I/Os that can show a large io_pending_ms_ticks value, but for these io_pending should = 1. There are administrator interfaces for both the full text and file stream features. Think of them a bit like a new TDS connection to the server. When a full text or file stream request arrives, the request needs to be processed. These I/Os are simply waiting on the arrival of a new request from the respective features. They seldom show up in this list, and the file handle will not map to any of the handles exposed in the DMF virtual_file_stats.

io_pending is the key
The io_pending column indicates 1 if the I/O is still pending within the kernel. A value of 0 indicates the I/O only needs to be processed by the local SOS scheduler. In this case we were not getting any I/O delay warnings, performance monitor was not showing I/O issues, and there were no SOS scheduler issues.
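So when investigating a suspected I/O stall, a query like this sketch narrows the output to I/Os that are truly pending in the kernel:

SELECT io_handle, io_type, io_pending, io_pending_ms_ticks, scheduler_address
FROM sys.dm_io_pending_io_requests
WHERE io_pending = 1;   -- still pending in the kernel; io_pending = 0 rows may be skewed (see below)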

Dirty Read
After some more digging, the issue turned out to be a DIRTY READ. SQL Server maintains a set of I/O request structures (a request dispenser). When the I/O completes, the request structure is returned to the dispenser and can be reused for another I/O. The DMV needs to materialize the list without causing scheduler issues on the system, so it is designed to perform (NOLOCK) dirty reads of the I/O list. This is where your system may return an incorrect io_pending_ms_ticks value when the io_pending flag = 0.

Take the diagram below as an example. It is possible that the DMV is being materialized on one scheduler while the I/O is being completed on the I/O's owning scheduler. (Each scheduler has its own I/O list.)

[Diagram: two schedulers, each with its own I/O list; the DMV is materialized on one scheduler while the I/O completes on its owning scheduler]

The DMV query can get access to the I/O request structure, but it does not hold any locks on the structure. If it did, it could lead to unwanted blocking. Take for instance an application fetching the DMV output that stalls. If a lock were held on the I/O list, the lock couldn't be released until the entire list was traversed. Because the client is not fetching results, the I/O could be stalled on the scheduler for a long period of time.

To avoid this, the list is read in a dirty fashion. However, this means the I/O request could complete in parallel with the reading of the structure data. SQL Server protects the DMV query itself, but it does not indicate to the DMV user that data movement occurred during the no-lock read. Instead, the output can become skewed, as shown in the previous output: the I/O data structure can be re-assigned between the time the DMV query starts to read the information and the time all columns are produced, leading to unexpected output.

SQL Server 2012 (Denali) updates this behavior by adding a signature to the I/O request. This allows SQL Server to maintain the dirty read capability while also identifying I/O requests that fall into this category.

When you see large io_pending_ms_ticks values, consider the io_pending flag and the additional scheduler warning information (178** messages) in the error log.

Bob Dorr - Principal SQL Server Escalation Engineer

How It Works: sys.dm_tran_session_transactions


For some reason I have been looking at DMV output closely the last couple of weeks. I just blogged about pending I/O requests, and now I have a behavior to outline for dm_tran_session_transactions in today's blog.

select * from sys.dm_tran_session_transactions

The scenario I was looking at is as follows.

Server A | Server B
Broker calls activation procedure |
Begin Tran (local) |
Linked server query (transaction promoted to DTC) | Transaction imported from Server A
 | xp_cmdshell loops back on a separate transaction and is blocked
  • The process is based on SQL Service Broker. 
  • The service broker session on Server A is 20s
  • It starts a local transaction and eventually performs a linked server query.  
  • This causes the transaction to be promoted to a DTC transaction.  

At this point in time the *active transaction* DMVs on both Server A and Server B show the enlisted UOW of the DTC transaction. That is, all except sys.dm_tran_session_transactions on Server A.

After some digging I found that the dm_tran_session_transactions DMV only outputs rows for sessions that are NOT system level sessions. Since the broker activity is handled on a system session, the DMV will not materialize a row for session 20s on Server A in this example. Instead you have to use the other *active transaction* DMVs to track the UOW across this system.

Note: Any transaction (local or DTC) used as part of SQL Service Broker activation will not show rows in the session transactions DMV, because it is considered a system session.
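As a sketch, the distributed UOW can be followed on both servers with the active transaction DMVs instead (this assumes transaction_uow is populated only for DTC-enlisted transactions):

SELECT transaction_id, name, transaction_uow, transaction_state
FROM sys.dm_tran_active_transactions
WHERE transaction_uow IS NOT NULL;   -- rows enlisted in a DTC transaction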

Bob Dorr - Principal SQL Server Escalation Engineer

The official release of System Center Advisor…


If you have followed this blog, you have probably seen a series of posts documenting the life of a project I've been working on, code named Atlanta, which became the product System Center Advisor. Today marks the official release of that product. This has been a particularly rewarding journey for me: seeing an idea about giving our knowledge in CSS to customers turn into a full proactive assessment service powered by System Center and the cloud.

In my past blog posts, I’ve shown you examples of the knowledge this product brings in the form of alerts and configuration history. With the official release comes new features such as a new dashboard about the assessment of your servers, a revised and simplified setup program, voting buttons to tell us if the alerts are useful, and the ability to provision other users to view or administer your console.

[Screenshot: the new System Center Advisor dashboard]

While I love to see new features like these that make this product compelling, the power of this product is in the knowledge. So while System Center Advisor has been ramping up towards this release while being a release candidate, our CSS teams have been adding knowledge in the form of alerts each month based on actual customer experiences.

Here is an example. Several months ago, one of our engineers in CSS pointed out to me that we have seen a trend where our customers would contact us and report the following error in Management Studio when trying to expand the Databases Node:

[Screenshot: Management Studio error when expanding the Databases node]

After some investigation, we found out the customer had disabled the ‘guest’ user in the msdb database. As it turns out, Management Studio requires the connecting login to have CONNECT access to msdb. If the login is not mapped directly to a user in msdb (usually it is not), it needs to use guest. If guest is disabled, then any access to msdb (e.g. use msdb) results in this error.
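A quick way to check whether guest can still connect in msdb is to query the permission catalog views; this check is a sketch:

USE msdb;
SELECT pr.name, pe.permission_name, pe.state_desc
FROM sys.database_principals AS pr
LEFT JOIN sys.database_permissions AS pe
    ON pe.grantee_principal_id = pr.principal_id
   AND pe.permission_name = 'CONNECT'
WHERE pr.name = 'guest';
-- A state_desc of GRANT means guest can connect; no GRANT row means it was disabled.
-- GRANT CONNECT TO guest; restores the default if it was revoked.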

We looked into this further and found out this was not a one time occurrence. Several customers had reported this problem to us and through other sources (Microsoft Connect, …). Why would so many people hit this? We asked one customer, and they said they were simply following guidelines established in our documentation as outlined at:

http://msdn.microsoft.com/en-us/library/ff848752.aspx

In this part of the docs, the tip for the guest account says:

[Screenshot: the documentation tip for the guest account]

While the server itself doesn’t allow you to disable guest in master or tempdb (you get error Msg 15182, "Cannot disable access to the guest user in master or tempdb.", if you try), for msdb we unfortunately allow this. While the server will run in this scenario, tools like Management Studio will have problems. So customers were just doing what we told them.

While changing the product to prevent this is the right thing to do, what do we do in the meantime to warn customers this could be a problem? We can certainly publish articles and documentation for them to discover, but why not automate the check? When I heard about this issue, I thought: what a perfect rule for System Center Advisor.

So as part of our monthly update to rules for SQL Server we introduced this check on your system. Should you disable guest for msdb you will get an alert which will direct you to the following technical article written by CSS talking about the problem and how to solve it:

http://support.microsoft.com/kb/2539091

This is the power of what Advisor can bring to you. We continue to update our rules each month as we discover trends about common customer problems or customer problems that may be difficult to detect or find by just searching the web.

This journey is far from over. While we continue to create new rules for SQL Server, both the System Center Advisor team and CSS will continue to enhance this product, including:

  • Rules for other server products (there are already several rules for the core Windows OS, Active Directory, and Hyper-V)
  • Provide support for SQL Server 2012 (and new rules unique to that version)
  • Expand features and capabilities of the System Center Advisor software and portal

System Center Advisor is available in 26 countries and comes at no charge to customers that have a Software Assurance agreement. Don’t have Software Assurance but want to try it out for free? Install it from the System Center Advisor site and choose the 60 day free trial option. Don’t know if your company has Software Assurance? Talk to the team in your company that handles volume licensing for Microsoft purchases (they typically have access to the Volume Licensing Service Center site).

If you try System Center Advisor and want to provide feedback, there is a Feedback link at the top of the Advisor console. However, I’m always personally interested to hear directly from customer experiences good and bad on how this product works for you. So feel free to comment on this post on your experiences.

Bob Ward
Microsoft

 


The case of the incorrect page numbers


As you may or may not know, SSRS 2008 R2 added the ability to automatically create page breaks on group changes.  Historically, people attempted to use custom code to accomplish this and, while it worked, the pagination logic we use in SSRS 2008 R2 breaks the standard implementation of this for several reasons.  You can see the typical implementation of this at http://blogs.msdn.com/b/chrishays/archive/2006/01/05/resetpagenumberongroup.aspx.

Unfortunately, in SSRS 2008 R2 the pagination engine does multiple passes in some scenarios thus breaking this logic.  Also, the logic referenced above makes the assumption that the report is rendered from page 1 to page N and this assumption is not necessarily true either.  We recognized that losing that ability to do the pagination at a group level is a key feature so we actually rolled it completely into the product.  Robert Bruckner (one of the SSRS developers) gives a really good explanation of all of the ins and outs of the feature at http://blogs.msdn.com/b/robertbruckner/archive/2010/04/25/report-design-reset-page-number-on-group.aspx.

I wish I could say that this feature has been completely without pain, but that is not true.  If any of you have been working with SSRS for any length of time, you know that there are all kinds of quirks to the pagination process.  Given the correct information, it is always fairly easy to figure out why the pagination engine resulted in a certain pagination, but understanding all the pertinent details is….a bit difficult.  In a lot of ways, I view pagination sort of like the query optimizer in the SQL Server engine.  It seems like total black magic if you don’t know the rules it follows, but once you understand the rules it generally makes good sense.  However, again like the query optimizer, the pagination logic is very complex and is something that we are constantly refining.  As you can imagine, people regularly find edge cases where the pagination either doesn’t function as they want or flat out makes a mistake.  In addition, layering group level page breaks on top of normal pagination makes the scenario even more complicated.  Thus, I have to make the admission that we have had several fixes released in this specific area.  Sigh…

The good news is that I have also generally found that a good chunk of my page numbering problems go away by using the latest CU.  The bad news is that a simple fix like upgrading to the most recent CU didn’t fix my most recent problem.

So, here’s the situation.  My customer wanted to reset the page number whenever the group changed.  No big deal in theory.  As documented both in the MSDN documentation and Robert’s blog, all they had to do was to add a couple of properties to the target group:

[Screenshot: the target group's PageBreak (ResetPageNumber) and PageName properties]

No big deal, right?  They completely expected to see the pagination look something like this:

[Screenshots: the page number restarting at 1 as each new group begins]

This is what my customer saw instead:

[Screenshot: the report showing a page number of 1 of 337]

What the heck????!?!!?!?!

Needless to say, this is when they decided to open a case with Microsoft Support. The problem got even weirder when I discovered that under no circumstances could I duplicate the behavior! I could reproduce a whole host of problems (2 of 1, 9 of 6), but by applying some workarounds and the latest CUs I could always get back to normal page numbering resets. At this point, I was completely stumped. I even engaged the SSRS Product Group to ask if there was some sort of race condition that could lead to this behavior. Unfortunately, the answer I got back was that there was no identifiable race condition scenario where 1 of 337 could happen.

Then, as always seems to be the case, I put the problem aside for a couple of days to let it stew.  And then, while driving into work the next Sunday, it hit me.  Target server version!

You see, it had occurred to me that one of the good things about BIDS 2008 R2 is that it allows you to publish against both a 2008 and a 2008 R2 instance of SSRS. *However*, the thing I had never checked before was how we dealt with a 2008 R2 feature when deploying against 2008. It turns out we strip the feature out during the publish process. When I changed my project properties to target a 2008 instance, even though my instance is actually 2008 R2, this is what I saw:

Building 'FailingScenario.rdl' for SQL Server 2008 Reporting Services.

A PageBreak property was removed from the data region, rectangle, or group ‘GroupOnWhichToIterate’. SQL Server 2008 Reporting Services does not support the PageBreak property ResetPageNumber.

The PageName property specified on the data region, rectangle, or group ‘GroupOnWhichToIterate’ was removed from the report. SQL Server 2008 Reporting Services does not support the PageName property.

Build complete -- 0 errors, 2 warnings

This led me to ask my customer what version of SSRS their instance of BIDS was targeting, and guess what? It was SSRS 2008!

[Screenshot: project properties showing the target server version set to SQL Server 2008]

The good news is that getting the desired group level page breaks was as easy as setting the target version to Reporting Services 2008 R2 and deploying the report.

Management Studio tricks you may or may not know about


I was sitting through SQL Server 2012 training, and Ajay Jagannathan was showing us Management Studio. Eric Burgess had worked on this topic. He started covering some neat things that have apparently been there for a while, but I had just never played around with them enough to see what all you could do. I’m sure a lot of others are not aware of these either, so I thought I would highlight some of them. These may help make you more productive, or not, depending on how long you spend playing with them. The biggest change for Management Studio within SQL Server 2012 is that it uses the Visual Studio 2010 shell.

These items aren’t exactly new, but they were new to me. 

1. Keyboard shortcuts

There are a ton of Keyboard shortcuts that you can use within Visual Studio.  The default settings are based on Visual Studio 2010.  Here is a list of those shortcuts. SQL Server Management Studio Keyboard Shortcuts

You can get to these through Tools -> Options -> Environment/Keyboard

[Screenshot: Tools -> Options -> Environment/Keyboard]

 

2. Block Selection

Sometimes you may only want to select and copy a column of text, as opposed to the normal text selection done by holding down the Shift key. To do block selection, hold SHIFT+ALT and drag your mouse to select certain areas of your text in column fashion.

[Screenshot: block selection of a column of text using SHIFT+ALT]

 

3. Status Bar

Most people should be familiar with the Status Bar at the bottom of a query window.

[Screenshot: the status bar at the bottom of a query window]

Under Tools -> Options -> Text Editor/Editor Tab and Status Bar, there are a bunch of options that you can play with.

[Screenshot: Editor Tab and Status Bar options]

One is the Status Bar Location. By default, this is set to Bottom, but you can move it to the top if you choose. This was a little weird for me, though perhaps I’m just not used to it. It does put it more front and center.

[Screenshot: the status bar displayed at the top of the query window]

From a color perspective, you’ll notice Group Connections and Single Server Connections.  I’ll talk about Group connections below, but just be aware that you can change the coloring for when it is a group connection.

 

4. Cycle through Query Windows

I’ve known about ALT+TAB to cycle through programs (Windows). I’ve also known about CTRL+TAB to cycle through components within a given application; for example, in Excel you can use CTRL+TAB to move between open workbooks. I had never tried it in Management Studio, but it allows you to cycle through the query windows.

 

[Screenshot: the CTRL+TAB window switching display in Management Studio]

 

Also, CTRL + F6 will cycle through the actual tabs without the graphic switching display.

5. Group Connections

You can go to View and select Registered Servers. Within Registered Servers you can create a group of SQL Servers. This then allows you to start a query that will be run against all of the servers within the group. This is where the Group Connection color for the status bar comes into play.

[Screenshot: a query connected to a server group, with the group connection status bar color]

This can be really handy if you need to execute items across multiple servers. The color of the status bar is there to help you realize that the query is a group query as opposed to a single server connection.

 

Hopefully these are useful to you.

 

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

Analysis Services Thread Pool Changes in SQL Server 2012


As part of the SQL Server 2012 release several changes were made to Analysis Services that should alleviate some issues previously seen on large NUMA node machines.

Separation of IO and Processing Jobs

To better understand how the changes that were made work, it is helpful to have some background information about what types of issues were seen in earlier releases.

During testing, it was found that a 2 NUMA node, 24 core server handled roughly 100 queries per second during stress testing. The same tests were then run on a 4 NUMA node server with 32 cores, and the queries answered per second actually decreased. Investigation indicated the cause was cross NUMA node access to the system file cache. For example, if the first job that read a file was scheduled on a CPU assigned to NUMA node 1, and a subsequent job which needed to read the same pages from the file was scheduled on a CPU assigned to NUMA node 3, a significant performance penalty was seen reading the cached file pages. Deeper investigation determined that the bottleneck appeared to be in accessing the standby list of file cache pages. The performance impact became more pronounced the higher the number of CPUs and NUMA nodes on the machine. For machines with fewer than 4 nodes the impact was not found to be significant. However, on machines with 4 or more nodes the impact was such that performance could begin to regress.

One possible workaround is to open files in Random mode, since this changes the file cache access behavior. To accommodate this, SSAS 2012 now allows random file access to be configured through the msmdsrv.ini file by setting the RandomFileAccessMode property to a value of 1. This server property does not require a service restart to take effect, but a restart is recommended: if the server is not restarted, Analysis Services will not release open files or change the way it accesses them, but the setting will take effect for newly opened or created files.

While changing the file access mode may provide some relief, eliminating cross NUMA node file access whenever possible is a better long term solution. Towards this goal, SQL Server 2012 Analysis Services now has separate Process and IOProcess thread pools which are NUMA node aware. The new IOProcess thread pool will handle read jobs, while the Process thread pool will continue to handle ROLAP and processing related jobs, including writing new files. On machines with 4 or more NUMA nodes, the IOProcess thread pool is not a single thread pool but instead a collection of thread pools, with each NUMA node having its own pool of IOProcess threads. Assigning each NUMA node its own IOProcess pool doesn’t by itself help with cross NUMA node file access unless file IO operations are consistently assigned to the same IOProcess thread pool, so an algorithm for assigning partition file read operations to specific IO thread pools was also added. At RTM, the algorithm spreads partition reads across IO thread pools based on the ordinal position of the partition in the partitions collection for a measure group. The algorithm is subject to change without notice in future builds, so design decisions should not be made based on this behavior.

An attentive reader will notice that the discussion has highlighted partition reads to this point; what about dimension operations? Dimension read jobs are always assigned to the IOProcess thread pool for NUMA node 0. While one could argue that this scheme could result in NUMA node 0 being assigned a larger percentage of the work, it is expected that most dimension operations will operate on cached data in memory and won't have a noticeable impact.

Because IOProcess threads are expected to perform short duration read operations on files, they do not register themselves as cancellable objects. This means that even if a query which requested an I/O operation is cancelled, the in-process I/O jobs can continue to run for a short period of time after the cancellation; however, new I/O jobs will not be created for the cancelled query.

As with other SSAS features, the default behavior of the per NUMA node IOProcess thread pools is intended to cover the most common scenarios. However, in some situations it may make sense to override the 4 NUMA node threshold for the NUMA node affinitized IOProcess thread pools. To allow administrators to revert to a single IOProcess thread pool, or to force per NUMA node IOProcess thread pools on machines with fewer than 4 NUMA nodes, a new entry has been added to the Analysis Services configuration file (msmdsrv.ini). The PerNumaNode setting under ThreadPool\IOProcess has a default value of -1, which tells the server to use the automatic 4 NUMA node threshold. Changing this value to 0 disables the per NUMA node thread pool behavior, while setting it to 1 turns the behavior on (even if there are fewer than 4 nodes).

The splitting of Process and I/O jobs into separate thread pools and the assignment of I/O jobs to consistent NUMA nodes should alleviate some of the performance impacts of cross NUMA node operations, significantly increasing the performance of SSAS workloads on higher end servers.


Greater than 64 CPU support and Thread Affinitization 

In addition to giving IO operations their own thread pools, Analysis Services now supports more than 64 CPUs, something the relational engine has had for a while.

To support more than 64 CPUs, Windows uses the concept of processor groups. A processor group in Windows can contain a maximum of 64 CPUs, and systems with more than 64 CPUs contain multiple processor groups. For more details on processor groups and support for more than 64 CPUs, read the following:
http://msdn.microsoft.com/en-us/library/dd405503%28VS.85%29.aspx
http://blogs.msdn.com/b/saponsqlserver/archive/2010/09/28/windows-2008-r2-groups-processors-sockets-cores-threads-numa-nodes-what-is-all-this.aspx

To support multiple processor groups, and thus more than 64 CPUs, SSAS 2012 was updated to set the process affinity mask for the msmdsrv.exe process to span multiple processor groups. Along with this capability, a new configuration property named GroupAffinity was added for each thread pool in the server. This property allows an SSAS administrator to have fine grained control over which CPUs on a machine are used for each thread pool. The GroupAffinity setting is a bitmask that is used to determine which CPUs in a processor group can be used for the thread pool in which the GroupAffinity mask is defined. For example, if the following entry:
<GroupAffinity>0xFFFF,0xFFFF</GroupAffinity>
were to appear under <ThreadPool> <Process> in the msmdsrv.ini file, it would affinitize threads to the first 16 logical processors in each of the first two processor groups on the server. Whereas the following entry:
<GroupAffinity>0x00F0,0xFFFFFFFF</GroupAffinity>
would affinitize threads to CPUs 4-7 in the first processor group and the first 32 CPUs in the second processor group. The GroupAffinity property can have as many comma separated hex values as there are defined processor groups on a server. If the mask contains fewer bits than the number of CPUs in the processor group, the non-specified bits are assumed to be zero. If no GroupAffinity value is specified for a thread pool (the default), that thread pool is allowed to spread work across all processor groups and CPUs on the box.

For diagnostic purposes the msmdsrv.log file now contains entries at service start that reflect the size of each of the five thread pools (Query, ParsingShort, ParsingLong, Processing, and IOProcessing) and their settings, including affinity.

Example:
(12/21/2011 4:16:04 PM) Message: The Query thread pool now has 1 minimum threads, 10 maximum threads, and a concurrency of 4.  Its thread pool affinity mask is 0x0000000000000003. (Source: \\?\C:\Program Files\Microsoft SQL Server\MSAS11.DENALIRC0\OLAP\Log\msmdsrv.log, Type: 1, Category: 289, Event ID: 0x4121000A)

Note: Although VertiPaq can use more than 64 CPUs, GroupAffinitization is not currently supported for the VertiPaq thread pool, even though an entry exists in the msmdsrv.ini file.

While the GroupAffinity setting was added as part of the work to support more than 64 CPUs, this property can also be used on servers with fewer than 64 CPUs to control which CPUs are used for specific operations. Through use of the GroupAffinity mask, administrators can push I/O, processing, query, or parsing threads to specific CPUs to obtain optimal resource usage, or better enable resource sharing across multiple processes on the same server.

Wayne Robertson, Sr. Escalation Engineer Analysis Services

A faster CHECKDB – Part II


Note: Validation for this post was performed in the SQL Server Customer Advisory Team (SQL CAT) Customer Lab on an HP Proliant DL385 G7 (overview | quickspec), AMD Opteron 6176 SE 2.3GHz dual socket (12 cores per socket) with 128GB RAM using HP StorageWorks P2000 G3 MSA Array Systems (4 shelves, 10TB raw, 8gbs fiber channel) (overview | quickspec).

In December of last year, I announced some changes made to improve the performance of DBCC CHECKDB. To be perfectly transparent, I gave a very short account of these changes because the Cumulative Update had just shipped and I was going to be out of the office for the rest of the year. Now, a month or so into the new year of 2012, I want to provide more of the history and details behind these changes.

Last year I was invited to spend a week with our SQL product executives in Japan visiting some of our top and prospective customers. Having a passion and charter for support, I naturally looked for opportunities to see problems customers are having with the SQL product. I found out quickly from some of our biggest Japanese customers that the performance of CHECKDB was an issue. As I said in the previous blog post, on this trip I met Cameron Gardiner, who told me how prevalent a problem this had become for some of his largest accounts running SQL Server with SAP.

So after coming back to the US, I sought a solution: one that wouldn't require a re-architecture of CHECKDB, something targeted that would still provide value. One thing Cameron had pointed out to me was that it did not appear as though CHECKDB was driving the disk hard enough. He used an old trick to find this: backing up a database to the NUL device. How do you do this? Try out this syntax:

BACKUP DATABASE <db> TO DISK = 'NUL'

Some people know about this, as I've seen postings on the web from customers who have used it to test disk read performance. WARNING: Do not run this command on a production server without using the WITH COPY_ONLY option. Otherwise, it can affect your differential backups.
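For example (the database name here is hypothetical):

-- Reads every allocated page and discards the writes, measuring pure read throughput.
-- WITH COPY_ONLY keeps this backup out of the backup chain, protecting differential backups.
BACKUP DATABASE MySAPTestDB TO DISK = 'NUL' WITH COPY_ONLY;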

So what does this do? The BACKUP code in the engine is designed to do 2 things: 1) read all the allocated pages for the database from the database files, and 2) write them to a target media. A filename of 'NUL' is an old DOS trick (for those of you reading this who don't know what DOS is, I'm sorry, my age must be showing). The Windows OS effectively recognizes this as a special name that means "throw it away". So by using this as our backup write target, we can test the performance of reading from our database files on disk, since no time is spent writing. Backup reads the files differently (and buffers them differently) than the code in the engine that reads database pages for queries (i.e. read ahead). So it is not quite a 100% fair comparison to what CHECKDB could do, as we will see. But it is a close enough approximation of the best read throughput we could possibly get from our database files. What Cameron saw in his testing was that he could achieve on a common system about 50MB/sec (using the perfmon counter PhysicalDisk: Disk Read Bytes/sec) on each drive where his SAP SQL database was stored (in his test he had 8 drives with two files on each drive). But when he ran DBCC CHECKDB WITH PHYSICAL_ONLY, he could only get at best about 20MB/sec on each drive. Another observation he made was that when he monitored PhysicalDisk: Avg Disk Bytes/Read, it would show several instances of 8kb reads. He asked me why we would be doing so many single page reads. So from his perspective it appeared as though CHECKDB wasn't reading the files as fast as BACKUP, or as efficiently. When he first told me this, I explained that CHECKDB had different purposes and different work to do than BACKUP, so it is not a fair comparison. But he told me that at no point during the CHECKDB run did we ever get close to the 50MB/sec rate for each drive. He would have expected that, even with some of the "overhead" of CHECKDB to perform consistency checks, at some point it would get close to BACKUP. It was hard to argue this point. So he and I went on a path of doing more tests on SAP SQL test databases he had available in Microsoft labs. I realized after some testing that there was an opportunity for improvement. But I knew I needed help. So along came some heroes in this story.

First, I needed to secure lab resources for very large databases and compute resources. So along comes my first hero, Jimmy May. Jimmy is a Senior Program Manager in the Customer Programs team (i.e. SQLCAT). He is my resource when it comes to large lab resources such as this. But it is not just Jimmy. Throughout this entire process, Steven Schneider worked tirelessly to keep providing me the hardware resources I needed to make this happen. Without Jimmy and Steven, I'm not sure I would have ever made it through this.

With Cameron's help getting an actual SAP SQL database and the right hardware resources, I was ready to investigate and tackle the problem. For many years, Paul Randal and then Ryan Stonecipher were my "goto" SQL developers for investigating CHECKDB issues. When I contacted Ryan about it, he told me another developer owned CHECKDB now. So my next hero (and the real hero who made this happen) came into the story: Andrew Cherry. Andrew is a developer working within the SQL engine. He and I started a dialogue on the problem and what could be done about it.

We went through several different passes at what could be a possible bottleneck preventing CHECKDB from driving the disk harder. The first thing Andrew told me about this work was that 1) we could really only optimize for the WITH PHYSICAL_ONLY option, and 2) CHECKDB was never designed to push the hardware resources to their limit. In fact, CHECKDB was designed to ensure it didn't consume too many resources, so it would not affect the performance of the overall system. One case in point is the behavior of "batching". Instead of trying to process all objects in the database at one time, CHECKDB processes them in "batches" to avoid using too much tempdb space. Andrew noticed that, due to the sheer number of IAM pages in the SAP SQL database we were using, the batch concept was slowing down performance. All the IAM pages were in cache from the "CHECKALLOC phase" at the beginning, but as we processed each "batch", some IAM pages were getting thrown out of cache. So later batches had to perform very slow single 8k page reads to bring them back into cache. This explained the behavior Cameron saw for 8kb reads. So we felt one change we could make was to give the customer an option to use only "one batch" (and in fact that is one of the changes in the eventual trace flag 2562).

We stopped at this point and made sure we agreed on our goals and boundaries:

  • We cannot make such large changes as to risk the correctness and architecture of CHECKDB
  • We would only target WITH PHYSICAL_ONLY even though we believed it could help the “FULL” CHECKDB scenario.
  • Our changes had to be acceptable to fit into a Cumulative Update (i.e. no “big” changes)
  • We would try to push hardware resources harder
  • Therefore, we would use trace flags as options for these changes as we didn’t want the default to be “push the hardware resources harder”
  • We would try to get as close as possible to the "BACKUP 'NUL'" test for the portion of CHECKDB that reads all the pages from disk. We knew we could not achieve that same performance, but we would strive to get as close as possible.

The results of the “object batching” work certainly helped, but we both felt it was not enough. We now saw the Avg Disk Bytes/Read stay steady around 64KB, which told us that standard Read-Ahead algorithms were kicking in and the single page read problem was gone. But it was apparent that we still could not come close to the Disk Read Bytes/sec number that BACKUP was achieving. Andrew went off to investigate what could be causing this and whether we could do something about it.

After some period of time, we had a meeting and he came back with his findings. He observed that during the time CHECKDB reads pages, there was what he thought was an unusually high number of non-Page Latch waits, surfaced as what are called DBCC_MULTIOBJECT_SCANNER latches. This latch is used for several purposes, but one of them is to protect an internal bitmap of all pages that need to be read. When CHECKDB “runs in parallel” we need to make sure we coordinate which pages to read across all the threads. We noticed that during the time we read pages, these non-Page latch waits were in the tens of thousands. We didn’t know if this was needed, but it was certainly an area to investigate. What Andrew found is that we were not as efficient as we could be in using this latch. Therefore, he created a concept of “page batching” where all the worker threads doing the reading for CHECKDB could read more pages at a time and hold the latch less frequently and for less time. The result was a big breakthrough. Our non-page latch count went down to the hundreds. Our SQL Server: Buffer Manager – Readahead pages/sec went up. And most importantly, our Disk Read Bytes/sec was very close to BACKUP. The overall time to run CHECKDB WITH PHYSICAL_ONLY is not the same as BACKUP because CHECKDB has other things it must perform, namely CHECKALLOC, building a “fact table”, actually checking the page consistency, and of course building up and presenting results. Furthermore, BACKUP is designed to read in large chunks of the database file and buffer them in its own memory, while CHECKDB uses standard BPool Read-Ahead algorithms. Trying to change that would have violated the goals and boundaries we established up front.

The overall result of this work for our test SAP SQL database was very good, to the point where I felt we had the necessary changes that would be of value to customers. We started with a 4.3TB SAP SQL database on which CHECKDB WITH PHYSICAL_ONLY took well over 13 hours. With Andrew’s “object batching” and “page batching” changes we were down to 5 hours and 57 minutes (BTW, BACKUP to ‘NUL’ for this db was 4 hours 35 mins).

I was very satisfied with these results, so I felt we were ready to get these changes to customers. Before just moving these into the Cumulative Update cycle, we wanted some “live” customer testing. So we asked Cameron Gardiner to have some of his largest customers test a private build with these changes. He chose one of his customers with a 14TB SAP SQL database. The result was staggering.  Before our private build, CHECKDB WITH PHYSICAL_ONLY took over a day. With our change, it ran in 8 hours!

With these tests completed, we moved into the cycle of getting this pushed into Cumulative Updates for SQL Server 2008 R2 and SQL Server 2008. The result of this work was a SQL Server 2008 R2 CU release back in December of 2011. The SQL Server 2008 CU update will be in the next release due here in March of 2012. In addition, Andrew baked the “page batching” changes into SQL Server 2012 as the default, leaving trace flag 2562 to enable “object batching” (since this results in larger tempdb usage).
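If you want to try this yourself, a minimal sketch (with a hypothetical database name) of enabling the single-batch behavior and running the physical-only check might look like the following; check the Cumulative Update knowledge base article for the exact trace flag scope supported by your build:

DBCC TRACEON (2562, -1);    -- "object batching" as one batch; global scope assumed here
DBCC CHECKDB (N'YourLargeDatabase') WITH PHYSICAL_ONLY;
DBCC TRACEOFF (2562, -1);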

There is no question we optimized and tested these changes for the “large” (1TB+) SQL Server user, but I think you might find them useful for smaller databases as well. My caution to you is that we push the hardware resources much harder, so these changes are more for users who run CHECKDB “offline” (for example, on a backup server).

There are also some factors that can affect whether these changes actually help make your execution of CHECKDB faster:

  • maxdop for CHECKDB – The more worker threads CHECKDB can use, the faster we will read the pages. Therefore, machines where SQL Server uses more cores can expect better performance (see the sketch after this list).
  • optimized I/O – We push the disk harder for reads, but if you have a slow disk or a bottleneck in the I/O system, these changes won’t help you any.
  • Memory – We use the SQL Buffer Pool to read pages. The less memory available to read them into, the slower the performance. This, however, is not as significant a factor as maxdop and I/O.
  • Tempdb performance – CHECKDB has to use tempdb for fact tables and aggregation of results. If tempdb I/O is slow, it can also slow down the performance of CHECKDB.
  • Spreading files across disks – As with the general performance of server queries, if we spread our reads across separate physical disks for files, we can achieve better I/O performance.
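As a rough illustration of the maxdop point above (a sketch, not a tuning recommendation): SQL Server 2008 R2 has no per-DBCC MAXDOP option, so CHECKDB parallelism is bounded by the instance-wide configuration:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- 0 = let SQL Server use all available schedulers; CHECKDB can then fan out its readers
EXEC sp_configure 'max degree of parallelism', 0;
RECONFIGURE;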

This was a complete team effort at Microsoft to make these changes happen: from SQLCAT, to our labs, to developers, and our release services team. Many people contributed to what may seem like a small improvement but is something that will help customers with very large Tier-1 databases.

Bob Ward
Microsoft

SQL Server 2012 - True Black Box Recorder


This would be a perfect time to post a blog talking about the new SQL Server 2012 features.  However, I am going to leave that activity to the marketing folks (AlwaysOn, T-SQL Enhancements, …).   I want to talk about something that might not appear at the top of a marketing or sales checklist, but for any of us that have "carried the pager" (showing my age and experience now) I truly feel SQL Server 2012 has provided the first, real Black Box Recorder to help us.

I have the unique opportunity to be involved with product planning/development cycles, providing feedback and analysis on improving support/maintenance and other such aspects of SQL Server to the SQL Server development team.  One of these efforts started several years ago, under an effort to improve cluster failover diagnostics and capabilities.  A team of us gathered up dozens of scenarios that impact the folks 'carrying the pager'.  Why did the system stop responding? Why did it failover?  Was it blocking? Was it an I/O bottleneck? What is ######?

The team performed ranking exercises and finally settled on 32 scenarios we felt we could attack, eliminate, or significantly change the impact and troubleshooting of, in a quality and repeatable fashion, with very small or no impact on performance.   During this exercise I was able to convince the team, with very little effort I might add, to extend the behavior to all SKUs and installations (except SQL Express).  The 32 scenarios are not unique to clustering and provide a black box for support and administrators on any instance of SQL Server.

In the past we added the default trace (MSSQL\LOG\*.TRC), but it was limited.   We introduced MDW, but you had to set that up on the instances you desired to monitor.  We wanted default tracing, high speed, containing the information that is important to understanding the system state currently and historically.  Many previous techniques relied on external processes attempting to query various DMVs and states for the SQL Server instance(s).  They are all helpful, but SQL Server knows about its own health state, and we wanted to remove dependencies such as a live connection to the SQL Server, where a simple networking issue could cause loss of diagnostics.

The design landed on an extension of the internal monitoring behaviors in SQL Server.  The same team (SQLOS) that added Scheduler Monitor, Resource Monitor, XEvents and other facilities designed sp_server_diagnostics.

sp_server_diagnostics  (Books Online Reference: http://msdn.microsoft.com/en-us/library/ff878233(v=sql.110).aspx)

This is a new, internal procedure, implemented on a preemptive thread at high priority.  It accepts an interval parameter, which it uses to output a result set on that interval. It is designed to have its own reserved memory, allocated one time, and it avoids locking and synchronization objects so it will provide output even when the SQL Server has encountered a problem.
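For example, per the Books Online reference above, you can request a health snapshot every 5 seconds, or pass 0 to get a single result set:

EXEC sp_server_diagnostics @repeat_interval = 5;   -- a new result set every 5 seconds until cancelled
EXEC sp_server_diagnostics @repeat_interval = 0;   -- one result set, then the procedure completes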

I spent lots of time testing the stability and capabilities of sp_server_diagnostics.  You can ask the SQL Server development team, because they had to fix several issues they never dreamed I would even attempt - but I did. :)

  • I setup a USB drive on my laptop.
  • On the laptop I started multiple copies of SQLIOSim against the USB drive.
  • On my desktop I created a database, over SMB 2.0, on that USB drive.
  • On my desktop I started stress against the instance (Bulk Insert, sorts, hashes, index builds…).
  • On my desktop I used utilities such as Consume.exe to exhaust non/page-pool from Windows.
  • On my desktop I forced max server memory to the minimum limit.
  • SQL Server was returning out of memory errors, I/O delays, etc…
  • Even a command prompt on the desktop was not redrawing properly because I had the system low on desktop heap as well.

The default traces supported by XEvent and sp_server_diagnostics kept running properly, showing me things like memory pressure, blocking, I/O delays and other details.

The design is implemented on an internal, preemptive, high-priority thread.   It actually uses XEvent sessions (internally) to help monitor various aspects of the SQL Server service.  This allows us to avoid issues such as a client network connection and just write the data directly to the XEvent files in the LOG directory.

  • System - Collects data from a system perspective on spinlocks, severe processing conditions, non-yielding tasks, page faults, and CPU usage.
  • Resource - Collects data from a resource perspective on physical and virtual memory, buffer pools, pages, cache and other memory objects.
  • Query Processing - Collects data from a query-processing perspective on the worker threads, tasks, wait types, CPU intensive sessions, and blocking tasks.
  • IO Subsystem - Collects data on I/O, in addition to diagnostic data.
  • Events - Collects data and surfaces through the stored procedure the errors and events of interest recorded by the server, including details about ring buffer exceptions, ring buffer events about memory broker, out of memory, scheduler monitor, buffer pool, spinlocks, security, and connectivity.

Each of these categories is output as a row in the result set returned from sp_server_diagnostics on/at the designated interval.  A key here is that once sp_server_diagnostics is started it will keep returning information on the indicated interval.  A consumer of sp_server_diagnostics just keeps getting a new result set.  Let me explain this in more detail.

Prior to SQL Server 2012, the cluster failover design was to execute a select @@SERVERNAME query every ~30 seconds.  If this failed, we did a couple of retries and then returned an error condition to the cluster manager.  Each time, the query had to be parsed, compiled, executed, etc…  Lots of moving parts, memory allocations, execution behaviors and other things that could go wrong, and it left very little information for why it made the failure decision.

For a clustered instance or availability group (AlwaysOn AG) resource, the SQL Server 2012 design is to establish a connection at the time the cluster online activity occurs (a separate/dedicated thread in the cluster manager) and retain the connection until the offline condition is met.  This connection issues the sp_server_diagnostics request with a default 5 second interval.  (See the custom cluster properties to adjust the interval, but anything less than 5 seconds is not recommended!)

The server returns a result set on/at every interval.  This information is then used to determine the health state of the SQL Server instance, using the new failover health designation to determine the failover behavior.  The failover designation (http://msdn.microsoft.com/en-us/library/ff878664(v=sql.110).aspx Failover Policy - HealthCheckTimeout, …) allows you to determine a level of failover for a failover instance or a specific availability group. You can elect to failover only if System returns an error, or perhaps you need a more sensitive failover, so you elect to failover if any of the states returns an error.
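As a sketch (with a hypothetical availability group name), the failover policy for an AG can be adjusted with T-SQL on the primary:

ALTER AVAILABILITY GROUP [MyAg] SET (FAILURE_CONDITION_LEVEL = 3);   -- 1 (least sensitive) .. 5 (most sensitive)
ALTER AVAILABILITY GROUP [MyAg] SET (HEALTH_CHECK_TIMEOUT = 30000);  -- milliseconds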

This is a really great improvement, as the administrator has more control over the failover, but I mentioned the black box capability.   All the result information is recorded in a set of XEvent rollover files in your LOG directory.  This includes the results from sp_server_diagnostics as well as the cluster resource DLL decisions.  Furthermore, the SQL Server instance always starts a default XEvent trace, also stored in your LOG directory, that records the same result set information along with additional XEvents.

Any instance of SQL Server (except SQL Express) has the default trace in the LOG directory, and those instances that are clustered or AlwaysOn enabled contain additional information about failover decisions made by the cluster resource DLL.

I feel like I am doing an infomercial now.  Yet, that is not all!  Any instance that has an availability group also starts an Availability Group black box trace, also placed in the LOG directory, which records AG specific activities (such as DDL changes).

Using SQL Server Management Studio you can open any of the .XEL files, or better yet you can use the XEL merge capabilities to open multiple files, sort by timestamp and see what happened.  The targets for the collection will provide you with hours, and in some cases days, of historical information.   You can now answer the 6pm question as to why the server was slow at 4pm, because you can see memory, blocking basics, I/O, … and other health states.
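You can also read the same .XEL files with T-SQL. A sketch, assuming a default instance's LOG directory and the *_SQLDIAG_*.xel naming of the failover diagnostic log (adjust the path and pattern for your install):

SELECT object_name, CAST(event_data AS XML) AS event_data, file_name
FROM sys.fn_xe_file_target_read_file(
    N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\*_SQLDIAG_*.xel',
    NULL, NULL, NULL);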

Anyone Can Execute (Limit how many of these are active)

Any user, with appropriate permissions, can execute sp_server_diagnostics.  I wanted this for support so we could collect and trigger additional diagnostics, for example triggering a mini-dump.   While you can run this procedure, you want to keep the number of active executions limited to avoid any impact on performance.

Alter Server Configuration

The log files can be controlled with the ALTER SERVER CONFIGURATION … DIAGNOSTICS LOG T-SQL command.
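For example (the sizes and counts here are illustrative, not recommendations):

ALTER SERVER CONFIGURATION SET DIAGNOSTICS LOG MAX_SIZE = 10 MB;   -- size of each rollover file
ALTER SERVER CONFIGURATION SET DIAGNOSTICS LOG MAX_FILES = 10;     -- number of rollover files retained
-- ALTER SERVER CONFIGURATION SET DIAGNOSTICS LOG OFF;             -- disable the diagnostics log entirely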

Example Output

(result set columns: creation time, component, state, state_desc; the XML data column follows each row)

18:20.7    system    1    clean

<system spinlockBackoffs="0" sickSpinlockType="none" sickSpinlockTypeAfterAv="none" latchWarnings="0" isAccessViolationOccurred="0" writeAccessViolationCount="0" totalDumpRequests="0" intervalDumpRequests="0" nonYieldingTasksReported="0" pageFaults="0" systemCpuUtilization="12" sqlCpuUtilization="0"/>

18:20.7    resource    1    clean

<resource lastNotification="RESOURCE_NOINFO" outOfMemoryExceptions="0" isAnyPoolOutOfMemory="0" systemOutOfMemoryPeriod="0"><memoryReport name="Memory Manager" unit="KB"><entry description="VM Reserved" value="2204440"/><entry description="VM Committed" v

18:20.7    query_processing    1    clean

<queryProcessing maxWorkers="512" workersCreated="517" workersIdle="493" tasksCompletedWithinInterval="9" pendingTasks="0" oldestPendingTaskWaitingTime="0" hasUnresolvableDeadlockOccurred="0" hasDeadlockedSchedulersOccurred="0" trackingNonYieldingScheduler="0x0"><topWaits><nonPreemptive><byCount><wait waitType="ASYNC_NETWORK_IO" waits="1293727" averageWaitTime="24668922" maxWaitTime="583"/><wait waitType="THREADPOOL" waits="5286" averageWaitTime="4522428" maxWaitTime="7982"/><wait waitType="SNI_CRITICAL_SECTION" waits="498" averageWaitTime="311" maxWaitTime="60"/><wait waitType="CMEMTHREAD" waits="314" averageWaitTime="294" maxWaitTime="19"/><wait waitType="PAGEIOLATCH_SH" waits="160" averageWaitTime="613" maxWaitTime="44"/><wait waitType="IO_COMPLETION" waits="131" averageWaitTime="214" maxWaitTime="40"/><wait waitType="WRITE_COMPLETION" waits="60" averageWaitTime="274" maxWaitTime="13"/><wait waitType="PAGEIOLATCH_EX" waits="22" averageWaitTime="108" maxWaitTime="31"/><wait waitType="SLEEP_BPOOL_FLUSH" waits="12" averageWaitTime="64" maxWaitTime="23"/><wait waitType="WRITELOG" waits="9" averageWaitTime="41" maxWaitTime="10"/></byCount><byDuration><wait waitType="ASYNC_NETWORK_IO" waits="1293727" averageWaitTime="24668922" maxWaitTime="583"/><wait waitType="THREADPOOL" waits="5286" averageWaitTime="4522428" maxWaitTime="7982"/><wait waitType="LCK_M_S" waits="1" averageWaitTime="614" maxWaitTime="614"/><wait waitType="PAGEIOLATCH_SH" waits="160" averageWaitTime="613" maxWaitTime="44"/><wait waitType="SLEEP_TEMPDBSTARTUP" waits="1" averageWaitTime="451" maxWaitTime="451"/><wait waitType="SNI_CRITICAL_SECTION" waits="498" averageWaitTime="311" maxWaitTime="60"/><wait waitType="CMEMTHREAD" waits="314" averageWaitTime="294" maxWaitTime="19"/><wait waitType="WRITE_COMPLETION" waits="60" averageWaitTime="274" maxWaitTime="13"/><wait waitType="IO_COMPLETION" waits="131" averageWaitTime="214" maxWaitTime="40"/><wait waitType="ASYNC_IO_COMPLETION" waits="2" averageWaitTime="181" maxWaitTime="166"/></byDuration></nonPreemptive><preemptive><byCount><wait waitType="PREEMPTIVE_OS_ENCRYPTMESSAGE" waits="1303246" averageWaitTime="120319807" maxWaitTime="982"/><wait waitType="PREEMPTIVE_OS_DECRYPTMESSAGE" waits="200037" averageWaitTime="1613" maxWaitTime="9"/><wait waitType="PREEMPTIVE_OS_AUTHENTICATIONOPS" waits="15430" averageWaitTime="125967" maxWaitTime="336"/><wait waitType="PREEMPTIVE_OS_DELETESECURITYCONTEXT" waits="3439" averageWaitTime="71074" maxWaitTime="287"/><wait waitType="PREEMPTIVE_OS_DISCONNECTNAMEDPIPE" waits="3439" averageWaitTime="84002" maxWaitTime="284"/><wait waitType="PREEMPTIVE_OS_AUTHORIZATIONOPS" waits="2541" averageWaitTime="27749" maxWaitTime="331"/><wait waitType="PREEMPTIVE_OS_LOOKUPACCOUNTSID" waits="2528" averageWaitTime="24563" maxWaitTime="158"/><wait waitType="PREEMPTIVE_OS_REVERTTOSELF" waits="2527" averageWaitTime="19438" maxWaitTime="176"/><wait waitType="PREEMPTIVE_OS_WAITFORSINGLEOBJECT" waits="1298" averageWaitTime="4055" maxWaitTime="191"/><wait waitType="PREEMPTIVE_XE_CALLBACKEXECUTE" waits="786" averageWaitTime="130" maxWaitTime="121"/></byCount><byDuration><wait waitType="PREEMPTIVE_OS_ENCRYPTMESSAGE" waits="1303246" averageWaitTime="120319807" maxWaitTime="982"/><wait waitType="PREEMPTIVE_OS_AUTHENTICATIONOPS" waits="15430" averageWaitTime="125967" maxWaitTime="336"/><wait waitType="PREEMPTIVE_OS_DISCONNECTNAMEDPIPE" waits="3439" averageWaitTime="84002" maxWaitTime="284"/><wait waitType="PREEMPTIVE_OS_DELETESECURITYCONTEXT" waits="3439" 
averageWaitTime="71074" maxWaitTime="287"/><wait waitType="PREEMPTIVE_OS_AUTHORIZATIONOPS" waits="2541" averageWaitTime="27749" maxWaitTime="331"/><wait waitType="PREEMPTIVE_OS_LOOKUPACCOUNTSID" waits="2528" averageWaitTime="24563" maxWaitTime="158"/><wait waitType="PREEMPTIVE_OS_REVERTTOSELF" waits="2527" averageWaitTime="19438" maxWaitTime="176"/><wait waitType="PREEMPTIVE_OS_WAITFORSINGLEOBJECT" waits="1298" averageWaitTime="4055" maxWaitTime="191"/><wait waitType="PREEMPTIVE_OS_DECRYPTMESSAGE" waits="200037" averageWaitTime="1613" maxWaitTime="9"/><wait waitType="PREEMPTIVE_OS_DOMAINSERVICESOPS" waits="1" averageWaitTime="1029" maxWaitTime="1029"/></byDuration></preemptive></topWaits><cpuIntensiveRequests><request sessionId="53" requestId="0" command="EXECUTE" taskAddress="0xf6e98aa8" cpuUtilization="0" cpuTimeMs="734"/></cpuIntensiveRequests><pendingTasks></pendingTasks><blockingTasks></blockingTasks></queryProcessing>

18:20.7    io_subsystem    1    clean

<ioSubsystem ioLatchTimeouts="0" intervalLongIos="0" totalLongIos="0"><longestPendingRequests></longestPendingRequests></ioSubsystem>

18:20.7    events    0    unknown

<events><session startTime="2010-02-25T09:16:35.693" droppedEvents="0" largestDroppedEvent="0"><RingBufferTarget truncated="0" eventsPerSec="133214" processingTime="14" totalEventsProcessed="1865" eventCount="45" droppedCount="0" memoryUsed="32582"><event

 

RTM Issue

There is one issue with this in RTM that is scheduled to be corrected in SQL Server 2012 Service Pack 1 that Mike Wachal has outlined: http://blogs.msdn.com/b/extended_events/archive/2012/02/17/issues-with-the-system-health-session-in-sql-server-2012.aspx (Issues with the system_health session in SQL Server 2012)

Bob Dorr - Principal SQL Server Escalation Engineer

Intro to Debugging a Memory Dump


I was discussing debugging with some folks internally that didn’t really have much exposure to it, but wanted to learn.  I considered the items pretty basic and didn’t really dive into too much, but I had a few comments that the information was good.  One thing I found interesting was that I did some searching around and couldn’t really find a good reference that summed up the items below.  I found some information on the individual items, but only because I knew what to go look for.  From the perspective of someone just getting started, it wasn’t obvious.  So, I figured I would share out what I put together as it may be helpful for someone else.

NOTE:  All examples below are using the public Debugger along with public symbols and extensions.  These are available to everyone.

Intro to the Intro

There are some concepts that go along with debugging that are sometimes not addressed directly when we look at the topic of debugging itself.  These concepts are extremely helpful when we start going through dumps and trying to connect the dots.  Foundational knowledge really helps in this complex topic.  I will try to add some public references to some items you can read up on, but this shouldn’t be where you stop.  If you really are interested in this topic, there is a wealth of information out there that can help with some of the background.

A good place to start is the Windows Internals book by David Solomon and Mark Russinovich, specifically the chapter on Memory Management and how it works. When talking about dumps and debugging, we are working with the contents of memory, so understanding how memory works is extremely helpful.  Note:  Volume 6 Part 1 was just recently released, but it looks like the Memory Management pieces for Volume 6 will be in Part 2.

Having some development experience is also helpful.  While you may not need to look at code directly in a dump, you are looking at the results of code.  The concept of pointers is sometimes hard to grasp for someone that doesn’t have a programming background, but when dealing with a memory dump, or debugging, an understanding of how pointers work can be very helpful, even when we are debugging a Managed (.NET) application instead of a Native (C++ or non-managed) application.  Another good book is Windows via C/C++ by Jeffrey Richter and Christophe Nasarre.

The last book I will mention is Debugging Applications by John Robbins.  It should be obvious why I’m recommending it – having to do with Debugging and all.

The above books are not for the faint of heart, but they do provide a lot of great information.  From a support perspective, these are books you see on most people’s bookshelves.  I definitely recommend them for yours; they will really help with regards to this topic.

Concepts

Here are some links to some core concepts that I will talk about below.  These can be general pointers to help explain some of the items I talk about below.

Process & Threads

Memory Management

What is a Dump?

A dump is basically the contents of Memory written out to a file.  The contents of this can vary depending on how the dump was generated.  A Kernel Dump is a dump of Windows itself, including all applications running on the system.  A user mode dump is a dump created for a specific process (i.e. an application like Notepad).  Think of a Memory dump as a snapshot of that application.  You can then poke around and see what was in Memory at that point in time.

Are there different kinds of dumps?

When I work with a dump, there are really two kinds that I come in contact with. 

A full dump is everything in memory for that process.  This includes any modules that are loaded, Handle Tables, Thread stacks and other information that is application specific.  A mini dump only includes selected parts of the process and can be set with options when the dump is created.  Have a look at the .dump command to see what some of those options are.

Applications in general can also capture dumps and include/exclude specific regions of memory.  SQL Server is a great example of this.  They have another dump option that we refer to as a Filtered Dump.  A filtered dump is really a Full dump but it excludes the memory region for the Buffer Pool.  When we want a full dump, if we included the Buffer Pool, this could be really huge.  Think of a 64GB box that SQL is running on and has been active for a while.  The Buffer Pool is probably pretty large.  If we did a Full Dump on SQL, that dump size would probably be close to 64GB, at least I would hope it would be (depending on what you configured Max Server Memory to).  However, if we grab a Filtered Dump, this dump may only be 1-3GB in size.  That’s a big difference!  And really, we don’t usually even care about what is in the Buffer Pool when looking at a dump.

You can also have a Crash dump or a Hang dump.  A Crash dump is a dump triggered by an error (or Exception).  A Hang dump is a dump you manually invoke.  This is great to see what is going on if we are encountering a hang in the application, or if you just want to poke around.

Working with Threads

Moving between threads is important in a dump.  When you open a dump, if it is a hang dump, it will usually open on thread 0, although this isn’t a guarantee.  A Crash dump will typically open on the Thread where the exception was thrown, which is helpful.  You can move around to other threads though.  You can use the ~ command to see what threads we have out there.

0:013> ~
   0  Id: cfc.1104 Suspend: 1 Teb: 7efdd000 Unfrozen
   1  Id: cfc.12a0 Suspend: 1 Teb: 7efd7000 Unfrozen
   2  Id: cfc.e50 Suspend: 1 Teb: 7efaf000 Unfrozen
   3  Id: cfc.12e8 Suspend: 1 Teb: 7efac000 Unfrozen
   4  Id: cfc.1344 Suspend: 1 Teb: 7efa9000 Unfrozen
   5  Id: cfc.798 Suspend: 1 Teb: 7efa6000 Unfrozen
   6  Id: cfc.1054 Suspend: 1 Teb: 7efa3000 Unfrozen
   7  Id: cfc.1214 Suspend: 1 Teb: 7ef3f000 Unfrozen
   8  Id: cfc.134c Suspend: 1 Teb: 7ef3c000 Unfrozen
   9  Id: cfc.fb4 Suspend: 1 Teb: 7ef39000 Unfrozen
  10  Id: cfc.13a4 Suspend: 1 Teb: 7ef33000 Unfrozen
  11  Id: cfc.113c Suspend: 1 Teb: 7ef36000 Unfrozen
  12  Id: cfc.11a4 Suspend: 1 Teb: 7ef30000 Unfrozen
. 13  Id: cfc.e40 Suspend: 1 Teb: 7efda000 Unfrozen

You’ll notice the “0:013>” on the command prompt.  The 13 here indicates the current thread we are on from a context point of view.  So, if we run a command like k, it is within the context of thread 13.  We can then switch to a different thread by doing ~<thread #>s.

0:013> ~5s
eax=c0000034 ebx=00000001 ecx=00000000 edx=00000000 esi=00000000 edi=00000000
eip=77ce9bd5 esp=06b8f194 ebp=06b8f22c iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
ntdll!ZwWaitForMultipleObjects+0x15:
77ce9bd5 c21400          ret     14h

Whenever you switch to a new thread, you will see the register information. You will then be in the context of that thread (5 in this case).


Debuggers

There are different ways to debug a dump.  The main tool that I use to review a dump is WinDBG.  WinDBG is part of the Debugging Tools for Windows, which is currently part of the Windows SDK, although you can just choose to download the Debugging Tools from the options of the Windows SDK setup wizard.  This will install both the x64 and x86 versions of WinDBG.  I always have both on my system.  If it is a 32bit process, I use the x86 version of WinDBG.  If it is a 64bit process, I use the x64 version of WinDBG.

You can technically use the x64 version of WinDBG with a 32bit process and make use of the .effmach command, but this won’t work properly with managed dumps, and I’ve seen other issues with using it.  I just stick to the same bitness when matching debugger with process, to be safe.

DebugDiag is a great tool to capture a dump, and it also offers some Analyzers that may help to point something out, but it doesn’t let you freely debug a dump.  CDB is also a debugger that comes with the Debugging Tools for Windows that you can use, as is KD for kernel debugging.

Also, don’t forget about Visual Studio.  VS is a great Debugger.  Maybe not with a raw dump, but can help with live debugging an application you may have and give you insight.

Symbols

http://msdn.microsoft.com/en-us/library/windows/desktop/ee416588(v=vs.85).aspx

For our purposes we are referring to PDB files that are generated when compiling an application.  There are older formats of debugging information, but we will focus on the PDB format.  You may have PDBs for your own application.  For Microsoft applications, we have a public symbol server (http://msdl.microsoft.com/download/symbols) that you can use to get public symbols.  You may be asking why you should care about this.  Here is a thread stack without any symbols loaded.

0:054> k
Child-SP          RetAddr           Call Site
00000000`183cf118 000007fa`5376db54 ntdll!ZwSignalAndWaitForSingleObject+0xa
00000000`183cf120 00000000`00ae4b08 KERNELBASE!SignalObjectAndWait+0xc8
00000000`183cf1d0 00000000`00ae2d76 sqlservr+0x4b08
00000000`183cf920 00000000`00ae2700 sqlservr+0x2d76
00000000`183cf960 00000000`00aeae8d sqlservr+0x2700

Here is the same stack after getting symbols loaded for SQL Server from the public symbol server.

0:054> k
Child-SP          RetAddr           Call Site
00000000`183cf118 000007fa`5376db54 ntdll!NtSignalAndWaitForSingleObject+0xa
00000000`183cf120 00000000`00ae4b08 KERNELBASE!SignalObjectAndWait+0xc8
00000000`183cf1d0 00000000`00ae2d76 sqlservr!SOS_Scheduler::Switch+0x181
00000000`183cf920 00000000`00ae2700 sqlservr!SOS_Scheduler::SuspendNonPreemptive+0xca
00000000`183cf960 00000000`00aeae8d sqlservr!SOS_Scheduler::Suspend+0x2d

 

You may notice, after you install the debugger, that you don’t have any symbols.  You may also see the following notice when you open a dump.

Symbol search path is: *** Invalid ***
****************************************************************************
* Symbol loading may be unreliable without a symbol search path.           *
* Use .symfix to have the debugger choose a symbol path.                   *
* After setting your symbol path, use .reload to refresh symbol locations. *
****************************************************************************

It mentions the .symfix command.  Here is what happens.  We can look at the symbol path by using the .sympath command; it will be empty.  Here is what we see when doing the .symfix.

0:054> .symfix
0:054> .sympath
Symbol search path is: srv*
Expanded Symbol search path is: cache*;SRV*http://msdl.microsoft.com/download/symbols

It will add in the default symbol path, which is the Microsoft Public Symbol Server.  You can then further modify it by issuing another .sympath command.  For example:

0:054> .sympath SRV*c:\symbols\public\*http://msdl.microsoft.com/download/symbols;c:\symbols\custom
Symbol search path is: SRV*c:\symbols\public\*http://msdl.microsoft.com/download/symbols;c:\symbols\custom
Expanded Symbol search path is: srv*c:\symbols\public\*http://msdl.microsoft.com/download/symbols;c:\symbols\custom

This means that I will cache any symbols from the public symbol server into c:\symbols\public and any Symbols I also want to use, I can stick in my custom directory.

You can also set a System Environment Variable with your symbol path that other applications, such as Visual Studio, can use as well.  It is _NT_SYMBOL_PATH.

There is also a difference between Private and Public symbols.  When generating symbols for your application, you can strip out certain information.  The Public symbols don’t contain all the information that internal (private) symbols provide.  The article I referenced above for symbols goes into this if you really want to understand it.

Debugger Extensions

Debugging Extensions can be really helpful to get information out of a dump with minimal effort as compared to finding it by hand.  Extensions essentially contain pre-made commands to get specific information.  The debugger comes with some extensions already.  You can see some of them by using the .chain command to see which ones are loaded.

0:054> .chain
Extension DLL search Path:
    C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\WINXP;C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\winext;C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\winext\arcade;C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\pri;C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64;C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\winext\arcade;C:\Program Files\Microsoft Passport RPS\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\;C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\;C:\Program Files\Microsoft Network Monitor 3\
Extension DLL chain:
    dbghelp: image 6.2.8229.0, API 6.2.6, built Thu Feb 09 22:51:14 2012
        [path: C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\dbghelp.dll]
    ext: image 6.2.8229.0, API 1.0.0, built Thu Feb 09 22:56:01 2012
        [path: C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\winext\ext.dll]
    exts: image 6.2.8229.0, API 1.0.0, built Thu Feb 09 23:05:32 2012
        [path: C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\WINXP\exts.dll]
    uext: image 6.2.8229.0, API 1.0.0, built Thu Feb 09 22:59:16 2012
        [path: C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\winext\uext.dll]
    ntsdexts: image 6.2.8229.0, API 1.0.0, built Thu Feb 09 22:59:54 2012
        [path: C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\WINXP\ntsdexts.dll]

The Debugger Extension that I use the most is the Managed Code Extension.  I use an internal version that takes advantage of private symbols, but there is a public version that ships with the .NET Framework called SOS.  You can load this Extension by using the following command.

0:069> .loadby sos mscorwks

We can see that it is loaded by looking at .chain again.

0:069> .chain
Extension DLL search Path:
    C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\WINXP;C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\winext;C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\winext\arcade;C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\pri;C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64;C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\winext\arcade;C:\Program Files\Microsoft Passport RPS\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\;C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\;C:\Program Files\Microsoft Network Monitor 3\
Extension DLL chain:
    C:\Windows\Microsoft.NET\Framework64\v2.0.50727\sos: image 2.0.50727.4216, API 1.0.0, built Thu Jul 07 00:47:51 2011
        [path: C:\Windows\Microsoft.NET\Framework64\v2.0.50727\sos.dll]
    dbghelp: image 6.2.8229.0, API 6.2.6, built Thu Feb 09 22:51:14 2012
        [path: C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\dbghelp.dll]
    ext: image 6.2.8229.0, API 1.0.0, built Thu Feb 09 22:56:01 2012
        [path: C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\winext\ext.dll]
    exts: image 6.2.8229.0, API 1.0.0, built Thu Feb 09 23:05:32 2012
        [path: C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\WINXP\exts.dll]
    uext: image 6.2.8229.0, API 1.0.0, built Thu Feb 09 22:59:16 2012
        [path: C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\winext\uext.dll]
    ntsdexts: image 6.2.8229.0, API 1.0.0, built Thu Feb 09 22:59:54 2012
        [path: C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64\WINXP\ntsdexts.dll]

 

You can also run .help on an extension to see what some of the commands are that are available to you, assuming that the extension produces help output.  Also note that extension commands are prefixed with an exclamation mark (!), while debugger meta-commands such as .loadby use a period (.).  Here is the beginning of the Help output for the SOS Extension.

0:069> !sos.help
-------------------------------------------------------------------------------
SOS is a debugger extension DLL designed to aid in the debugging of managed
programs. Functions are listed by category, then roughly in order of
importance. Shortcut names for popular functions are listed in parenthesis.
Type "!help <functionname>" for detailed info on that function.

Object Inspection                  Examining code and stacks
-----------------------------      -----------------------------
DumpObj (do)                       Threads
DumpArray (da)                     CLRStack
DumpStackObjects (dso)             IP2MD
DumpHeap                           U
DumpVC                             DumpStack
GCRoot                             EEStack
ObjSize                            GCInfo
FinalizeQueue                      EHInfo
PrintException (pe)                COMState
TraverseHeap                       BPMD

While there are some internal extensions that I can use, that doesn’t mean that you can’t get a lot out of a dump.  The reason the Extensions I use are internal is that they have a dependency on Internal Symbols.

Basic Commands

There are some common commands that you can use to poke around and discover what is in a dump.  I’ll talk about a few here, but if you really want to go through the different commands that are available, check out the help file that comes with the debugger.  I know that sounds like I’m signaling defeat by telling you to read the manual, but it seriously has some good info with some good examples.  I often refer back to the Debugger Help to look up a command.  If you don’t use it all the time, you sometimes have to go back and refresh your memory, especially when it comes to dumps.

k

The k command is used to generate a stack dump on a given thread.  There are also variants that you can use such as kp or kn, or even knp.  The debugger help outlines all of these modifiers.  Let’s look at the example I had above.

0:054> k
Child-SP          RetAddr           Call Site
00000000`183cf118 000007fa`5376db54 ntdll!NtSignalAndWaitForSingleObject+0xa
00000000`183cf120 00000000`00ae4b08 KERNELBASE!SignalObjectAndWait+0xc8
00000000`183cf1d0 00000000`00ae2d76 sqlservr!SOS_Scheduler::Switch+0x181
00000000`183cf920 00000000`00ae2700 sqlservr!SOS_Scheduler::SuspendNonPreemptive+0xca
00000000`183cf960 00000000`00aeae8d sqlservr!SOS_Scheduler::Suspend+0x2d
00000000`183cf990 00000000`00aeaf16 sqlservr!WorkDispatcher::DequeueTask+0x472
00000000`183cfa20 00000000`00c244fa sqlservr!SOS_Scheduler::ProcessTasks+0x76
00000000`183cfa90 00000000`00c247dd sqlservr!SchedulerManager::WorkerEntryPoint+0x2d2
00000000`183cfb70 00000000`0106c0cd sqlservr!SystemThread::RunWorker+0xcc
00000000`183cfbb0 00000000`00c253d2 sqlservr!SystemThreadDispatcher::ProcessWorker+0x2db
00000000`183cfc60 00000000`77c337d7 sqlservr!SchedulerManager::ThreadEntryPoint+0x173
00000000`183cfd00 00000000`77c33894 msvcr80!_callthreadstartex+0x17
[f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c @ 348]
00000000`183cfd30 000007fa`5444298e msvcr80!_threadstartex+0x84 [f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c @ 326]
00000000`183cfd60 000007fa`5660e229 kernel32!BaseThreadInitThunk+0x1a
00000000`183cfd90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

You can see where the thread started and all of the calls that were made until we got to the wait.  The top is the most recent call, so you read from the bottom and work up.  Notice also that the msvcr80 calls have source code information.  One thing I do to try and clean up stack output is to use an upper case L with the k command to omit that.

0:054> kL
Child-SP          RetAddr           Call Site
00000000`183cf118 000007fa`5376db54 ntdll!NtSignalAndWaitForSingleObject+0xa
00000000`183cf120 00000000`00ae4b08 KERNELBASE!SignalObjectAndWait+0xc8
00000000`183cf1d0 00000000`00ae2d76 sqlservr!SOS_Scheduler::Switch+0x181
00000000`183cf920 00000000`00ae2700 sqlservr!SOS_Scheduler::SuspendNonPreemptive+0xca
00000000`183cf960 00000000`00aeae8d sqlservr!SOS_Scheduler::Suspend+0x2d
00000000`183cf990 00000000`00aeaf16 sqlservr!WorkDispatcher::DequeueTask+0x472
00000000`183cfa20 00000000`00c244fa sqlservr!SOS_Scheduler::ProcessTasks+0x76
00000000`183cfa90 00000000`00c247dd sqlservr!SchedulerManager::WorkerEntryPoint+0x2d2
00000000`183cfb70 00000000`0106c0cd sqlservr!SystemThread::RunWorker+0xcc
00000000`183cfbb0 00000000`00c253d2 sqlservr!SystemThreadDispatcher::ProcessWorker+0x2db
00000000`183cfc60 00000000`77c337d7 sqlservr!SchedulerManager::ThreadEntryPoint+0x173
00000000`183cfd00 00000000`77c33894 msvcr80!_callthreadstartex+0x17
<—no source info with the kL command
00000000`183cfd30 000007fa`5444298e msvcr80!_threadstartex+0x84
00000000`183cfd60 000007fa`5660e229 kernel32!BaseThreadInitThunk+0x1a
00000000`183cfd90 00000000`00000000 ntdll!RtlUserThreadStart+0x1d

db, dd, dq, da, dc and du

These commands all display data based on a given address or range. 

db = Display byte values and ASCII characters

dd = Display DWORD

dq = Display QWORD

da = Display ASCII characters

dc = Display DWORD and ASCII characters

du = Display Unicode characters

dt

This lets you dump a type (or structure) so you can see what it looks like.  This really leads into a larger discussion about how to debug in general and how to understand and work with structures.  In our example above with the k command, SOS_Scheduler would be an example of a type or structure.  However, you need full (private) symbols to look at type information; it is not available with public symbols.  This can be helpful when debugging your own applications though.

!uniqstack

I love this command!  It does two things for me.  First, it weeds out the noise by only showing the unique stacks within the dump.  So, if a given stack is repeated on 40 threads, it only shows it once.  Second, I use it as a warm up to make sure I have all the symbols I need downloaded, because it goes through and processes the call stacks.

Debugging Managed Code

Debugging Managed Code can sometimes be much easier than debugging native code, mainly due to the SOS debugging extension and what it provides.  One important point to make about Managed Debugging: you really need a Full dump to take advantage of the SOS extension.  I’ve always kind of joked about Managed Debugging that you can just !do everything and you’ll get there eventually. Although, sometimes that’s not far from the truth.

!do (Dump Object)

!do is one of the main managed commands from SOS.

0:005> !do 02f487d4
Name: System.Data.SqlClient.SqlConnection
MethodTable: 6b66d824
EEClass: 6b58c404
Size: 56(0x38) bytes
(C:\Windows\assembly\GAC_32\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
70b50770  400018a        4        System.Object  0 instance 00000000 __identity
702d5700  40008de        8 ...ponentModel.ISite  0 instance 00000000 site
702ef9f0  40008df        c ....EventHandlerList  0 instance 00000000 events
70b50770  40008dd      108        System.Object  0   static 02cc8ea8 EventDisposed
6ba89940  4000bd6       10 ...hangeEventHandler  0 instance 00000000 _stateChangeEventHandler
6baaef30  400171d       14 ...t.SqlDebugContext  0 instance 00000000 _sdc
70b24620  400171e       30       System.Boolean  1 instance        0 _AsycCommandInProgress
6b670f4c  400171f       18 ...ent.SqlStatistics  0 instance 00000000 _statistics
70b24620  4001720       31       System.Boolean  1 instance        0 _collectstats
70b24620  4001721       32       System.Boolean  1 instance        0 _fireInfoMessageEventOnUserErrors
6b66f4d8  4001724       1c ...ConnectionOptions  0 instance 02f5a1b4 _userConnectionOptions
6b66f018  4001725       20 ...nnectionPoolGroup  0 instance 02f5a784 _poolGroup
6b66f614  4001726       24 ...onnectionInternal  0 instance 02f5b08c _innerConnection
70b52da0  4001727       28         System.Int32  1 instance        0 _closeCount
70b52da0  4001729       2c         System.Int32  1 instance        1 ObjectID
70b50770  400171c      798        System.Object  0   static 02f4880c EventInfoMessage
6b66ee84  4001722      79c ...ConnectionFactory  0   static 02f48818 _connectionFactory
70b5291c  4001723      7a0 ...eAccessPermission  0   static 02f4bf3c ExecutePermission
70b52da0  4001728      880         System.Int32  1   static        5 _objectTypeCount

You can then do another !do and get the connection string.

0:005> !do 02f5a1b4
Name: System.Data.SqlClient.SqlConnectionString
MethodTable: 6b66f48c
EEClass: 6b5a02dc
Size: 112(0x70) bytes
(C:\Windows\assembly\GAC_32\System.Data\2.0.0.0__b77a5c561934e089\System.Data.dll)
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
70b50b54  4000be0        4        System.String  0 instance 02f59f10 _usersConnectionString
70b531a8  4000be1        8 ...ections.Hashtable  0 instance 02f5a420 _parsetable
6b66f6e0  4000be2        c ...mon.NameValuePair  0 instance 02f5a50c KeyChain
70b24620  4000be3       14       System.Boolean  1 instance        0 HasPasswordKeyword
70b24620  4000be4       15       System.Boolean  1 instance        0 UseOdbcRules
70b52818  4000be5       10 ...ity.PermissionSet  0 instance 02f5a8f4 _permissionset
702e419c  4000bdc      1f0 ...Expressions.Regex  0   static 02f4cc84 ConnectionStringValidKeyRegex
702e419c  4000bdd      1f4 ...Expressions.Regex  0   static 02f54970 ConnectionStringValidValueRegex
702e419c  4000bde      1f8 ...Expressions.Regex  0   static 02f56310 ConnectionStringQuoteValueRegex
702e419c  4000bdf      1fc ...Expressions.Regex  0   static 02f57ce8 ConnectionStringQuoteOdbcValueRegex
70b24620  4001760       16       System.Boolean  1 instance        1 _integratedSecurity
70b24620  4001761       17       System.Boolean  1 instance        0 _async
70b24620  4001762       60       System.Boolean  1 instance        1 _connectionReset
70b24620  4001763       61       System.Boolean  1 instance        0 _contextConnection

You can see how you can start going around just !do’ing items to get what you want.  To pull in some of the display commands above, here is a cool trick.  If we look above at the _usersConnectionString, we can see that it is a System.String.  We could do a !do on that and get the following:

0:005> !do 02f59f10
Name: System.String
MethodTable: 70b50b54
EEClass: 7090d65c
Size: 530(0x212) bytes
(C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: server='localhost';Trusted_Connection=true;Application Name='Microsoft SQL Server Management Studio';Pooling=false;Packet Size=4096;multipleactiveresultsets=false
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
70b52da0  4000096        4         System.Int32  1 instance      257 m_arrayLength
70b52da0  4000097        8         System.Int32  1 instance      162 m_stringLength

You could also do a dc and see the text as well… although not as cleanly, but it is there.

0:005> dc 02f59f10
02f59f10  70b50b54 00000101 000000a2 00650073  T..p........s.e.
02f59f20  00760072 00720065 0027003d 006f006c  r.v.e.r.=.'.l.o.
02f59f30  00610063 0068006c 0073006f 00270074  c.a.l.h.o.s.t.'.
02f59f40  0054003b 00750072 00740073 00640065  ;.T.r.u.s.t.e.d.
02f59f50  0043005f 006e006f 0065006e 00740063  _.C.o.n.n.e.c.t.
02f59f60  006f0069 003d006e 00720074 00650075  i.o.n.=.t.r.u.e.
02f59f70  0041003b 00700070 0069006c 00610063  ;.A.p.p.l.i.c.a.
02f59f80  00690074 006e006f 004e0020 006d0061  t.i.o.n. .N.a.m.

That is because it is in memory, and these are just different options for looking at a given memory location.  The string is there though.  You see the periods in between the characters because it is actually a Unicode string and each character takes two bytes instead of one.  If you recall, we have a command for dumping Unicode – du.

0:005> du 02f59f10+c
02f59f1c  "server='localhost';Trusted_Conne"
02f59f5c  "ction=true;Application Name='Mic"
02f59f9c  "rosoft SQL Server Management Stu"
02f59fdc  "dio';Pooling=false;Packet Size=4"
02f5a01c  "096;multipleactiveresultsets=fal"
02f5a05c  "se"

You’ll notice the +c.  Had we just gone with the base address we would see the following with the du command.

0:005> du 02f59f10
02f59f10  ".炵ā"

That’s because the string doesn’t actually start at the base address.  There is some information regarding the structure of System.String at the beginning.  The string starts a little bit further in – 12 bytes to be exact, which is C in hex.  That’s where the +c comes from. We are saying grab the Unicode string at the base address + C, which adds 12 bytes to it.  So, it would be the same if we did the following.

0:005> du 02f59f1c
02f59f1c  "server='localhost';Trusted_Conne"
02f59f5c  "ction=true;Application Name='Mic"
02f59f9c  "rosoft SQL Server Management Stu"
02f59fdc  "dio';Pooling=false;Packet Size=4"
02f5a01c  "096;multipleactiveresultsets=fal"
02f5a05c  "se"

You can begin to see why using the SOS extension makes things easier.

!dso (Dump Stack Objects)

This will list the managed objects for the given thread that you are currently on, assuming it is a managed thread.  This can be extremely helpful.  It has also helped me figure out where we are within a given code path based on what objects were present.

0:005> !dso
OS Thread Id: 0xd74 (5)
ESP/REG  Object   Name
06b1f12c 0315e538 System.String    ErrorLogFile
06b1f144 03148088 Microsoft.Win32.SafeHandles.SafeWaitHandle
06b1f18c 0314805c Microsoft.SqlServer.Management.UI.VSIntegration.ObjectExplorer.Service+AsyncWmiBinding
06b1f624 02e94664 System.Runtime.CompilerServices.RuntimeHelpers+TryCode
06b1f628 02e94684 System.Runtime.CompilerServices.RuntimeHelpers+CleanupCode
06b1f62c 03146098 System.Threading.ExecutionContext+ExecutionContextRunData
06b1f668 03148238 System.Threading._ThreadPoolWaitCallback
06b1f680 03148238 System.Threading._ThreadPoolWaitCallback

You can also look at this post for another example, and this one.

!clrstack

This will dump out the Managed Stack for a given thread that you are on.  This is essentially the k command for managed code.

0:005> !clrstack
OS Thread Id: 0xd74 (5)
ESP       EIP    
06b1f0c0 77ce9bd5 [HelperMethodFrame_1OBJ: 06b1f0c0] System.Threading.WaitHandle.WaitOneNative(Microsoft.Win32.SafeHandles.SafeWaitHandle, UInt32, Boolean, Boolean)
06b1f16c 70ad689f System.Threading.WaitHandle.WaitOne(Int64, Boolean)
06b1f188 70ad6855 System.Threading.WaitHandle.WaitOne(Int32, Boolean)
06b1f19c 70ad681d System.Threading.WaitHandle.WaitOne()
06b1f1a4 6f107c90 Microsoft.SqlServer.Management.UI.VSIntegration.ObjectExplorer.Service+AsyncWmiBinding.RequestHandler(System.Object)
06b1f1b0 70ae9f7f System.Threading._ThreadPoolWaitCallback.WaitCallback_Context(System.Object)
06b1f1b8 70b157b1 System.Threading.ExecutionContext.runTryCode(System.Object)
06b1f5e8 71ee1b4c [HelperMethodFrame_PROTECTOBJ: 06b1f5e8] System.Runtime.CompilerServices.RuntimeHelpers.ExecuteCodeWithGuaranteedCleanup(TryCode, CleanupCode, System.Object)
06b1f650 70b156a7 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
06b1f66c 70b002f5 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
06b1f684 70aea4e3 System.Threading._ThreadPoolWaitCallback.PerformWaitCallbackInternal(System.Threading._ThreadPoolWaitCallback)
06b1f698 70aea379 System.Threading._ThreadPoolWaitCallback.PerformWaitCallback(System.Object)
06b1f828 71ee1b4c [GCFrame: 06b1f828]

 

For learning more about Managed Debugging, I’ll point you to Tess’ blog.  That’s pretty much how I learned Managed debugging.  Great stuff over there!

Your Assignment

When I went through this internally, someone at the end asked if there was a Homework assignment.  I hadn’t thought about that, but I replied with the following:

1. Install a debugger

2. Capture a dump of Notepad

3. Make note of the number of threads you have

4. Output the stack of a thread

To help you along, you can use the following command to capture the dump from WinDBG after having attached to the Notepad process (although you don’t have to use WinDBG to get the dump):

.dump /ma c:\temp\mydump.dmp

This will capture a full dump.  You can see the command switches here:

.dump (Create Dump File)
http://msdn.microsoft.com/en-us/library/windows/hardware/ff562428(v=vs.85).aspx

Bonus points if you capture the dump using a different method.

 

Happy hunting!

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

SetFileIoOverlappedRange Can Lead to Unexpected Behavior for SQL Server 2008 R2 or SQL Server 2012 (Denali)


You should be aware of a Windows bug with the API SetFileIoOverlappedRange.  This API is used by SQL Server (2008 R2 and SQL 2012) only when Locked Pages has been enabled and is in use by the SQL Server.

SQL Server Support has NOT encountered this in a production environment.  The issue was found during Denali (SQL 2012) testing.  This is for information purposes only.  WORKAROUND: Disable Locked Pages for the SQL Server instance.

Problem

Calls to SetFileIoOverlappedRange can corrupt the OVERLAPPED structures of SQL Server.  This can lead to incorrect offsets, wrong timings, early completion, etc…

This is called every time we open an FCB (SQL Server's user mode File Control Block Structure).  For example:

  • Open of current database files (MDF, NDF, LDF) - standard recovery
  • Open of new database files (MDF, NDF, LDF) - create database (testing shows for a create database this is called 4 times to get everything going)
  • Auto close opens (MDF, NDF, LDF) - use of a closed database
  • DBCC snapshot opens (MDF, NDF)
  • Snapshot database file opens (MDF, NDF)
  • Restore/Log Shipping – when it opens the FCB to do the restore to

*** Backup to disk does NOT use this code path. ***

 

Symptoms

Symptoms in SQL are wide ranging: invalid write locations, lost reads or writes, early access to a page that is not yet fully in memory, I/O list damage such as AVs, incorrect timing reports, and many others. You may not even see the situation until days later.

 

Bug Id #818757

This is a race condition triggered when using SQL 2008 R2 or Denali (SQL 2012) before RC1. If you are running with Locked Pages in Memory, there is a possibility of corrupting outstanding I/O requests when reads/writes are outstanding AND, concurrently, another file is being opened.

Unfortunately, memory corruption (scribbler behavior) may happen in different parts of the data structure. In the majority of the cases we’ve seen so far, we corrupt I/O queue pointers, causing the SQL Server instance to produce an endless stream of dumps (looping).

We have also seen in-memory corruption and calls to I/O completion when the I/O is not finished.  Writes/reads at incorrect offset(s) within the file are also possible, but unlikely.

Windows 8 contains the API fix, which is being back-ported to previous version(s) of Windows.

 

The SQL Server 2012 and SQL Server 2008 R2 CU-based fixes introduce trace flag 8903.   These builds will not use the API unless the trace flag is enabled.
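To verify the state of the flag on such a build (a quick check; the flag itself would normally be set as the -T8903 startup parameter so it is in effect when the files are opened):

DBCC TRACESTATUS (8903, -1);   -- lists the flag and whether it is enabled globally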

 

                                     Windows 2008 / Vista   Windows 2008 R2 / Windows 7   (Windows 2008 R2 / Windows 7) + QFE*   Windows 8
SQL 2008 R2                          Exposed                Exposed                       Fixed                                  Fixed
SQL 2008 R2 – PCU2**                 Fixed                  Fixed                         Fixed                                  Fixed
SQL 2008 R2 – PCU2 – T8903 enabled   Exposed                Exposed                       Fixed                                  Fixed
SQL 2012                             Fixed                  Fixed                         Fixed                                  Fixed
SQL 2012 – T8903 enabled             Exposed                Exposed                       Fixed                                  Fixed

*The Windows fixes are slated to be included in a Windows Update package.  Reference Bug #s 395823 (KB 2679255 with SP2 Target), 395948 (KB 2679255 with SP3 Target),

** The SQL Server issue is fixed in SQL 2012 RTM and with the bug # 870529 for SQL Server 2008 R2 PCU2 release.

API Reference: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365540(v=vs.85).aspx

When the fixes are available I will be updating the matrix above and Microsoft will be creating the proper System Center Advisor rules to detect exposure.

Bob Dorr -  Principal SQL Server Escalation Engineer


AlwaysON - HADRON Learning Series: Automated Failover Behaviors (Denali - Logging History Information, FCI and Default Health Capture, sp_server_diagnostics)


Quite an encompassing title, I will agree, but the features all work together to build a really nice product.  I had this tucked away during beta and thought it would be helpful to post.

SQL Server AlwaysOn in "Denali" has undergone a significant upgrade to the way the cluster resource (for an Availability Group or a Failover Cluster Instance) detects failover conditions.   Instead of the previous behavior of SELECT @@SERVERNAME with some timeouts and retries, the SQL Server resource DLL executes an internal procedure (sp_server_diagnostics) that returns information at intervals to the resource DLL.   The information covers memory, I/O, query processor, scheduling and other behavioral and situational data.  Based on this information and the failover setting (a new private property of the SQL Server Failover Cluster Instance (FCI) or Availability Group (AG) resource), the failover decisions are made.

The easiest way to think of this is the Internet Explorer security setting slider.  As you increase the security protection, more factors are considered.  The following figure shows an example from the build I have installed.  For each AG or FCI instance the resource properties contain the "FailoverConditionLevel".  Each detection level encompasses the lesser level detections as well.   For a very sensitive AG or instance you set the detection to max (5), or you reduce sensitivity to environmental changes by setting the level downward, perhaps to (3).

[Figure: FailoverConditionLevel private property on the AG/FCI resource]
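If you prefer to script the setting rather than use the dialog, something like the following hedged sketch applies for an Availability Group (the AG name [MyAG] is a placeholder; for an FCI the equivalent FailoverConditionLevel private property is set on the cluster resource itself):

-- Set the failover condition level for an Availability Group (run on the primary).
-- [MyAG] is a placeholder name; 3 is the default level (server hang detection).
ALTER AVAILABILITY GROUP [MyAG] SET (FAILURE_CONDITION_LEVEL = 3);

-- Review the current settings.
SELECT name, failure_condition_level
FROM sys.availability_groups;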

sp_server_diagnostics (http://blogs.msdn.com/b/psssql/archive/2012/03/08/sql-server-2012-true-black-box-recorder.aspx)
The resource DLL maintains a persistent connection to the fully preemptive worker stream returned from sp_server_diagnostics.   The figure below points out some of the sp_server_diagnostics behavior.  Of special note: the information returned from the procedure, as well as the failover diagnostics, is logged in a series of XEL rollover files (\LOG\*Diag*.xel).

Note:  This procedure was not designed for use outside the SQL Server product line.

[Figure: sp_server_diagnostics behavior]

It is also of note that the sp_server_diagnostics information is saved in the default health session for ALL SKUs except SQL Server Express (MSSQL\Log directory).   YIPPEE:  The DBA or CSS can walk up to any Denali server and look at the history of memory, I/O, QP, blocking, etc… for the past ## hours.  This is something we have never had built into any previous version of SQL Server.  Not only can you now answer such questions for a clustered (FCI) instance or a HADRON/AlwaysOn AG move, but also for a stand-alone server.
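As a sketch (the file path and name pattern below are illustrative and vary per installation), the rollover files can be read directly with the standard XEvent file reader function:

-- Read the sp_server_diagnostics history from the SQLDIAG rollover files.
-- The path and *_SQLDIAG_*.xel pattern are illustrative; substitute your LOG directory.
SELECT object_name,
       CAST(event_data AS xml).value('(event/@timestamp)[1]', 'datetime2') AS event_time,
       event_data
FROM sys.fn_xe_file_target_read_file(
         N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\*_SQLDIAG_*.xel',
         NULL, NULL, NULL)
ORDER BY event_time;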

Failover and Availability Modes
SQL Server AlwaysOn is designed to allow 3 specific failover models.

  • Automatic Failover - Sync Replica with Automatic Failover
  • High Performance - Async Replica with Manual Failover
  • High Safety - Sync Replica with Manual Failover

Each AG can be configured for different failover condition level tolerances.  The location of the primary and the configuration of the replicas combine to establish the type of failover for the AG.    Take the following configuration for example.

· A: Synchronous commit with Auto failover

· B: Synchronous commit with Auto failover

· C: Synchronous commit with Manual failover

· D: Asynchronous commit with Manual failover

The behavior for each replica depends on 2 configuration values and the lowest common setting.  I numbered them 1 and 2 specifically because the numbering works well in my example formula to help describe the matrix.

The following tables outline the failover and availability modes that can be used in the example configuration outlined above.

Failover Mode:  Manual(1) or Automatic(2)

| Scenario | A         | B         | C       | D       |
| -------- | --------- | --------- | ------- | ------- |
| 1        | Primary   | Automatic | Manual  | Manual  |
| 2        | Automatic | Primary   | Manual  | Manual  |
| 3        | Manual    | Manual    | Primary | Manual  |
| 4        | Manual    | Manual    | Manual  | Primary |

 

Availability Mode: Async(1) or Sync(2)

| Scenario | A       | B       | C       | D       |
| -------- | ------- | ------- | ------- | ------- |
| 1        | Primary | Sync    | Sync    | Async   |
| 2        | Sync    | Primary | Sync    | Async   |
| 3        | Sync    | Sync    | Primary | Async   |
| 4        | Async   | Async   | Async   | Primary |

 

I use the following formula: (Take your current primary and determine how the Primary AG is configured and then do the same on the secondary you are evaluating.)

Failover Mode = min( Primary Failover Mode, Secondary Failover Mode)

Availability Mode = min( Primary Avail Mode, Secondary Avail Mode)

Let's start with Primary = A and Secondary = B

min(A=Automatic(2), B=Automatic(2)) = 2 = Lowest Failover Mode (Automatic)

min(A=Sync(2), B=Sync(2)) = 2 = Lowest Availability Mode (Sync)

A and B can be Primary and Secondary with Sync / Automatic failover capabilities.

Now, let's look at A and C:

min(A=Automatic(2), C=Manual(1)) = 1 = Lowest Failover Mode (Manual)

min(A=Sync(2), C=Sync(2)) = 2 = Lowest Availability Mode (Sync)

A and C are Manual failover only, with sync capabilities.   This means A won't fail over to C automatically and C can't fail back to A automatically.  This is a Manual failover mode channel.

You can apply the same rules to any of the combinations.  Find the lowest setting between the Primary and the Secondary target and apply it as the behavior outcome.   For example, when D becomes the primary, the lowest values are 1 and 1: Async with Manual failover across all replica targets.
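The same min() evaluation can be scripted against the replica metadata. The following is a sketch, not official tooling, using the standard sys.availability_replicas and sys.dm_hadr_availability_replica_states views; run it on the current primary:

-- For each secondary, compute the effective (lowest common) failover and availability
-- modes between it and the current primary, per the min() formula above.
SELECT s.replica_server_name,
       CASE WHEN p.failover_mode_desc = 'AUTOMATIC'
             AND s.failover_mode_desc = 'AUTOMATIC'
            THEN 'Automatic' ELSE 'Manual' END AS effective_failover_mode,
       CASE WHEN p.availability_mode_desc = 'SYNCHRONOUS_COMMIT'
             AND s.availability_mode_desc = 'SYNCHRONOUS_COMMIT'
            THEN 'Sync' ELSE 'Async' END AS effective_availability_mode
FROM sys.dm_hadr_availability_replica_states AS st
JOIN sys.availability_replicas AS p ON p.replica_id = st.replica_id
JOIN sys.availability_replicas AS s ON s.group_id = p.group_id
                                   AND s.replica_id <> p.replica_id
WHERE st.role_desc = 'PRIMARY';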

 

A Failover Clustered Instance of SQL Server can host an availability group but it CANNOT be set to automatic failover mode. The FCI failover takes precedence, and the lowest common failover mode is (Manual) for any of its replica pairings.

 

Bob Dorr -  Principal SQL Server Escalation Engineer

How It Works: XEvent Action vs Field data values.


I have seen several traces and questions relating to XEvent output, so I thought I would try to explain it a bit in an effort to reduce confusion.

Terms

Field == Event Data
Action == Action Data - The action data is gathered from the current session/request state. 


Let's look at the page_split event as that is one of the most relevant events I have seen questions on.

  • The page_split event contains fields  (page_id and file_id) and there is an action database_id.

Taking a simple insert into a table in pubs you might see the following.

insert into authors ….

action:database_id = 6
page_id = 1045
file_id = 1

This indicates the location of the page split in the pubs database.


Now take the same insert, but the session's current context is the master database (dbid = 1).

insert into pubs..authors ….

action:database_id = 1
page_id = 1045
file_id = 1

The action data is correct based on its design: it shows the current state of the session.   The session is in database_id = 1 (master) but the split really occurs in database_id = 6.  What is missing from this example is a database_id field on the page_split event, so you could see both the context of the page split and of the session.  The split context would be in an event field:database_id and the session context in the action:database_id.
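To experiment with this yourself, here is a minimal sketch of an event session (the session and file names are placeholders) that captures page_split with its fields plus the session-scoped database_id action:

-- page_id and file_id are emitted by the event itself (fields);
-- database_id below is an ACTION, so it reflects the session's current database.
CREATE EVENT SESSION [page_split_demo] ON SERVER
ADD EVENT sqlserver.page_split (
    ACTION (sqlserver.database_id, sqlserver.sql_text))
ADD TARGET package0.event_file (SET filename = N'page_split_demo.xel');

ALTER EVENT SESSION [page_split_demo] ON SERVER STATE = START;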


The user is logged into the pubs database (dbid = 6) and executing a query that involves tempdb.

Here is a sample showing the table spool with tempdb involvement while doing a select.  (Note: most don't expect page split events to be possible on a select, but you must consider spools, spills and sort activity.)

StmtText

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  |--Compute Scalar(DEFINE:([Expr1014]=[Expr1013], [Expr1036]=[Expr1035], [Expr1058]=[Expr1057], [Expr1080]=[Expr1079], [Expr1102]=[Expr1101], [Expr1124]=[Expr1123]))

       |--Sort(ORDER BY:([Expr1013] ASC))

            |--Nested Loops(Inner Join, OUTER REFERENCES:(FN_XE_READ_FILE_TARGET_FILE.[event_data]))

                 |--Nested Loops(Inner Join, OUTER REFERENCES:(FN_XE_READ_FILE_TARGET_FILE.[event_data]))

                 |    |--Nested Loops(Inner Join, OUTER REFERENCES:(FN_XE_READ_FILE_TARGET_FILE.[event_data]))

                 |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:(FN_XE_READ_FILE_TARGET_FILE.[event_data]))

                 |    |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:(FN_XE_READ_FILE_TARGET_FILE.[event_data]))

                 |    |    |    |    |--Nested Loops(Inner Join, OUTER REFERENCES:(FN_XE_READ_FILE_TARGET_FILE.[event_data]))

                 |    |    |    |    |    |--Sort(ORDER BY:(FN_XE_READ_FILE_TARGET_FILE.[event_data] ASC))

                 |    |    |    |    |    |    |--Table-valued function

                 |    |    |    |    |    |--Table Spool

The page_split events in this case report the action:database_id = 6.


The existing_connection event illustrates the field and action context as well.  The existing_connection events are output by the session that enables the event session (ALTER EVENT SESSION … STATE = START).   So the action data associated with existing_connection events is that of the session starting the trace, not that of the session associated with the existing_connection event.

 Bob Dorr - Principal SQL Server Escalation Engineer

How It Works: Failover Cluster/Availability Group XEL Logging Frequency


I had a great question from my post on sp_server_diagnostics (http://blogs.msdn.com/b/psssql/archive/2012/03/08/sql-server-2012-true-black-box-recorder.aspx).  As 'luck' would have it, Bob Ward was working on some training and had a similar question.  As I investigated I uncovered a few details that can help us all.

Connection Scope

As soon as the Failover Cluster Instance (FCI) or Availability Group is signaled by the cluster manager to come 'ONLINE', a connection is established to SQL Server.  The connection is persistent, just reading the next result set, one after the other, from the sp_server_diagnostics output stream.

Result Set Interval Response

The sp_server_diagnostics interval parameter is controlled by the HealthCheckTimeout value.

[Figure: HealthCheckTimeout property; result sets returning on 10-second intervals]

Formula:  Interval = HealthCheckTimeout / 3;                           //  1/3 of the Health check timeout for the result set response interval
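For example, raising the timeout with the standard AG DDL changes the interval accordingly (the AG name in this sketch is a placeholder):

-- 60000 ms = 60 seconds, which per the formula yields a 20-second interval.
ALTER AVAILABILITY GROUP [MyAG] SET (HEALTH_CHECK_TIMEOUT = 60000);

SELECT name,
       health_check_timeout,                        -- milliseconds
       health_check_timeout / 3000 AS interval_sec  -- HealthCheckTimeout / 3
FROM sys.availability_groups;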

XEL Logging Interval

Here is what confused me the first time I looked at the .\LOG\*SQLDIAG*.xel trace for the component_health_result events.

If I establish a performance monitor trace, capturing every 10 seconds, and found a gap of 100 seconds I would troubleshoot the issue as an overall system, responsiveness problem.   This is not the case for gaps in this XEL data.

The messages don't always show up at the 1/3 interval boundaries.   In the example shown above, the results return on 10-second intervals.  Failure to return the result set on these intervals could trigger the Server Hang failover level, and you may see additional INFO messages logged.

The logic tries to conserve space in the XEL file.  For example, if the state is CLEAN and the last state was CLEAN, the logging may skip output of the duplicate event data.   In the case of our example, the specific code I am studying today could skip the output ~10 times (100 sec) before recording the CLEAN state in the XEL file again.  All it means is that the state didn't change; it was CLEAN the entire time.   If any state changes, the events are always output.

Don't make the assumption that a gap in the events directly indicates a level of system instability.
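If you want to see the skip behavior for yourself, a sketch like the following (the file path is illustrative; substitute your instance's LOG directory) lists the component_health_result events with the gap since the prior event, so steady CLEAN states show up as large but harmless gaps:

WITH diag AS
(
    SELECT CAST(event_data AS xml).value('(event/@timestamp)[1]', 'datetime2') AS event_time
    FROM sys.fn_xe_file_target_read_file(
             N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Log\*SQLDIAG*.xel',
             NULL, NULL, NULL)  -- illustrative path
    WHERE object_name = N'component_health_result'
)
SELECT event_time,
       -- seconds since the previous event; NULL for the first row
       DATEDIFF(SECOND, LAG(event_time) OVER (ORDER BY event_time), event_time) AS gap_seconds
FROM diag
ORDER BY event_time;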

Changing Interval

When you change the properties, the resource dll (hadrres.dll) establishes a second connection and starts executing sp_server_diagnostics at the new interval.   Once the connection has been successfully established and the query started, the previous connection is disconnected.  This ensures we never have a monitoring gap.

If the second connection can't be established the original connection continues monitoring with the prior settings.  This makes sure your server is always being monitored for failover conditions.

Bob Dorr - Principal SQL Server Escalation Engineer

 

 

SQL Server 2012: RML, XEvent Viewer and Distributed Replay



Bob Ward, Keith Elmore and I establish goals for every release of SQL Server.   A primary goal is always to make supporting SQL Server easier.  Not just in the sense of Microsoft support, although that does play a factor, but more so for you (the customer).

During SQL Server 2012 planning we spent a significant amount of time looking at the features of SQL Nexus, PSSDiag, Performance Dashboard and the RML utilities.  In fact, at one time Keith and I were doing presentations to the development leads on what it would take to productize the RML utilities.   The net of all this work was a series of work items, tasks, scenarios and requirements resulting in updates and enhancements to many areas of the product. 

I am going to focus on XEvent Display for this post. However, if you read the SQL Server Books Online, Distributed Replay topic (http://msdn.microsoft.com/en-us/library/ff878183.aspx) you will see many of the RML capabilities have been extended to this part of the product.

Each version of SQL Server extends the ongoing supportability, manageability and troubleshooting capabilities.  Over the years SQL Server has added DMVs such as sys.dm_exec_query_stats, making it easy to walk up to any SQL Server and get information about the query patterns and performance statistics.    This data has been exposed in a set of 'SQL Server Performance Dashboard Reports'.   Note: the Performance Dashboard reports have been updated for SQL Server 2012 by Keith to take advantage of new DMV information.
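For example, a common pattern (a sketch, not the Dashboard's exact query) buckets statements by query_hash and surfaces the top consumers:

-- Top 10 query patterns by total CPU; varied literals collapse into one query_hash bucket.
SELECT TOP (10)
       query_hash,
       COUNT(*)               AS cached_plans,
       SUM(execution_count)   AS executions,
       SUM(total_worker_time) AS total_cpu_microseconds
FROM sys.dm_exec_query_stats
GROUP BY query_hash
ORDER BY total_cpu_microseconds DESC;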

XEvent Display  (Faster Than A Speeding Bullet?  Judge the Process For Yourself)

XEvent has been significantly extended in SQL Server 2012. The XEvent capabilities have been adopted outside the database engine, and more than 400 events were added to the product.   We wanted you to be able to take full advantage of these new capabilities.   This included providing a UI for XEvent, similar to what SQL Profiler provides for the .TRC capabilities.

Designing a UI for XEvent was not trivial.  The advanced capabilities of XEvent make display more than a simple grid.   Some of the ideas are still to be implemented, but the foundation provides some rich capabilities for all of us.

As I mentioned, utilities such as RML played key roles in defining the scenarios and requirements for SQL Server 2012.   For as long as RML has been available there has always been a way to determine the top ## queries by CPU, Reads, Writes, … in order to determine what is impacting the key resource.  You can see the direct correlation to the early RML top ## capabilities in places such as sys.dm_exec_query_stats, and you can see the footprints of this in the XEvent display capabilities as well.

Some of the most costly support scenarios are Blocking, Deadlocking, and Query Performance.  For this post I am going to show you how the XEvent Display implemented RML top ## scenarios, right in the shipping product.

The common process for using RML or working with support to troubleshoot a performance problem is to

  • Capture a trace (.TRC) of the statements.   This is often done with a PSSDiag package when you work with support.
  • Run the captured information through ReadTrace (SQLNexus wrapper perhaps)
  • Review the reports provided by RML in Reporter or as directly exposed in SQLNexus

It is well known that the .TRC format is 10+ years old and becoming outdated.  The XEvent capabilities are designed to avoid impacting performance while capturing, and they provide rich data points, enhanced event predicates and many features that the .TRC format does not handle as well.

Instead of using the .TRC replay capture use the XEvent Query Detail Tracing. 

TRC Capture | XEVENT Capture
[Screenshots: the .TRC replay capture template and the XEvent Query Detail Tracing template]

I always get the look: 'Big deal, so I use a different template; what does that gain me?'   For starters, XEvent is many times faster, with a tiny performance impact compared to its .TRC predecessor.  It also contains some key event columns that Keith, Bob and I worked hard to make sure were part of the statement-level events (QUERY HASH).

[Screenshot: statement-level event columns including query_hash]

One of the problems .TRC has is that there is no way, built into the product, to group the same query into a bucket.   If I issued select * from authors where au_lname = ‘DORR’ and select * from authors where au_lname = ‘ELMORE’ there was no way to correlate them without using RML to parse and normalize the query text into a hash id.  SQL Server already has the query_hash, query_plan_hash and similar data points that can provide the grouping capabilities.  Capturing the statement events in XEvent allows you to capture the hash ids used by the SQL Server.
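A minimal sketch of such a capture follows (the session and file names are placeholders, not the shipped Query Detail Tracing template); on SQL Server 2012 the statement-completed events carry query_hash and duration as event fields automatically:

-- sql_statement_completed includes duration and query_hash as event fields in SQL 2012.
CREATE EVENT SESSION [query_detail_demo] ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.session_id, sqlserver.database_id))
ADD TARGET package0.event_file (SET filename = N'query_detail_demo.xel');

ALTER EVENT SESSION [query_detail_demo] ON SERVER STATE = START;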

Now allow me to show you the power of the new UI capabilities so you can see the ‘Big Deal!’

I ran the XEL capture and added the query_hash and statement columns to the view in SSMS.  I specifically changed the where clause of my query to show that the query_hash is the same for varied query text.

[Screenshot: query_hash and statement columns added to the view; query_hash is the same for varied query text]

Now I want to see the TOP N capabilities in action.  

  • Add the ‘Duration’ column to the view.
  • Select ‘Grouping’ and add the query_hash as the grouping column.

[Screenshot: Duration column added and grouping by query_hash]

Select Aggregation… and SUM by Duration.

[Screenshot: SUM aggregation on Duration]

Right mouse on the duration column and select to sort by the aggregation in descending order.

[Screenshot: sorting by the Duration aggregation in descending order]

You just achieved TOP N statements (same query_hash) sorted by sum of duration, in descending order.  You can just as easily select the AVG, or sort by CPU, by Reads, … whatever resource is required to troubleshoot the issue.

Notice that my queries grouped together even with the varied value in the where clause.

[Screenshot: queries with varied WHERE clause literals grouped under one query_hash]

You have the TOP N views from RML available in the XEvent Display right in SSMS. 
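The same Top N result can also be produced in T-SQL if you prefer a query over the UI. This sketch assumes the placeholder capture file from the earlier session; supply the full path to your own files:

WITH ev AS
(
    SELECT CAST(event_data AS xml).value('(event/data[@name="query_hash"]/value)[1]', 'nvarchar(32)') AS query_hash,
           CAST(event_data AS xml).value('(event/data[@name="duration"]/value)[1]', 'bigint')         AS duration
    FROM sys.fn_xe_file_target_read_file(N'query_detail_demo*.xel', NULL, NULL, NULL)  -- placeholder path
    WHERE object_name = N'sql_statement_completed'
)
SELECT TOP (10)
       query_hash,
       COUNT(*)      AS executions,
       SUM(duration) AS total_duration
FROM ev
GROUP BY query_hash
ORDER BY total_duration DESC;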

However, that is a bit of work to set up the view each time you open a trace.  Another feature we were able to facilitate is the ability to save the display settings and apply them.   You can create your favorite display settings and save them (usually a file under 500 bytes).  Then open the display settings and it will apply the columns, sorts, grouping, … and other aspects to the XEL data, returning you to the customized view of your data that you prefer.

[Screenshot: saving and applying XEvent display settings]

Furthermore, the XEL data can be exported to a TABLE, filtered XEL file or CSV and used in applications such as Microsoft Excel.  Folks like Keith, Bob and I are starting to use PowerPivot and PowerView to chart and analyze XEL data as well.

Support Advantage ~= Your Advantage

As we all gain experience with XEL captures, the goal is to remove the FTP data exchange with Microsoft support.  Today you capture the PSSDiag or .TRC, upload large .TRC files to Microsoft, etc…   Instead, we (or your peers) can provide you with the customized XEL capture definition and XEL Display View (a couple of small text files, quickly exchanged over e-mail), allowing you to capture the data and apply the view without needing to copy the captured data around.

Now the discussion moves directly to the issue at hand instead of lengthy FTP or other data exchanges.  

Try it out, I think you will find additional features in the XEL Display UI that provide many of the features you are used to with SQL Performance Dashboard, SQLNexus and RML.

Bob Dorr - Principal SQL Server Escalation Engineer

How It Works: HealthCheckTimeout Interval Activities


As I wrote my recent blog posts and did more research I found that it would be helpful to highlight the HealthCheckTimeout behavior in more detail.

Always On FCI (Failover Cluster Instance) vs Non-FCI Installations Documentation

The first thing that I need to point out is the subtle wording difference in Books Online and other forms of documentation that is easy to overlook.

When the documentation references Always On FCI this is a clustered instance of SQL Server and not an AG on a standalone instance of SQL Server.

This is important because things like the HealthCheckTimeout defaults are documented differently.   In the case of an FCI instance the default is 60 seconds, but for a non-FCI AG the default is 30 seconds.   This is outlined in SQL Server Books Online, but until I reminded myself to carefully pay attention to the FCI reference it was easy to overlook.

Only 1 sp_server_diagnostics Execution Per Instance

The resource dll (hadrres.dll) hosts the SQL Server failover detection logic for the Availability Group (AG) resource.  The logic is designed to execute only a single instance of sp_server_diagnostics no matter how many AGs are in use by the SQL Server instance.  This is where the 'How It Works' comes into play.

[Figure: interval calculation]  As the AGs are brought online (or the HealthCheckTimeout is adjusted for an AG), the resource dll calculates the smallest health check timeout value.

Smallest Timeout = max(5, min(All active AG HealthCheckTimeout values for the same SQL Server instance) / 3)

The logic looks at ALL the AG HealthCheckTimeout values for the same SQL Server Instance.  It takes the smallest of these values and divides the value by 3, making sure the interval is no less than 5 seconds.
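A worked version of the calculation (a sketch; the DMV stores the timeout in milliseconds):

-- Smallest sp_server_diagnostics interval across all AGs on this instance:
-- max(5, min(HealthCheckTimeout values) / 3), expressed in seconds.
SELECT CASE WHEN MIN(health_check_timeout) / 3000 < 5
            THEN 5
            ELSE MIN(health_check_timeout) / 3000
       END AS diagnostics_interval_seconds
FROM sys.availability_groups;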

Using the calculated interval, the hadrres health worker establishes a persistent connection to SQL Server and invokes sp_server_diagnostics <<interval>>.  As the result sets are returned, the health worker processes the results and broadcasts the updates to the active AG resource health monitors.

This allows a single result set stream to work at the smallest HealthCheckTimeout interval while each AG (FIsHealthy) can honor the HealthCheckTimeout established for that AG.

2 Instances of sp_server_diagnostics for Same SQL Server Instance

I just documented that a single copy of sp_server_diagnostics is used for all AGs on the same SQL Server Instance.   Then why would I add this section?

When the HealthCheckTimeout is changed (a new AG brought online or the HealthCheckTimeout property updated) and a smaller timeout needs to be established, the resource dll's logic will establish a second connection and execute sp_server_diagnostics.   As soon as the new connection is properly receiving results the old connection is closed.  There is only a small overlap window to handle the interval change.

FCI Takes Precedence

Please keep in mind that if the instance of SQL Server is clustered (FCI), the FCI behavior takes precedence over the AG behaviors.  The AGs will not automatically fail over when associated with an FCI.

Bob Dorr - Principal SQL Server Escalation Engineer
