Channel: CSS SQL Server Engineers

AlwaysOn ERROR: Log backup for database "MyDB" on secondary replica created backup files successfully but could not ensure that a backup point has been committed on the primary.


This error surprised me when it showed up in the error log so I decided to dig into it a bit more.

“Log backup for database "MyDB" on secondary replica created backup files successfully but could not ensure that a backup point has been committed on the primary.  This is an informational message only. 

Preserve this log backup along with the other log backups of this database.”

The message appears on the secondary replica, where the backup was taken, and indicates one of two possible conditions.  To understand the conditions it helps to understand the state of the backup operation.

The error occurs after the point where the backup was successfully streamed to the backup media. The first concern I had was “is my backup valid” and the answer is YES!  The backup has been written to the backup media properly.

Now the secondary sends a message to the primary to update the backup position in the database.  If this message fails the error is logged.

1. The secondary sends the completed message to the primary.  The primary records the backup position but for some reason fails to respond to the secondary (e.g., a lost network connection).  In this case the next backup will pick up where the current backup completed.

2. The secondary sends the completed message to the primary.  The primary never receives the message or fails in some way before it can record the current backup position (e.g., a network failure, the primary shutting down, etc.).  In this case the next backup will acquire the same information as the last log backup plus any additional log records generated.

As you can see, the message may be a bit alarming, but it is harmless to your backup strategy.  At worst you might get some extra log records in a backup, and that is handled properly during restore.  NO DATA LOSS!
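If you want to sanity-check the log chain yourself on the secondary, a minimal sketch like the following shows the LSN range recorded for each log backup ("MyDB" is just the database name from the message):

-- Log backups taken on this replica, oldest first. For a restorable chain each
-- backup's first_lsn should be <= the previous backup's last_lsn; an overlap
-- simply means the next backup re-captured some log records.
SELECT database_name,
       backup_finish_date,
       first_lsn,
       last_lsn
FROM msdb.dbo.backupset
WHERE database_name = 'MyDB'
      AND type = 'L'
ORDER BY backup_finish_date;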

Bob Dorr - Principal SQL Server Escalation Engineer


SQL Server 2012 Setup just got smarter…


For many years, I’ve been pushing for a way for our SQL Server setup to “fix itself” or “get smarter”. What I mean is that I’ve always wanted a way for setup to detect if any fixes are available online to apply as part of running the original installation. Consider the scenario where you are going to install SQL Server 2012 now that it has been released for several months. Wouldn’t it be nice to avoid any problems that might have been found with setup since its release? Why wait and hit a problem someone else has already encountered and that has already been fixed? Furthermore, if setup has been enhanced in any way to prevent problems (think setup consistency checks), why not get to use these as well? As I started to promote this concept within Microsoft, it took on the moniker of “Smart Setup”. Many people both in the product team and CSS then contributed over the last year to make this a reality.

The concept was first launched with SQL Server 2008 with a feature called Patchable Setup. In addition, this capability included a method to “slipstream setup", but the process was very manual.

Now comes along a new feature for SQL Server 2012 called Product Update which can take advantage of the Microsoft Update Service.

Product Updates from Microsoft Update

I thought it would be interesting to run through the experience in this blog post and make a few comments along the way. First, I’ll show the experience when you use the defaults of just “running setup”, which is to look for updates from Microsoft Update. After that, I’ll show you how to point setup to update packages you have downloaded.

I’ll start the setup experience from the License Terms screen to show you what screen comes up right before the detection of updates. In this scenario, I just ran setup.exe. I didn’t pick any special command line parameters.

 

image

 

If there are updates found online, you will now get a screen like this one. Since SP1 just shipped, setup has detected that SP1 Setup Updates are available to apply as part of the initial product installation.

image

Notice the “Name” is “SQL Server 2012 Setup Update”. This means that setup has automatically detected an update for SP1, but the update is only for setup fixes, not for the entire SP1 package. Observe the size is only 14 MB. I’ll discuss later in this post how to apply the entire SP1 update. The main reason we chose to only post setup changes for SP1 when automatically detecting updates is the sheer size and time it would take to auto update setup with the full SP1 package.

If you click on the article listed in the More Information section, you will see an SP1 “fixlist” article.

 

image

 

Unfortunately we don’t have our SP1 setup fixes listed in this article at the time I’m writing this blog post, but we are working on an update.

If you select “Learn more about SQL Server product updates”, you will be sent to the documentation that references the Product Update feature.

image

 

Once you hit the Next button, you will be shown a screen for the installation of the setup files. This is the same screen you would jump to if no product updates were detected.

 

image

Once you complete this process and go through other setup screens,  you can see that Product Updates are being applied as part of setup on the “Ready to Install” screen.

 

image 

Once setup is complete, you can see that you have installed SQL Server 2012 RTM + SP1 Setup updates by looking at the version of programs installed on your server.

While the overall “version” of SQL Server 2012 SP1 is 11.0.3000.0, we actually bump up the “minor” version field for the Setup program.

image

 

Providing a location for Product Updates

What if though you wanted to run setup and apply product updates for the entire SP1 package instead of just the setup updates? And perhaps you want to apply SQL Server 2012 SP1 CU1 as well while you are at it. SQL Server 2012 setup comes with two new command line parameters: /UpdateEnabled and /UpdateSource. /UpdateEnabled takes a value of TRUE (default) or FALSE. You could use this parameter on its own to disable SQL Server setup from automatically checking for updates online. /UpdateSource allows you to tell the setup program where to look for updates instead of looking for them on Microsoft Update. (Note the value of “MU” is the default and means to update from Microsoft Update.) The source can be any valid file path, including local directories or UNC paths. Note that the source is a directory, not the package file names. There can be other files in this directory; setup will look for packages it knows can be applied for SQL Server.

To demonstrate this, I downloaded the SQL Server 2012 SP1 update package to a downloads folder on the server where I am installing SQL Server. I also copied the SQL Server 2012 SP1 CU1 package to this same folder. (NOTE: The CU package comes as a _zip.exe file. You must run this .exe to extract the true .exe package first.)

Here is a screenshot of the downloads folder:

image

 

Now I ran setup.exe from the install media (in this case it was on a network share for me) with the following syntax (this was from PowerShell):

.\Setup.exe /Action=Install /UpdateSource="c:\Users\Administrator\Downloads"

NOTE: A few syntax comments:

  • Quotes around the path are optional (at least this was the case in PowerShell)
  • You don’t have to specify /UpdateEnabled=TRUE when using /UpdateSource
  • The parameter is case insensitive. So for example, you can use /UPDATESOURCE
  • Specifying /Action=Install makes things go faster as you skip the landing page screens
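Based on the parameter descriptions above, here are two more invocations you might use (the file share path below is only a placeholder, not a real location):

# Disable the online check for product updates entirely
.\Setup.exe /Action=Install /UpdateEnabled=FALSE

# Pull updates from a file share you control instead of Microsoft Update
.\Setup.exe /Action=Install /UpdateSource="\\fileserver\sqlupdates"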

When you now get to the Product Update screen it will look something like this:

image

 

When you hit Next, you get a similar screen as before except the step to download setup files shows as Skipped.

image

 

Now you will go through the same installation screens as before. When you get to the “Ready to Install” screen you will see a difference in the Product Update details:

image

 

The Product Update feature allows you to “slipstream” install a new setup or new instance along with updates such as service packs and cumulative updates. It also allows us to post setup fixes on Microsoft Update and allow you to apply them if you just want to take advantage of any setup fixes we discover as we progress through the lifecycle of a product.

We will start to look into what setup fixes and rules we can put into future cumulative updates so that as we produce a new cumulative update, the setup fixes can be placed on Microsoft Update to help keep setup "updated” all the time with the latest set of fixes.

I look forward to seeing how this technology can improve the user experience of setup and reduce the number of problems customers can encounter when installing the product.

I want to personally thank Shau Phang from the SQL product team, a PM working on this project, who patiently answered my questions along the way as I built out this blog post.

 

Bob Ward
Microsoft

How It Works: SQL Server 32 bit PAE/AWE on (SQL 2005, 2008, and 2008 R2) – Not Using As Much RAM As Expected!


This issue was puzzling until we stepped through the code and studied it in some detail.   The report was “SQL Server won’t use the physical memory I expect it to use.”

Scenario:  SQL Server 2008 R2, 32GB RAM - SQL Server only using ~22GB of total memory and won’t use any more.

Let me try to explain what we found.

image

This is a high level diagram of the SQL Server process when started on a 32GB system.

The BUF array is a set of structures that point to associated 8K memory blocks.   The blocks stored above the 2GB virtual memory range are the AWE buffers, which can only be data pages.   The blocks stored in the mapping window can be data pages or stolen memory maintained by SQL Server.

When SQL Server needs to access a data page stored in the AWE memory range it performs a mapping operation.  Think of this like a page file swap.  A block from the mapping window is exchanged with a block in the AWE range allowing SQL Server to access the page.
Formula Basics and Mapping Window

The default behavior of SQL Server is to maintain a large mapping window, similar to the SQL Server 2000 design.

The formula is straightforward enough.  Take the physical memory you want to address and divide it by 8K (SQL Server data page size) to determine how many BUF structures are required to track those pages.

Now take the virtual address range (2GB) and subtract the overhead for thread stacks, the -g parameter (default 256MB), and images to determine the virtual address space remaining.

Take the remaining virtual address space and subtract the desired size of the BUF array to determine the maximum mapping window size.
         
Note:  There are checks in the code to make sure reasonable boundaries are maintained during the calculations.

The mapping window is important because too small a mapping window causes high levels of memory swaps and can reduce performance.   However, setting aside too much physical memory can result in reduced performance as well.   This is a delicate balancing act.
           
Monitoring the dbcc memorystatus  (Visible) value can help you understand the mapping window size.  Combining this with several performance counters, including the AWE maps and unmaps/sec counters, while comparing overall application performance is a way to tune your system.
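If you prefer T-SQL over Performance Monitor, here is a rough sketch of pulling those counters (counter names can vary slightly between releases):

-- AWE map/unmap activity exposed under the Buffer Manager object
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'AWE%';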
image

SQL 2000 behaved exactly as the formula describes, allowing all the memory to be addressed in the AWE range.

Calculate the tracking buffers needed, then size the BUF array and mapping window.
image

SQL 2005 and newer versions of SQL Server had to make a behavior choice.

The size of the BUF structure increased in SQL 2005 and newer versions.   This means if you applied the same formula the mapping window would be smaller.  (I.e., the same formula must now use a larger BUF size, so the mapping window can’t be as large.)

Let me give you an example.   Assume you need to track 100 8K blocks, and the overhead for the SQL 2000 BUF is 40 bytes while for SQL 2005 and later it is 64 bytes (not actual sizes, example only).

  100 * 40 = 4000 bytes
  100 * 64 = 6400 bytes

To track the same number of 8K buffers the BUF array has to increase by 2400 bytes and the mapping window has to shrink by 2400 bytes.

The decision was made to DEFAULT to the SAME MAPPING WINDOW size as SQL 2000 and NOT increase the overall BUF array size.  This means SQL Server prefers the larger mapping window vs addressing all RAM when on a 32 bit SKU using AWE memory.

The example looks like the following.

100 * 40 = 4000 bytes   (Same SQL 2000 BUF array size)
4000 / 64 = 62 BUFs instead of 100 BUFs.  The BUF array is no longer able to track 38 of the 100 blocks.

Thus, you may experience the situation where SQL Server Target memory appears to be deflated on your system and won’t increase as you expect it to.
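One way to see the deflated target from T-SQL is a sketch like the following (the object_name prefix differs for named instances):

SELECT counter_name, cntr_value AS kb
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Memory Manager%'
      AND counter_name IN ('Target Server Memory (KB)', 'Total Server Memory (KB)');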
image

SQL Server provides a startup-only trace flag (-T836) that indicates you wish to maximize the use of physical RAM, at the expense of a smaller mapping window.

Note:  Again, test carefully because reducing the mapping window may change your performance.

The trace flag tells SQL Server to calculate the BUF size based on the SQL 2005 and newer version(s) BUF overhead.   This allows the additional RAM to be tracked and used at the expense of the reduced mapping window.






 

As you can see it all boils down to the maintenance of the mapping window.   With careful testing and attention to your application performance you can determine the best option to maximize for your deployment(s).
 

Bob Dorr - Principal SQL Server Escalation Engineer
Jack Li – SQL Server Escalation Engineer

How It Works: SQL Server (NUMA Local, Foreign and Away Memory Blocks)


The NUMA node an operating system page physically belongs to can be acquired using the QueryVirtualMemoryEx Windows API.  SQL Server uses this API to track the locality of memory allocations.

This blog is a very high level view of SQL Server behavior but I think it provides a sufficient picture as to what is happening.

This tracking is important to performance because SQL Server makes reasonable attempts to use node-local memory whenever possible.   Access to memory on a remote node (remote memory/foreign memory) takes longer, which can lead to small, unwanted delays in query processing.

When a block of memory is allocated, SQL Server looks at each operating system page and sorts it according to the physical memory node assignments on the system.   The memory manager uses a couple of states for these blocks, and the behavior varies accordingly.

Type      Description
Local     The memory is physically present on the same NUMA node as the SQL Server worker.
Away      The memory is known to belong to a remote node and the memory manager is still in the GROWTH (often initial ramp-up) phase.   The memory is held on the away list until SQL Server requires its use.
Foreign   The memory is known to belong to a remote node and SQL Server has transitioned out of the GROWTH phase because max server memory has been achieved.  The memory block is being used remotely.

There is a subtle distinction between Away and Foreign blocks related to ‘is SQL Server memory still growing’ or has ‘SQL Server memory reached the target memory level.’

The difference directly drives how and when SQL Server consumes the memory.   When SQL Server has not reached the max server memory the away buffers can be set aside.  Let me try to explain better.   

Allocate Memory

If (memory is remote to the node)
    Place the block on the away list
    Loop back and try to allocate another block
Else
    Use the local memory block

The reason to place blocks on an away list and not use them right away helps prevent a bad cycle with the operating system.   If SQL Server released the block it may go onto the operating system free list.   The very next call to allocate memory could return the same block to SQL Server, preventing forward progress towards local memory allocation.

Once SQL Server reaches the target memory level for the instance the memory manager transitions.  This transition signals that away buffers can be used by their assigned, physical nodes.   The memory continues to be balanced across all memory nodes as appropriate.   Only at this point are any pages, that can’t be directly returned to their local nodes, considered Foreign.

Foreign memory is memory known to belong to another node that is actively being used outside its home node.

The following is a snippet of data from dbcc memorystatus showing the behavior.

Type                    Description
Away Committed          The amount of memory allocated from ‘this’ physical memory node that is currently assigned to a remote memory node.
Taken Away Committed    The amount of memory that ‘this’ node has set aside because it knows the memory block belongs to another node.

Knowing the Away Committed and Taken Away Committed values you can look across all memory nodes and understand the current remote vs local memory allocation pattern.

 

image

Once the memory manager transitions from GROWTH and the distribution of away blocks occur the output changes to show any foreign memory usage.   In this example only 32kb of memory remains foreign to node 1 after all balancing and memory block assignment is complete.

image

 

My reaction now is ‘BIG DEAL WHY DO I CARE?’

I pointed this out for a couple of reasons. 

First, the away buffer counts are not part of the performance counters, only foreign memory counters.   To view the away buffer counts you need to capture dbcc memorystatus details.  You could also look at sys.dm_os_memory_nodes or other locations to calculate the difference in free list size, committed and target on the node to determine possible foreign or away block sizes and counts.
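For example, a sketch along these lines shows the per-node committed memory (column names assume SQL Server 2012; older builds expose a slightly different set in this DMV):

SELECT memory_node_id,
       virtual_address_space_committed_kb,
       locked_page_allocations_kb,
       foreign_committed_kb      -- SQL Server 2012 and later
FROM sys.dm_os_memory_nodes;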

Second, during the GROWTH phase, the free and target values for a node with a large ‘Taken Away Committed’ count will be much larger than a 1/nth share across your memory nodes.  This is a great indicator that the node has a larger than expected remote memory offering.

Third, a single query with larger than expected RAM usage may appear.  In my lab testing (256GB RAM system) a query that only needed 30GB of RAM would cause my SQL Server to use ~100GB of total memory.  ~70GB of the memory was on my away lists and not actively used by SQL Server to support the query.   This is expected because I told SQL Server it could use 225GB of memory (max server memory.)  Since SQL Server has not reached that limit putting buffers on the away list is not a problem.   As soon as SQL Server transitions from the GROWTH phase these buffers are readily available, mostly on their home, physical node.

Bob Dorr - Principal SQL Server Escalation Engineer

Azure SQL Database Import/Export Service - Change always brings both challenges and benefits


We recently upgraded the Import/Export Service to v3 of the DAC Framework (http://technet.microsoft.com/en-us/library/ee210546.aspx). This aligns the Import/Export service with what shipped with SQL Server 2012. Like all upgrades, this has brought both benefits and challenges. While overall we are seeing a significantly reduced number of failures (both imports and exports), we are seeing some specific scenarios where we are having some trouble. To that end, I wanted to share some increased detail around a specific failure related to the use of three-part naming, plus an alternative mechanism for doing imports and exports if you have a problem with the service.

The three-part naming problems are arising because Azure SQL Database doesn’t allow the use of external references. Fundamentally, this is because even though Azure SQL databases are grouped underneath a logical server name, there is no physical relationship between them. Unfortunately, the older version of the Import/Export code didn’t fully protect against some of these scenarios, so it wasn’t technically possible to round-trip a BACPAC file through the service in these scenarios. The primary place where we are seeing this crop up and cause trouble is when someone has used a valid three-part reference to the database ([myprimarydatabase].[dbo].[name]). Technically, this is indeed valid since you are inside [myprimarydatabase]. However, if you were to export some TSQL with that reference it wouldn’t be valid if you tried to import it into [mysecondarydatabase]. Therefore, we block this export in v3. In order to successfully complete the export, you will need to modify your TSQL to just reference [dbo].[name].

One of the Import/Export Program Managers, Adam Mahood, has recently posted a full explanation of this scenario and walks through how to use SQL Server Data Tools to help ferret out the location of these three-part references. You can see his full post at http://blogs.msdn.com/b/ssdt/archive/2012/12/13/windows-azure-import-export-service-and-external-references.aspx.

 

Now that I have covered the challenge, I want to share one of the key improvements. Moving to v3 brings the benefit of being able to fully leverage the command-line interface for the DAC Framework – sqlpackage.exe. As you can see from http://msdn.microsoft.com/en-us/library/hh550080(v=VS.103).aspx, sqlpackage.exe covers the full range of operations associated with moving databases between servers – both on-premises and cloud. Much like the old sledge-o-matic (no pun intended, but if you know the reference, you are automatically dating yourself), it does a full range of things. You can do full imports, full exports, schema imports, schema exports, incremental deployments, and more!

I will try to come back after my Christmas vacation to do a broader post, but I wanted to cover my current favorite today – the ability to do a full import or export from SQL Azure without actually using the Import/Export service. (Hint, hint – I first discovered this capability during the recent Import/Export service issue this past weekend, so you can certainly see that one of its primary uses, if you use Azure SQL Database, is as a contingency backup mechanism.)

If you don’t already have SQL Server Data Tools installed, you can install them from http://msdn.microsoft.com/en-us/data/hh297027. Once you have the binaries installed, you can use the command-line below to do an export:

"C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin\sqlpackage.exe" /a:Export /ssn:yourserver.database.windows.net /sdn:"your database to export" /su:"yourdbuser" /sp:"your password" /tf:"bacpac file to create on local disk"

Here’s a screenshot of what happens with the above command:

image

In addition, I can import (technically create in this case, since I am not doing an incremental deploy) a database in a similar fashion:

"C:\Program Files (x86)\Microsoft SQL Server\110\DAC\bin\sqlpackage.exe" /a:Import /tdn:"your database to create" /tp:"your password here" /tsn:"yourserver.database.windows.net" /tu:"yourdbuser" /sf:"bacpac on local disk"

Here’s the output:

image

Voila! A nice easy roundtrip!

As I said before, I will try to come back over the holidays to cover some of the incremental deployments, but in the meantime hopefully this gives you some sense of the power of sqlpackage.exe.

SQLCLR and sp_OA* procedures are not compatible


We ran into an issue today that is a bug you may need to be aware of because of its behavior.  

When a SQLCLR procedure calls back into the SQL Server (in proc provider) and executes sp_OA*, during the callback activity, it triggers a bug (currently filed and being evaluated) that results in heap corruption and the termination of the SQL Server process.  (The Microsoft security policy is clear that any detected heap corruption must result in process termination to reduce a possible injection attack vector(s).)

SQLCLR and sp_OA* can be used separately; it is the callback from SQLCLR that triggers the bug behavior.   When the loopback occurs the activity can be assigned to a second worker thread.  The second worker and the parent worker do not handle the memory allocation properly, leading to the heap corruption.
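One way to assess your exposure is to search your T-SQL modules for sp_OA* usage. This is only a rough sketch: it finds static references in module definitions, so sp_OA* calls issued as dynamic SQL from SQLCLR code will not show up here.

-- Modules in the current database that reference the sp_OA* procedures
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id)        AS module_name
FROM sys.sql_modules
WHERE definition LIKE '%sp_OA%';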

SQL Server will NOT capture a mini-dump because of the process termination activity.  In order to capture detailed stack information you would need to use gflags or an external debugger.

Bob Dorr - Principal SQL Server Escalation Engineer

How It Works: CMemThread and Debugging Them

The wait type of CMemThread shows up in outputs such as sys.dm_exec_requests.  This post is intended to explain what a CMemThread is and what you might be able to do to avoid the waits.  The easiest way to describe a CMemThread is to talk about a standard Heap, HeapCreate and the options (with or without HEAP_NO_SERIALIZE).
 
Serialization is the process of making sure only one thread of execution can execute a specific segment of code at a time.  The technique comes up most often when discussing Windows synchronization objects such as mutexes and critical sections.
 
I think of it like a ticket dispenser.  You get a ticket and wait your turn to be served.   This is just like a synchronization object; let’s look at an example.
 

EnterCriticalSection    // Wait for your turn
dwVal++                 // Do something that no other thread is allowed to do unless it holds the ticket
LeaveCriticalSection    // Allow the next ticket owner to execute the code

 

While the example is simplistic it quickly applies to a Heap.   To allocate memory from a heap you would use HeapAlloc.  The heap maintains various lists that can only be adjusted by one thread at a time or it would corrupt the lists.  Let’s take a closer look at a high level heap design.            

clip_image002
The heap can be made up of multiple segments (different ranges of memory) that are linked together and each segment can have used and free blocks of memory.  

 
When a HeapAlloc takes place the heap will locate a free block to support the allocation, update the free list, update used information and could even allocate a new segment if necessary to create more free memory.   The maintenance of these lists is important to making sure the heap structures remain properly intact.  If multiple threads attempt to modify the heap structures in parallel, the structures will become damaged and lead to memory corruption.  (Scribblers: http://blogs.msdn.com/b/psssql/archive/2012/11/12/how-can-reference-counting-be-a-leading-memory-scribbler-cause.aspx)

 
When you create a heap with the HEAP_NO_SERIALIZE option your code must make sure you don’t make calls to HeapAlloc, HeapReAlloc, HeapFree, Heap* by more than one thread at a time.  This is usually done using something like a CriticalSection around all Heap* invocations.

 

EnterCriticalSection
HeapAlloc
LeaveCriticalSection            

EnterCriticalSection
HeapFree        
LeaveCriticalSection

 

If you allow the Heap to maintain synchronization it will provide an efficient synchronization wrapper on your behalf so you don’t have additional synchronization mechanisms in your code.

 

CMemObj

SQL Server has a class named CMemObj that can be thought of as acting like a heap for the SQL Server developers.  Instead of using HeapCreate the developer calls CreateMemoryObject (the result is often called a PMO – pointer to memory object), which is backed by the SQL Server memory manager.  If you execute a select against sys.dm_os_memory_objects you can see the various memory objects currently in use by the SQL Server.  The CMemObj is responsible for handling common activities such as Alloc, Free, ReAlloc, … as you would expect.

 
Think of the CMemObj as a HEAP_NO_SERIALIZE option for the SQL Server developer.  It is not thread safe so the memory object should only be used by a single thread.  

 

CMemThread

The CMemThread is the serialization wrapper around a CMemObj.   For example the CMemThread::Alloc looks like the following.

 

CMemThread::Alloc(…)
{
    Enter SOS_Mutex       //        CMEMTHREAD WAIT TYPE AND ACCUMULATION OCCURS HERE
    CMemObj::Alloc(…)     //        __super::Alloc
    Leave SOS_Mutex
}

 
The developer creates a memory object with the thread safe flag and SQL Server’s CreateMemoryObject will return a pointer to a CMemThread instead of the underlying CMemObj, overriding the necessary methods to provide the thread safe wrapper so the developer can share the memory object among any thread.
 
When you get a CMEMTHREAD wait you are observing multi-threaded access to the same CMemObj causing a wait while another thread is completing Alloc, Free, ….  This is to be expected as long as the wait does not become excessive.  When the number of waits and wait time start to become significant it can indicate that you need to release the pressure on the specific memory object.
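A quick way to gauge how significant the waits are on your instance is a query like this sketch (values are cumulative since the last restart or wait-stats clear):

SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'CMEMTHREAD';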
 

3 Types

There are 3 types of memory objects  (Global, Per Numa Node, Per CPU).    For scalability SQL Server will allow a memory object to be segmented so only threads on the same node or cpu have the same underlying CMemObj, reducing thread interactions from other nodes or cpus, thereby increasing performance and scalability.

 
Many of the SQL Server memory objects are already segmented by node or cpu and provide scalability.  Reference the following post for more details: http://blogs.msdn.com/b/psssql/archive/2011/09/01/sql-server-2008-2008-r2-on-newer-machines-with-more-than-8-cpus-presented-per-numa-node-may-need-trace-flag-8048.aspx  
 

bThreadSafe = 0x2,

bPartitionedByCpu = 0x40,

bPartitionedByNode = 0x80,     -T8048 upgrades partitioning from by node to by CPU  (can’t upgrade from global to by CPU)

 
Looking at the creation_options in sys.dm_os_memory_objects you can determine if the memory object is partitioned and if so to what degree, node or cpu.   If the object is not partitioned (global) the trace flag has no impact on upgrading the partitioning scheme.
 

Here is an example that shows the active memory objects that are partitioned by cpu.

 

select * from sys.dm_os_memory_objects
    where 0x40 = creation_options & 0x40

 

Will TF 8048 Help Reduce CMEMTHREAD Waits?

Here is a query that you can run on your box when you see high CMEMTHREAD waits.

 

SELECT
    type, pages_in_bytes,
    CASE
        WHEN (0x20 = creation_options & 0x20) THEN 'Global PMO. Cannot be partitioned by CPU/NUMA Node. TF 8048 not applicable.'
        WHEN (0x40 = creation_options & 0x40) THEN 'Partitioned by CPU. TF 8048 not applicable.'
        WHEN (0x80 = creation_options & 0x80) THEN 'Partitioned by Node. Use TF 8048 to further partition by CPU.'
        ELSE 'UNKNOWN'
    END
FROM sys.dm_os_memory_objects
ORDER BY pages_in_bytes DESC

 
If you see the top consumers being of type 'Partitioned by Node', you may use startup trace flag 8048 to further partition by CPU.

Note: Trace flag 8048 is a startup trace flag.
 

Removing Hot Memory Object

  • If the memory object is NUMA partitioned you may be able to use the trace flag to further partition the object and increase performance.
  • If the memory object is global or already partitioned by CPU you need to study and tune the queries impacting the memory object.

 
 

Troubleshooting

To troubleshoot this issue, we need to understand the code path that is causing contention on a memory object.

 
An example of this is the memory object used to track allocations for create table, whose stack looks like the following:
 

00 sqlservr!CMemThread<CMemObj>::Alloc

01 sqlservr!operator new

02 sqlservr!HoBtFactory::AllocateHoBt

03 sqlservr!HoBtFactory::GetFreeHoBt

04 sqlservr!HoBtFactory::GetHoBtAccess

05 sqlservr!HoBtAccess::Init

06 sqlservr!HoBtFactory::CreateHoBt

07 sqlservr!SECreateRowset

08 sqlservr!DDLAgent::SECreateRowsets

09 sqlservr!CIndexDDL::CreateRowsets

0a sqlservr!CIndexDDL::CreateEmptyHeap

 
Starting a workload of create table(s) can cause the specific memory object contention as shown in the following stack trace. 
 

00 ntdll!NtSignalAndWaitForSingleObject

01 KERNELBASE!SignalObjectAndWait

02 sqlservr!SOS_Scheduler::Switch

03 sqlservr!SOS_Scheduler::SuspendNonPreemptive

04 sqlservr!SOS_Scheduler::Suspend

05 sqlservr!EventInternal<Spinlock<154,1,0> >::Wait

06 sqlservr!SOS_UnfairMutexPair::LongWait

07 sqlservr!SOS_UnfairMutexPair::AcquirePair

08 sqlservr!CMemThread<CMemObj>::Alloc

09 sqlservr!operator new

0a sqlservr!HoBtFactory::AllocateHoBt

0b sqlservr!HoBtFactory::GetFreeHoBt

0c sqlservr!HoBtFactory::GetHoBtAccess

0d sqlservr!HoBtAccess::Init

0e sqlservr!HoBtFactory::CreateHoBt

0f sqlservr!SECreateRowset

10 sqlservr!DDLAgent::SECreateRowsets

11 sqlservr!CIndexDDL::CreateRowsets

12 sqlservr!CIndexDDL::CreateEmptyHeap

 
The call to sqlservr!SOS_UnfairMutexPair::LongWait, from a memory object, results in the CMEMTHREAD wait.  You can use the following query to see wait information related to sessions and requests.
 

select r.session_id, r.wait_type, r.wait_time, r.wait_resource
from sys.dm_exec_requests r
join sys.dm_exec_sessions s
    on s.session_id = r.session_id and s.is_user_process = 1

 

 

session_id wait_type       wait_time   wait_resource

---------- --------------- ----------- ---------------

54         NULL            0

55         NULL            0

56         CMEMTHREAD      17062

57         CMEMTHREAD      17062

58         CMEMTHREAD      17063

59         CMEMTHREAD      17063

60         CMEMTHREAD      17062

 
Use Extended Events and collect call  stacks for all waits on CMEMTHREAD using an asynchronous bucketizer (or histogram in SQL Server 2012.)         
 

--First get the map_key for the CMEMTHREAD wait type from the name-value pairs for all wait types stored in sys.dm_xe_map_values
--NOTE: These map values are different between SQL Server 2008 R2 and 2012
select m.* from sys.dm_xe_map_values m
    join sys.dm_xe_packages p on m.object_package_guid = p.guid
where p.name = 'sqlos' and m.name = 'wait_types'
    and m.map_value = 'CMEMTHREAD'

 

/*

name                                                         object_package_guid                  map_key     map_value

------------------------------------------------------------ ------------------------------------ ----------- ---------------

wait_types                                                   BD97CC63-3F38-4922-AA93-607BD12E78B2 186         CMEMTHREAD

*/

--Create an Extended Events session to capture callstacks for CMEMTHREAD waits (map_key = 186 on SQL Server 2008 R2)

 

IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'XeWaitsOnCMemThread')
      DROP EVENT SESSION [XeWaitsOnCMemThread] ON SERVER
CREATE EVENT SESSION [XeWaitsOnCMemThread] ON SERVER
ADD EVENT sqlos.wait_info(
    ACTION(package0.callstack, sqlserver.session_id, sqlserver.sql_text)
    WHERE (
              [wait_type] = (186)   --map_key for CMEMTHREAD on SQL Server 2008 R2
              AND [opcode] = (1)
              AND [duration] > 5000 -- waits exceed 5 seconds
              )
    )
ADD TARGET package0.asynchronous_bucketizer
(SET filtering_event_name = N'sqlos.wait_info',
     source_type = 1,
     source = N'package0.callstack')
WITH (MAX_MEMORY = 4096 KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY = 5 SECONDS, MAX_EVENT_SIZE = 0 KB, MEMORY_PARTITION_MODE = NONE, TRACK_CAUSALITY = OFF, STARTUP_STATE = OFF)

GO

 

--Create a second XEvent session to generate a mini dump of all threads for the first two wait events captured for CMEMTHREAD
IF EXISTS (SELECT * FROM sys.server_event_sessions WHERE name = 'XeDumpOnCMemThread')
      DROP EVENT SESSION [XeDumpOnCMemThread] ON SERVER
CREATE EVENT SESSION [XeDumpOnCMemThread] ON SERVER
ADD EVENT sqlos.wait_info(
    ACTION(sqlserver.session_id, sqlserver.sql_text, sqlserver.create_dump_all_threads)
    WHERE (
              [wait_type] = (186)   --map_key for CMEMTHREAD on SQL Server 2008 R2
              AND [opcode] = (1)
              AND [duration] > 5000 -- waits exceed 5 seconds
              AND package0.counter <= 2 --number of times to generate a dump
              )
    )
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY = 4096 KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY = 5 SECONDS, MAX_EVENT_SIZE = 0 KB, MEMORY_PARTITION_MODE = NONE, TRACK_CAUSALITY = OFF, STARTUP_STATE = OFF)
GO

 
--Start the sessions
ALTER EVENT SESSION [XeWaitsOnCMemThread] ON SERVER STATE = START
GO
ALTER EVENT SESSION [XeDumpOnCMemThread] ON SERVER STATE = START
GO

 

When you collect data using the histogram target, you can acquire the un-symbolized call stack using the following query.

 

SELECT
    n.value('(@count)[1]', 'int') AS EventCount,
    n.value('(@trunc)[1]', 'int') AS EventsTrunc,
    n.value('(value)[1]', 'varchar(max)') AS CallStack
FROM
    (SELECT CAST(target_data AS XML) target_data
     FROM sys.dm_xe_sessions AS s
     JOIN sys.dm_xe_session_targets t
         ON s.address = t.event_session_address
     WHERE s.name = 'XeWaitsOnCMemThread'
         AND t.target_name = 'asynchronous_bucketizer') AS tab
CROSS APPLY target_data.nodes('BucketizerTarget/Slot') AS q(n)

 

 

EventCount  EventsTrunc CallStack

--------------------------------------------------------------------------------------

1           0           0x0000000001738BD8

0x0000000000E53F8B

0x0000000000E541C1

0x0000000000E529B6

0x0000000000FBF22A

0x0000000000F763CB

0x0000000000E578C4

0x0000000000E56DFA

0x0000000000F86416

 
Symbolize the stack addresses to function/method names using the ln command (Windows Debugger) and public symbols against the dump that was captured, as shown below.

Note: The mini-dump capture is important because it contains the image versions, locations and sizes at the time the XEL capture took place.
 

0:049> .sympath SRV*c:\symcache_pub*http://msdl.microsoft.com/download/symbols

Symbol search path is: SRV*c:\symcache_pub*http://msdl.microsoft.com/download/symbols

Expanded Symbol search path is: srv*c:\symcache_pub*http://msdl.microsoft.com/download/symbols

0:049> .reload /f sqlservr.exe

0:049> ln 0x0000000001738BD8

(00000000`00e5462c)  sqlservr!XeSosPkg::wait_info::Publish+0xe2   | (00000000`00e5471c)   sqlservr!SETaskSuspendingNotification

0:049> ln 0x0000000001738BD8;ln 0x0000000000E53F8B;ln 0x0000000000E541C1;ln 0x0000000000E529B6;;ln 0x0000000000FBF22A;ln 0x0000000000F763CB;ln 0x0000000000E578C4;ln 0x0000000000E56DFA;ln 0x0000000000F86416;ln 0x0000000000F7D922;ln 0x0000000000F87943;ln 0x0000000000F0083B;ln 0x0000000000F05D00

(00000000`00e5462c)  sqlservr!XeSosPkg::wait_info::Publish+0xe2

(00000000`00e53d58)  sqlservr!SOS_Scheduler::UpdateWaitTimeStats+0x286

(00000000`00e54174)  sqlservr!SOS_Task::PostWait+0x4d 

(00000000`00e52890)  sqlservr!EventInternal<Spinlock<154,1,0> >::Wait+0x1b2

(00000000`00f7628c)  sqlservr!SOS_UnfairMutexPair::LongWait+0x104

(00000000`00e577f4)  sqlservr!SOS_UnfairMutexPair::AcquirePair+0x46

(00000000`00e57858)  sqlservr!CMemThread<CMemObj>::Alloc+0x6c

(00000000`00e56ddc)  sqlservr!operator new+0x1e 

(00000000`00f7d930)  sqlservr!HoBtFactory::AllocateHoBt+0xba

(00000000`00ef4a38)  sqlservr!HoBtFactory::GetFreeHoBt+0x12a 

Once you have a symbolized stack you have a better understanding of the memory contention point as well as the command(s) that are contributing to the contention.   Using the trace flag or changing the query can remove the contention and improve SQL Server performance.

 

Co-Author: Special Thanks and XEvent assistance Provided by: Rohit Nayak
Bob Dorr - Principal SQL Server Escalation Engineer

Uneven query executions with parallelism


We had a customer who was doing stress testing on a machine with 40 cores.   They designed a program that would launch multiple connections to execute the same query repeatedly based on their requirement to handle multiple concurrent executions.   The query was very CPU intensive and a parallel plan was generated.   As they increased concurrent connections to hundreds, the CPU would be pegged 100%.

What they noticed was that some connections executed the query far fewer times than others.  In other words, the same query didn’t result in the same execution time.

On the one hand, driving the CPU to 100% for a sustained period is not healthy and the query needed to be tuned.  On the other hand, the customer was puzzled as to why executions of the same query resulted in a large variation in execution time.  Some took much longer than others.   They needed us to find the root cause.

The first thing we did was verify whether there were different query plans for the same query.  From looking at the execution plans, they appeared to be the same.  That made us really puzzled.

As it turned out, the plans were the same.  But during the execution phase, SQL Server decided how many threads to use for each execution of the query based on the load.  In the customer’s situation, because the CPU was pegged and very busy, SQL Server chose to execute some of the query executions serially.  In other words, the parallel plan didn’t get executed with multiple threads.  This created ‘uneven’ times because some executions were indeed run with multiple threads and others ran serially.

It’s not easy to spot this problem though.  You will need to capture the “Showplan XML Statistics Profile” trace event.

Even after you get the trace event, it’s hard to spot the difference.   You will have to understand what a specific operator is doing and determine whether actual parallelism has occurred.

The following two screenshots (figures 1 and 2) show the same plan.  In this hash match operation, both the build and probe sides are parallelized.   I am showing the scan for the build input (scanning _dta_mv_45.dta_index_…).  If you look closely, there are differences between the two.   For figure 1, “number of executions” is 1.  But for figure 2, “number of executions” is 2.

What this means is that the second plan (figure 2) was truly parallelized.  But figure 1, it was a serial execution.

 

Figure 1

 

image

 

 

 

Figure 2

image

 

 

Another quick way to identify this is to open the .sqlplan file in a text editor and search for “RunTimeCountersPerThread “.  If all you are seeing is RunTimeCountersPerThread Thread="0", then the plan was never parallelized.  It executed serially.   Here is an example where parallel threads were used.


    <RunTimeCountersPerThread Thread="2" ActualRows="63125" Batches="0" ActualExecutionMode="Row" ActualEndOfScans="1" ActualExecutions="1" />
    <RunTimeCountersPerThread Thread="1" ActualRows="75847" Batches="0" ActualExecutionMode="Row" ActualEndOfScans="1" ActualExecutions="1" />
    <RunTimeCountersPerThread Thread="0" ActualRows="0" ActualEndOfScans="0" ActualExecutions="0" />

Note that:  In this post, we also talked about how to use RunTimeCountersPerThread to help solve another issue related to reindexing.

 

Summary

When your query is compiled with parallelism, there is no guarantee that it will be executed with multiple threads.  SQL Server can choose to execute the query with or without parallelism depending on the current load of the system.   If the server is very busy, this can create some ‘uneven’ response times for the same query across multiple executions.  For a highly concurrent system, you can reduce DOP to even out this type of variation, as sketched below.   Or you can tune your query to reduce the need for parallelism and increase concurrency.
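As a sketch, you can cap DOP either instance-wide with sp_configure or per query with a MAXDOP hint; the value 4 below is only an example, pick what fits your workload.

-- Instance-wide cap
exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'max degree of parallelism', 4;
reconfigure;

-- Per-query cap (replace with the actual query being stressed)
select count(*) from sys.objects option (maxdop 4);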


Getting a Power View report within Excel 2013 to work with SharePoint


I was setting up my SharePoint 2013 server to be able to use an Excel 2013 workbook that had a Power View Report in it.  However, when I tried opening the workbook, I got the following error:

image

In the ULS logs of my SharePoint server that had RS installed on it, I saw the following:

Microsoft.ReportingServices.ReportProcessing.ReportProcessingException: Cannot create a connection to data source 'EntityDataSource'. ---> NoAvailableStreamingServerException: We cannot locate a server to load the workbook Data Model. ---> Microsoft.AnalysisServices.SPClient.Interfaces.ExcelServicesException: We cannot locate a server to load the workbook Data Model. ---> Microsoft.Office.Excel.Server.WebServices.ExcelServerApiException: We cannot locate a server to load the workbook Data Model.    

I have a separate SharePoint App Server that has Excel Services setup.  In the ULS Log on that box, I saw the following:

01/09/2013 08:47:09.45         w3wp.exe (0x04CC)         0x0C2C        Excel Services Application         Data Model         27        Medium         SSPM: Initialization failed on server DRBALTAR\PowerPivot: Microsoft.AnalysisServices.ConnectionException: A connection cannot be made to redirector. Ensure that 'SQL Browser' service is running. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.0.0.11:2382

In my configuration, I installed the Analysis Services Server on a separate server which is allowed with SharePoint 2013 and SQL 2012 SP1 for a PowerPivot Deployment.  Before I dug in a little more, I did a quick search to see what I could find for the actual message “We cannot locate a server to load the workbook Data Model”.  I ended up finding the following KB Article:

"We cannot locate a server to load the workbook Data Model" error on a SharePoint site when you refresh a PivotTable in an Excel 2013 workbook
http://support.microsoft.com/kb/2769345/EN-US

I already knew that I had a server defined for the Data Model settings within Excel Services.

image

Of note, the ConnectionException above is actually pretty descriptive.  When I installed the AS Instance (PowerPivot) on DrBaltar, I had opened up the Firewall for the instance itself, but I hadn’t opened up SQL Browser.  After opening SQL Browser through the firewall on DrBaltar, I still got a failure. However, now the message was different from within Power View:

image

Here was the message in the ULS Log on my first SharePoint Server:

Microsoft.ReportingServices.ReportProcessing.ReportProcessingException: Cannot create a connection to data source 'EntityDataSource'. ---> Microsoft.AnalysisServices.AdomdClient.AdomdErrorResponseException: SetAuthContext need to be run as sysadmin.    
at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.XmlaClientProvider.Connect(Boolean toIXMLA)    
at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.ConnectToXMLA(Boolean createSession, Boolean isHTTP)    
at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.Open()

Looking at a Profiler trace we can see a telling clue:

image

I hadn’t provided Admin rights to the RSService Account for the Analysis Services Instance.

image

After adding the RS Service Account, the Power View sheet that was in the Excel 2013 Workbook came up within SharePoint 2013.

image

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

How can I get that user out of my table quickly


Recently, I worked on a customer issue to help tune their slow query.   The query was fairly complex, involving multiple table joins.  The key issue was the inability to do seeks on a particular table.

The table has a column that stores User Name like below.  The values stored contain domain name\user name.  Here is a set of sampled fake data:

UserName
mydomain0\user0
mydomain1\user1
mydomain2\user2
mydomain3\user3
mydomain4\user4
mydomain5\user5
mydomain6\user6
mydomain7\user7
mydomain8\user8
mydomain9\user9

Problem

The problem is that their application only passes the user name without the domain name.   So the parameter will be user0, user1, etc.   This makes it very challenging to seek on the user.  The application couldn’t be changed to take the domain name.

So this customer basically used LIKE, as in the procedure below.  This resulted in scanning of the table and caused a performance slowdown.

create procedure p_test1 @user varchar(20)
as
select * from t where UserName like '%' + @user

This user did put an index on UserName and saw the query plan had ‘seek’.  But the performance still didn’t meet the requirement. 

When you use a LIKE predicate and have an index on the column used in the LIKE, SQL Server tries its best to leverage that index.  It actually tries to calculate a range of values based on the value you pass in and then does an index seek on the column.

Below is what the plan looks like:

 

image

 

The problem is that the range can be very large.  If you notice, the index seek is based on two expressions calculated (with > and <) followed by a filter (where …).  So this can be fairly expensive.

In fact, for 2 million dummy rows, it resulted in 9309 logical reads (as indicated below).

Table 't'. Scan count 1, logical reads 9309, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Though it’s better than a table scan, it is still not the best fit for their needs.

 

Solution

A computed column comes to the rescue.   We have talked about computed columns in our blog, but when it comes to solving practical problems, they still scare some users.

For this particular problem, all we need is CHARINDEX and SUBSTRING.   The expression for the computed column looks scary, but that is because it covers special cases such as NULL and blank values.
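As a quick illustration of the CHARINDEX/SUBSTRING idea on a sample value (the 50 is just an arbitrary length cap for this example):

select charindex ('\', 'mydomain0\user0') as slash_position,   -- returns 10
       substring ('mydomain0\user0', charindex ('\', 'mydomain0\user0') + 1, 50) as user_alias   -- returns 'user0'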

Create the computed column and then create an index on it:

alter table t
add UserAlias as (substring (UserName, case when charindex ('\', UserName) is null or charindex ('\', UserName)  = 0 then 1 else  charindex ('\', UserName) + 1 end, len (UserName) - case when charindex ('\', UserName) is null or charindex ('\', UserName)  = 0 then 1 else  charindex ('\', UserName) + 1 end +1) ) persisted

create index ix2 on t(UserAlias)

 

now change your query like this:

create procedure p_test2 @user varchar(20)
as
select * from t where UserAlias = @user

Here is the query plan:

 

image

This gave a dramatic reduction in logical reads: only 3, down from the 9309 noted earlier with the range seek on LIKE.

Table 't'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 

The CPU went from over 700 ms to 1 ms after the change.

 

 

Complete demo script

 

use tempdb
go
--setting up table and rows
drop table t
go
create table t (UserName varchar(50))
go
set nocount on
declare  @i int = 0, @domainname varchar(20), @username varchar(30)
begin tran
while @i < 2000000
begin
      
       set @domainname =    'mydomain' + cast ( @i %20  as varchar(20))
       set @username = 'user' + cast(@i % 100000 as varchar(20))
       insert into t values (@domainname + '\' + @username)
      
       set @i = @i + 1
end
commit tran
go

if object_id ('p_test1') is not null drop procedure p_test1
go


create procedure p_test1 @user varchar(20)
as
select * from t where UserName like '%' + @user
go

create index ix on t (UserName)

go
set statistics profile on
set statistics time on
set statistics io on
go
--this query still ends up with many logical reads and fairly high CPU consumption
exec p_test1 'user'
go
set statistics profile off
set statistics time off
set statistics io off
go

--adding a computed column
alter table t
add UserAlias as (substring (UserName, case when charindex ('\', UserName) is null or charindex ('\', UserName)  = 0 then 1 else  charindex ('\', UserName) + 1 end, len (UserName) - case when charindex ('\', UserName) is null or charindex ('\', UserName)  = 0 then 1 else  charindex ('\', UserName) + 1 end +1) ) persisted
go

go
--creating an index on the computed column
create index ix2 on t(UserAlias)
go


if object_id ('p_test2') is not null drop procedure p_test2
go
--new procedure to take advantage of the computed column index
create procedure p_test2 @user varchar(20)
as
select * from t where UserAlias = @user

go

set statistics profile on
set statistics time on
set statistics io on
go
exec p_test2 'user'
go
set statistics profile off
set statistics time off
set statistics io off
go

AppDomain unloading messages flooding the SQL Server error log


This blog is built directly from a customer reported issue.  As I helped investigate the source of the issue I thought it would be of interest to a broader audience – hopefully you find this interesting, as well.

Allow me to provide some history of the problem before I dive into extended details.

The SQL Server error log was flooded with the following pattern, approximately every ~1 second.  Customer indicated lots of CLR_* wait types and mini-dumps revealed heavy GC (garbage collection) activity taking place.

...
2013-01-22 08:07:44.91 spid37s     AppDomain 890288 (mssqlsystemresource.dbo[ddl].890287) unloaded.
2013-01-22 08:07:45.73 spid31s     AppDomain 890289 (mssqlsystemresource.dbo[ddl].890288) unloaded.
2013-01-22 08:07:46.41 spid34s     AppDomain 890290 (mssqlsystemresource.dbo[ddl].890289) unloaded.

...

The domain name was always the same.  Domain names are based on the database and object owner and are given a generation id each time they are loaded.   The pattern shown above indicates SQL Server has loaded and unloaded the mssqlsystemresource (dbo) application domain 890,000+ times – ouch!

This is not the normal pattern, as you might have imagined already.   The normal pattern is to show the domain loaded message paired with a matching unloaded message and separated by a reasonable length of time as the domain is utilized.  The error log was not showing any loaded messages, just the unloaded messages.

The other distinct difference that stood out is the text of the unloaded message.  There are ~6 different unloaded messages which SQL Server can log.  All the other messages indicate a reason (out of memory, .NET exception, locking protocol violation, …).  This one was just a generic unload without any reason given, which is exactly why the customer wanted to know the cause of the issue.

Note: The stack traces and debugging activities are using public symbols and a SQL Server 2008 build. http://support.microsoft.com/kb/311503 

SQL Server AppDomain Loaded Message

SQL Server does not log the application domain loaded message until the domain has been loaded and initialized.  Loading and initializing the CLR interfaces for an application domain are two distinct states.  This is important because technically the application domain is loaded within the CLR runtime (not fully initialized) and SQL Server does NOT record the application domain loaded message in the error log.
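As a side note, you can see which application domains are currently loaded (and fully initialized) with a query like this sketch against sys.dm_clr_appdomains:

-- Application domains currently loaded in this instance
SELECT appdomain_id, appdomain_name, creation_time, state, db_id
FROM sys.dm_clr_appdomains;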

My Theory

Based on the behavior I had a hunch that this was a query cancellation or error occurring during the loading of the application domain.  SQL Server didn’t reach full initialization, so it was starting up the domain (part way) and tearing it down (unloading) to clean up properly.

Asking the customer what they were doing with CLR and looking at some traces (.TRC), we found the geography data type was in use, and the supporting assembly is associated with a system database.

I was also able to use the sys.dm_os_ring_buffers to get information related to the application domain state machine changes.  I found the domains were transitioning from creating to unloaded within a few milliseconds.
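Here is a sketch of pulling those records yourself; each record is an XML blob describing an AppDomain state transition that you can inspect for the timing:

SELECT rb.timestamp, rb.record
FROM sys.dm_os_ring_buffers AS rb
WHERE rb.ring_buffer_type = 'RING_BUFFER_CLRAPPDOMAIN'
ORDER BY rb.timestamp DESC;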

Ring Buffer: RING_BUFFER_CLRAPPDOMAIN

image

Ring Buffer: RING_BUFFER_EXCEPTION

SQL Server handles the vast majority of error conditions by throwing the custom C++ exception type (SQL Exception).  The exception holds a major and minor error code along with other information associated with the condition.  You can use the formula  (Major * 100) + (minor) to build the SQL Server error code or message_id as provided in sys.messages.

For a query cancellation this is an internal error (3617) or Major = 36 and Minor = 17.

image

The stack associated with the cancellation is the following, during the initialization of the application domain.

sqlservr!clr_ex_raise   <------- Major = 36, Minor = 17
sqlservr!CAppDomain::CreateManagedDomain
sqlservr!CAppDomain::InitExpensive
sqlservr!CAppDomainManager::GetAppDomain
sqlservr!CCLRHost::GetAppDomain
sqlservr!CAssemblyMetaInfo::GetAppDomainForVerification
sqlservr!CAssemblyMetaInfo::CreateClrInterfaces
sqlservr!CAssemblyMetaInfo::InitClrInterfaces
sqlservr!CAssemblyMetaInfo::LoadAssemblyFromDatabase
sqlservr!CAssemblyMetaInfo::LoadAssemblyFromDatabase

Associated with the CLR_APPDOMAIN, ring buffer are stack frames and I was able to see where the unload was getting triggered.

sqlservr!AppDomainRingBufferRecord::StoreRecord+0x9c
sqlservr!CAppDomain::StateTransition+0xcf
sqlservr!CAppDomainManager::AppDomainStateTransitionLockHeld+0xa5
sqlservr!CAppDomainManager::AppDomainStateTransition+0x30
sqlservr!CAppDomain::UnloadManaged+0x22
sqlservr!CAppDomain::Release+0xd3
sqlservr!CAutoRefc<CAppDomain>::~CAutoRefc<CAppDomain>+0x914a3f
sqlservr!CAssemblyMetaInfo::CreateClrInterfaces+0x135
sqlservr!CAssemblyMetaInfo::InitClrInterfaces+0x7b
sqlservr!CAssemblyMetaInfo::LoadAssemblyFromDatabase+0x144
sqlservr!CAssemblyMetaInfo::LoadAssemblyFromDatabase+0xa2
sqlservr!ResolveUdf+0x2c4
sqlservr!CAlgUtils::TrpGetExpressionPropsAlg+0x523
sqlservr!CAlgUtils::TrpGetExpressionPropsWithHandler+0x112
sqlservr!udf::FBindObject+0x5d
sqlservr!udf::FBind+0x2b3

A few key aspects of this stack are worth noting.

  • FBind – Used during compile so we are still compiling the query, not an execution/runtime issue so we can use a database clone to reproduce the problem.
  • LoadAssemblyFromDatabase – Loading the assembly which can create the application domain and in this case is doing just that
  • CreateClrInterfaces – Doing the initialization work – (We have not printed the AppDomain loaded message yet)
  • UnloadManaged – Triggers an async application domain unload.  It is important to note this is an async operation; the actual unload occurs on a system thread, which is exactly why the unloaded messages appear on a system (s) thread.

Now I needed to exercise this code path to validate my findings.  I knew that if I restarted SQL Server the application domain would not be loaded and the plan would not be in the procedure cache.

A coworker of mine (BillHol) assisted in creating a simple function that causes the Microsoft.SqlServer.SqlGeography assembly to be loaded under the mssqlsystemresource.dbo application domain.  (In SQL 2012 this loads under master.dbo.)

use tempdb
go

CREATE FUNCTION dbo.funcBufferGeography(@p1 geography, @p2 float)
RETURNS geography
AS
BEGIN
       -- The STAsText() method call is enough to force the geography assembly
       -- (and its application domain) to load; @p2 and @distance are unused here
       DECLARE @g geography;
       DECLARE @distance float;
       SELECT @g = @p1.STAsText();

       RETURN (@g);
END;
GO

I then went to the debugger and set a breakpoint.  I wanted to create a simulated stall during the interface creation to see if I could trigger a query cancellation (attention) during this phase of the compile, and whether that would reproduce the pattern.  The breakpoint simply waits 1.5 seconds and then continues.

bp sqlservr!CAssemblyMetaInfo::CreateClrInterfaces ".sleep 1500;g"

I used OSTRESS, from the RML toolkit, to execute the function with a query timeout of 1 second, so that while the debugger has the SQL Server process stopped the client query timeout predictably occurs.

ostress -dtempdb -E -S.\sql2008 -Q"DECLARE @garg geography = 'LINESTRING(3 4, 8 11)'; SELECT dbo.funcBufferGeography(@garg, 1.1)" -oc:\temp\breakout -t1

Sure enough, I was able to reproduce the behavior in the error log: repeated unloaded messages without the matching loaded pairing.  In the debugger you can also see the C++ exceptions being thrown for the 3617 SQL Exception.

I can’t/don’t want to hook up a debugger to production!

It is not practical to hook up the debugger on a production instance, but once you understand the pattern it is easy to see from a simple SQL Server trace and the ring buffer entries (shown previously).

[Image: SQL Server trace showing each SQL:BatchCompleted followed by an Attention event on the same session]

  • The attention (internal 3617 C++ SQL Exception) is always logged after the BatchCompleted event and the Attention event.  As you can see, each BatchCompleted is followed by an Attention. 
  • There is never an error log message for AppDomain loading.
  • The unloading occurs on SPID = 31 and it is marked IsSystem = 1, matching the 31(s) output in the error log.

Garbage Collection (GC)  and CLR_* Wait Types

While loading, initializing or unloading an application domain, SQL Server prevents additional CLR activity against the same application domain.  This results in the CLR_* wait types, as you might expect; only one thread can load the application domain.  This is no different than loading a DLL, where the operating system holds the process loader lock (critical section) during image load and resolution processing.
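To see how much time an instance has spent on these waits overall, you can look at the aggregated wait statistics (a general sketch, not specific to this case):

select wait_type, waiting_tasks_count, wait_time_ms
from sys.dm_os_wait_stats
where wait_type like 'CLR%'
order by wait_time_ms desc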

In this customer's case, the mini-dump and sys.dm_exec_requests output revealed 34 additional waiters on the geography data type (mssqlsystemresource.dbo app domain).

GC/Convoy: The GC activity is a side effect but helped cause a convoy on this customer's system.  When the domain is unloaded, the CLR runtime forces a garbage collection (GC) across all generations to make sure anything related to the domain has been properly cleaned up.  During GC, (usually) all CLR activity is suspended, no matter which application domain it belongs to. 

Here is what the convoy is doing:

    1. SPID 50 – Attempted to load application domain, failed and is unloading and performing GC activity.
    2. SPID 51, 52, 53 … 80 are all waiting on the SQL CLR_* protection object and can’t advance until SPID 50 is able to load and initialize or unload and cleanup the application domain.
    3. The time it takes for 50 to complete the unload/GC causes SPID 51, 52, 53, … 80 to timeout.
    4. SPID 50 completes and SPID 51 tries to load the application domain.   The query cancellation is already queued so 51 detects this during initialization and issues the application domain unload again.
    5. SPID 51 completes and SPID 52 tries to load the application domain. …… you get the idea…..

The convoy floods the error log with application domain unload messages and no real work gets done by the sessions.  The server encounters stop-and-go behavior as it attempts to load the domain, allow CLR workers to execute, unload the domain and suspend CLR worker activities … repeating the behavior over and over again. 

Solving the problem

The problem is no different than any other resource bottleneck troubleshooting: capture traces, performance monitor logs and other outputs and track down why the original bottleneck occurred.  In this case: “Why is the system taking so long to load the assembly, or is the query timeout improperly set to something tiny?” 

 

Bob Dorr - Principal SQL Server Escalation Engineer

Don’t change the value of that parameter


Parameter sniffing is well known among the SQL Server user community.   But I have frequently seen variations of it that need a bit of creative handling, and the solution may not be that straightforward.

One of the variations is that a user changes the value of a parameter inside the procedure.  When the procedure is compiled for the first time, it uses the passed-in value of the parameter for cardinality estimation.  If you change the value inside the body, SQL Server won’t know that, and it can cause incorrect cardinality estimates. 

Recently I worked with a customer who used an output parameter for a procedure but needed to use the same parameter in a few places to query data.   It caused incorrect cardinality estimates.  Additionally, an ascending key issue is at play as well.

I’m going to use a very simple example with faked data to illustrate the cardinality estimate problem.  The customer’s issue is more complex, involving a multiple table join.

 

Setting up data

 

use tempdb
go
if object_id ('[Check]') is not null drop table [Check]
go
if object_id ('[Batch]') is not null drop table [Batch]
go
create table Batch (BatchID int identity primary key, BatchType tinyint)
go
create table [Check] (CheckID int identity primary key, BatchID int references Batch (BatchID))
go

set nocount on

declare @i int = 0, @j int = 0
begin tran
while @i < 500
begin
    insert into Batch values (1)
    declare @batchid int = @@identity
    set @j =   cast (RAND()  * 1000 as int)
   
    declare @k int = 0
    while @k < @j
    begin
        insert into [Check] (BatchID) values (@batchid)
        set @k = @k + 1
    end
    set @i = @i + 1
end
commit tran

go
create index ix_check_batchid on [check] (BatchID)
go

 


if object_id ('p_test') is not null drop procedure p_test
go
create procedure p_test @BatchID int output
as
set nocount on
insert into [Batch] (BatchType) values (1)
select @BatchID = @@IDENTITY
--insert some 200 fake values
insert into [Check] (BatchID) select top 200 @BatchID from [Check]
--now select
select * from [Check] where BatchID = @BatchID

go

 

Problem

Note that the procedure p_test takes a parameter called @BatchID.   The intention of the procedure is to insert a value into the Batch table, take the identity value BatchID, and then insert rows into another table called Check.

I chose a simple insert with SELECT TOP from the Check table itself.  But in the customer’s scenario, they were actually inserting a large rowset by processing XML documents passed into the procedure using OPENXML.

 

Here is the problem:   SQL Server will not be able to estimate the predicate BatchID = @BatchID correctly in the statement “select * from [Check] where BatchID = @BatchID” .

@BatchID is declared as an output parameter.  When the procedure is called and compiled like “exec p_test @Batchid output”, @BatchID will be NULL.  That is the value used for the cardinality estimate.  But later, @BatchID gets changed to the identity value.    From the insert (insert into [Check] (BatchID) select top 200 @BatchID from [Check]), we know for sure there will be 200 rows with the new BatchID in the Check table.

The execution plan below shows that “select * from [Check] where BatchID = @BatchID” estimated 1 row.   But actually 200 rows were retrieved.

When the Check table is joined with other tables, this incorrect cardinality estimate will cause a much bigger problem.

set statistics profile on
go
declare @batchid int
exec p_test @Batchid output
select @batchid
go
set statistics profile off

[Image: STATISTICS PROFILE output showing an estimate of 1 row vs. 200 actual rows for the select]

 

Solution

Knowing that the value of @BatchID is incorrect and gets changed inside the procedure body, I thought it would be an easy fix: just append OPTION (RECOMPILE) to the select statement.  Here is the complete procedure.  With OPTION (RECOMPILE), the statement should be able to use the new value (from @@IDENTITY).

 

if  object_id ('p_test2') is not null drop procedure p_test2
go
create procedure p_test2 @BatchID int=null output
as
set nocount on
insert into [Batch] (BatchType) values (1)
select @BatchID = @@IDENTITY
insert into [Check] (BatchID) select top 200 @BatchID from [Check]
select * from [Check] where BatchID = @BatchID  option (recompile)
go

 

To my surprise, the estimate didn’t change.  The plan below shows the estimate is still 1 row while the actual row count was 200.  One thing did change: @BatchID is now replaced with 531, so we know for sure the value of @BatchID is correct because of the recompile. 

[Image: plan with @BatchID replaced by the literal 531; estimate still 1 row, actual 200]

 

So I went back and studied the procedure a bit more.  If you look at the way BatchID gets into the Batch and Check tables, you will see that the values are ever increasing (the ascending key problem).  Unless you update statistics right before the query is compiled, the new values of BatchID will not be part of the statistics.

Though the optimizer knows precisely what the BatchID value is, that won’t help, because according to the statistics the value doesn’t exist.

Fortunately, there is a solution.  Enabling trace flags 2389 and 2390 will trigger a statistics update for these conditions.

After running dbcc traceon (2389, 2390, -1), you will see the estimate change for the better.  Note that now the estimate is 506 and the actual is 200.   The reason the estimate is not exactly 200 is that this value is not one of the steps in the histogram; SQL Server has to ‘calculate’ the value.   For the customer’s situation, this dramatically improved performance and changed the join type from nested loop to hash join.

[Image: plan after enabling trace flags 2389/2390; estimate 506 rows, actual 200]

 

Some users may have concerns about using these trace flags as a solution, since they trigger more frequent statistics updates.

You can instead use OPTIMIZE FOR UNKNOWN, as in “select * from [Check] where BatchID = @BatchID  option (optimize for (@BatchID unknown))”.  When you do this, SQL Server won’t sniff the parameter; it simply picks the average density for the cardinality estimate.   This method may work well if your data is uniformly distributed.

 

There are other solutions, such as not using the output parameter this way.   Many times in support we are limited in how much can be changed; problems usually come after production.  So we want to find the best possible solution with minimal change.

 

Complete Demo script

 


--setting up data
use tempdb
go
if object_id ('[Check]') is not null drop table [Check]
go
if object_id ('[Batch]') is not null drop table [Batch]
go
create table Batch (BatchID int identity primary key, BatchType tinyint)
go
create table [Check] (CheckID int identity primary key, BatchID int references Batch (BatchID))
go

set nocount on

declare @i int = 0, @j int = 0
begin tran
while @i < 500
begin
    insert into Batch values (1)
    declare @batchid int = @@identity
    set @j =   cast (RAND()  * 1000 as int)
   
    declare @k int = 0
    while @k < @j
    begin
        insert into [Check] (BatchID) values (@batchid)
        set @k = @k + 1
    end
    set @i = @i + 1
end
commit tran

go
create index ix_check_batchid on [check] (BatchID)
go

if object_id ('p_test') is not null drop procedure p_test
go
create procedure p_test @BatchID int=null output
as
set nocount on
insert into [Batch] (BatchType) values (1)
select @BatchID = @@IDENTITY
--insert some 200 fake values
insert into [Check] (BatchID) select top 200 @BatchID from [Check]
--now select
select * from [Check] where BatchID = @BatchID

go


--bad estimate for select * from [Check] where BatchID = @BatchID
set statistics profile on
go
declare @batchid int
exec p_test @Batchid output
select @batchid
go
set statistics profile off

go


--solution using option recompile
if  object_id ('p_test2') is not null drop procedure p_test2
go
create procedure p_test2 @BatchID int=null output
as
set nocount on
insert into [Batch] (BatchType) values (1)
select @BatchID = @@IDENTITY
insert into [Check] (BatchID) select top 200 @BatchID from [Check]
select * from [Check] where BatchID = @BatchID  option (recompile)
go


--note that estimate is still off
set statistics profile on
go
declare @batchid int
exec p_test2 @batchid output
select @batchid
go
set statistics profile off
 


go
--enable trace flag
dbcc traceon (2389,2390,-1)

go

--now estimate is improved dramatically
set statistics profile on
go
declare @batchid int
exec p_test2 @batchid output
select @batchid

go
set statistics profile off
 
go


--additional solution using OPTIMIZE FOR UNKNOWN
create procedure p_test3 @BatchID int=null output
as
set nocount on
insert into [Batch] (BatchType) values (1)
select @BatchID = @@IDENTITY
insert into [Check] (BatchID) select top 200 @BatchID from [Check]
select * from [Check] where BatchID = @BatchID  option (optimize for (@BatchID unknown))
go

Query hint QUERYTRACEON is now documented publicly


I just wanted to post a quick note that QUERYTRACEON is now publicly documented in http://support.microsoft.com/kb/2801413/en-us.  If you have situations where you don’t want to enable an optimizer trace flag globally on your server instance, you can use this hint for a specific query.   Note that we only support the trace flags listed in the KB article. 
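As a hedged illustration (the table here is hypothetical, and you should confirm that the trace flag you want appears on the KB’s supported list), the hint is applied to a single statement like this:

select c1, c2
from dbo.MyTable                 -- hypothetical table
where c1 > 100
option (querytraceon 4199)       -- 4199 shown only as an example; verify it against the KB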

Under rare conditions, using IN clause can cause unexpected SQL behavior


I want to make you aware of a recent SQL Server 2008 hotfix documented in http://support.microsoft.com/kb/2791745.   Using a large number of constants in an IN clause can cause SQL Server to terminate unexpectedly.   When this happens, you won’t see anything in the error log or any dumps generated by SQL Dumper.

The condition that triggers this is not that common, so you may never experience this type of issue.     In order to hit the condition, you must have mismatched numeric data types in the IN clause. 

Let’s assume that you have a table defined as “create table t (c1 numeric(3, 0))”, but in the IN clause you have something like t.c1 in ( 6887 , 18663 , 9213 , 526 , 30178 , 17358 , 0.268170 , 25638000000000.000000 ).  Note that the precision and scale of the constants exceed column c1’s precision and scale.

If you have queries like these, then you may experience this unexpected behavior, depending on the final query plan.  This usually happens when you allow users to run ad hoc queries that add a random number of constant values which may exceed the column’s precision and scale.
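Here is a minimal sketch of the pattern, built from the definitions and values quoted above (not a guaranteed repro, since hitting the issue also depends on the final query plan):

create table t (c1 numeric(3, 0))
go
-- constants whose precision and scale exceed numeric(3, 0)
select *
from t
where t.c1 in (6887, 18663, 9213, 526, 30178, 17358,
               0.268170, 25638000000000.000000)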

 

Solution

The solution is to apply http://support.microsoft.com/kb/2791745.  Note that the issue doesn’t happen on SQL 2012, and we are working on a fix for SQL Server 2008 R2 as well.

Breaking Down 18056


We have had several posts on this blog regarding the 18056 error: two from Bob Dorr (part 1 and part 2) and another from Tejas Shah.  However, we still see a lot of questions about this error message, which can show up for different reasons.  After those posts were made, we released the following:

FIX: Errors when a client application sends an attention signal to SQL Server 2008 or SQL Server 2008 R2
http://support.microsoft.com/kb/2543687

This fix was specific to the following message and having to do with Attentions:

Error: 18056, Severity: 20, State: 29.
The client was unable to reuse a session with <SPID>, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

Since this fix was released, there has continued to be confusion over this error.  The intent of the fix above was to limit the amount of noise in the ERRORLOG, and it was specific to receiving State 29 with 18056 when an Attention was received.  The Attention is the important part here.  If an Attention occurred during a reset of a connection, we would normally log that to the ERRORLOG under State 29.  With this fix applied, if the Attention occurs during the reset of a connection, you should no longer see the error in the ERRORLOG.  This does NOT mean that you will no longer see a State 29.

I will use this post to explain further how we handle these errors to give you a better understanding.  To do that, I will expand on Bob Dorr's blog post linked above, which lists out the states.

States

Default = 1,
GetLogin1, 2
UnprotectMem1, 3
UnprotectMem2, 4
GetLogin2, 5
LoginType, 6
LoginDisabled, 7
PasswordNotMatch, 8
BadPassword, 9
BadResult, 10
FCheckSrvAccess1, 11
FCheckSrvAccess2, 12
LoginSrvPaused, 13
LoginType, 14
LoginSwitchDb, 15
LoginSessDb, 16
LoginSessLang, 17
LoginChangePwd, 18
LoginUnprotectMem, 19
RedoLoginTrace, 20
RedoLoginPause, 21
RedoLoginInitSec, 22
RedoLoginAccessCheck, 23
RedoLoginSwitchDb, 24
RedoLoginUserInst, 25
RedoLoginAttachDb, 26
RedoLoginSessDb, 27
RedoLoginSessLang, 28
RedoLoginException, 29    (Kind of generic, but you can use dm_os_ring_buffers to help track down the source, and perhaps -y. Think E_FAIL or General Network Error)
ReauthLoginTrace, 30
ReauthLoginPause, 31
ReauthLoginInitSec, 32
ReauthLoginAccessCheck, 33
ReauthLoginSwitchDb, 34
ReauthLoginException, 35

**** Login assignments from master ****

LoginSessDb_GetDbNameAndSetItemDomain, 36
LoginSessDb_IsNonShareLoginAllowed, 37
LoginSessDb_UseDbExplicit, 38
LoginSessDb_GetDbNameFromPath, 39
LoginSessDb_UseDbImplicit, 40    (We can cause this by changing the default database for the login at the server)
LoginSessDb_StoreDbColl, 41
LoginSessDb_SameDbColl, 42
LoginSessDb_SendLogShippingEnvChange, 43

**** Connection String Values ****

RedoLoginSessDb_GetDbNameAndSetItemDomain, 44
RedoLoginSessDb_IsNonShareLoginAllowed, 45
RedoLoginSessDb_UseDbExplicit, 46    (Data specified in the connection string Database=XYX no longer exists)
RedoLoginSessDb_GetDbNameFromPath, 47
RedoLoginSessDb_UseDbImplicit, 48
RedoLoginSessDb_StoreDbColl, 49
RedoLoginSessDb_SameDbColl, 50
RedoLoginSessDb_SendLogShippingEnvChange, 51

**** Common Windows API Calls ****

ImpersonateClient, 52
RevertToSelf, 53
GetTokenInfo, 54
DuplicateToken, 55
RetryProcessToken, 56
LoginChangePwdErr, 57
WinAuthOnlyErr, 58

**** New with SQL 2012 ****

DbAuthGetLogin1, 59
DbAuthUnprotectMem1, 60
DbAuthUnprotectMem2, 61
DbAuthGetLogin2, 62
DbAuthLoginType, 63
DbAuthLoginDisabled, 64
DbAuthPasswordNotMatch, 65
DbAuthBadPassword, 66
DbAuthBadResult, 67
DbAuthFCheckSrvAccess1, 68
DbAuthFCheckSrvAccess2, 69
OldHash, 70
LoginSessDb_ObtainRoutingEnvChange, 71
DbAcceptsGatewayConnOnly, 72

Pooled Connections

An 18056 error can only occur when we are trying to reset a pooled connection. Most applications I see these days are set up to use pooled connections. For example, a .NET application will use connection pooling by default. The reason for using pooled connections is to avoid some of the overhead of creating a physical hard connection.

With a pooled connection, when you close the connection in your application, the physical hard connection will stick around. When the application then goes to open a connection, using the same connection string as before, it will grab an existing connection from the pool and then reset the connection.

When a connection is reset, you will not see sp_reset_connection over the wire. You will only see the "reset connection" bit set in the TDS Packet Header.

Frame: Number = 175, Captured Frame Length = 116, MediaType = ETHERNET
+ Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-15-5D-4C-B9-60],SourceAddress:[00-15-5D-4C-B9-52]
+ Ipv4: Src = 10.0.0.11, Dest = 10.0.0.130, Next Protocol = TCP, Packet ID = 18133, Total IP Length = 102
+ Tcp: [Bad CheckSum]Flags=...AP..., SrcPort=59854, DstPort=1433, PayloadLen=62, Seq=4058275796 - 4058275858, Ack=1214473613, Win=509 (scale factor 0x8) = 130304
- Tds: SQLBatch, Version = 7.3 (0x730b0003), SPID = 0, PacketID = 1, Flags=...AP..., SrcPort=59854, DstPort=1433, PayloadLen=62, Seq=4058275796 - 4058275858, Ack=1214473613, Win=130304
- PacketHeader: SPID = 0, Size = 62, PacketID = 1, Window = 0
PacketType: SQLBatch, 1(0x01)
Status: End of message true, ignore event false, reset connection true, reset connection skip tran false
Length: 62 (0x3E)
SPID: 0 (0x0)
PacketID: 1 (0x1)
Window: 0 (0x0)
- TDSSqlBatchData:
+ AllHeadersData: Head Type = MARS Header
SQLText: select @@version

In the above example, we are issuing a SQL Batch on a pooled connection. Because it was a pooled connection, we have to signal that we need to reset the connection before the Batch is executed. This is done via the "reset connection" bit.

After the above SQLBatch is issued, the app could then turn around and issue an Attention to cancel the request. This is what resulted in the 18056 with State 29 in the past under the condition of an attention.

Frame: Number = 176, Captured Frame Length = 62, MediaType = ETHERNET
+ Ethernet: Etype = Internet IP (IPv4),DestinationAddress:[00-15-5D-4C-B9-60],SourceAddress:[00-15-5D-4C-B9-52]
+ Ipv4: Src = 10.0.0.11, Dest = 10.0.0.130, Next Protocol = TCP, Packet ID = 18143, Total IP Length = 48
+ Tcp: [Bad CheckSum]Flags=...AP..., SrcPort=59854, DstPort=1433, PayloadLen=8, Seq=4058275858 - 4058275866, Ack=1214473613, Win=509 (scale factor 0x8) = 130304
- Tds: Attention, Version = 7.3 (0x730b0003), SPID = 0, PacketID = 1, Flags=...AP..., SrcPort=59854, DstPort=1433, PayloadLen=8, Seq=4058275858 - 4058275866, Ack=1214473613, Win=130304
- PacketHeader: SPID = 0, Size = 8, PacketID = 1, Window = 0
PacketType: Attention, 6(0x06)
Status: End of message true, ignore event false, reset connection false, reset connection skip tran false
Length: 8 (0x8)
SPID: 0 (0x0)
PacketID: 1 (0x1)
Window: 0 (0x0)

In this case, we would still be in the process of doing the connection reset which would be a problem. Bob Dorr's Part 2 blog that is linked above goes into good detail for how this actually occurs.

So, no more State 29?

The thing to realize about State 29 is that it is a generic state, indicating only that an exception occurred while trying to redo a login (pooled connection) and that the exception was not accounted for by any other logic that would produce one of the more specific states listed above. Think of it as similar to an E_FAIL or a General Network Error.

Going forward, assuming you have the above fix applied, or are running SQL 2012 which includes it, a State 29 will not be because of an Attention, since we no longer log the 18056 for Attentions. However, if you look at dm_os_ring_buffers, you will still see the actual Attention (Error 3617); we just don't log the 18056 any longer, to avoid noise.

<Record id= "3707218" type="RING_BUFFER_EXCEPTION" time="267850787"><Exception><Task address="0x52BDDC8"></Task><Error>3617</Error><Severity>25</Severity><State>23</State><UserDefined>0</UserDefined></Exception><Stack

There are things that occur in the course of resetting a login that could trigger a State 29. One example that we have seen is a Lock Timeout (1222).

In the Lock Timeout scenario, the only thing logged to the ERRORLOG was the 18056. We had to review the dm_os_ring_buffers DMV to see the Lock Timeout.

<Record id= "3707217" type="RING_BUFFER_EXCEPTION" time="267850784"><Exception><Task address="0x4676A42C8"></Task><Error>1222</Error><Severity>16</Severity><State>55</State><UserDefined>0</UserDefined></Exception><Stack

The Lock Timeout was a result of statements issuing "SET LOCK_TIMEOUT 0", which affects the connection itself. When the connection is "reset", the SET statements are carried forward. Then, depending on timing and on whether an exclusive lock is taken on something the login logic is looking for, it can end up affecting logins on a pooled connection when that connection is reused. The default lock timeout for a connection is -1.
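For reference, this is what the session-level override looks like; per the carried-forward behavior described above, it is the non-default value that ends up surprising the pooled connection:

select @@LOCK_TIMEOUT    -- default is -1 (wait indefinitely)
set lock_timeout 0       -- fail immediately if a lock cannot be acquired
select @@LOCK_TIMEOUT    -- now returns 0 for this session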

Now what?

If you receive a State 29, you should follow up by looking in dm_os_ring_buffers. You will want to look at the RING_BUFFER_EXCEPTION buffer type.

select cast(record as XML) as recordXML
from sys.dm_os_ring_buffers
where ring_buffer_type = 'RING_BUFFER_EXCEPTION'

The error that you find should help explain the condition, and/or allow you to troubleshoot the problem further. If you see 3617, then you will want to look at applying the hotfix above to prevent those messages from being logged. If you see a different error, then you may want to collect additional data (Profiler Trace, Network Trace, etc…) to assist with determining what could have led to that error.

 

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

 

 

 


RS: Database Engine does not meet edition requirements


I've run across the following error a few times and thought I would post it out there for people to understand what is happening.

ERROR: Throwing Microsoft.ReportingServices.Diagnostics.Utilities.OperationNotSupportedException: , Microsoft.ReportingServices.Diagnostics.Utilities.OperationNotSupportedException: The feature: "The Database Engine instance you selected is not valid for this edition of Reporting Services. The Database Engine does not meet edition requirements for report data sources or the report server database. " is not supported in this edition of Reporting Services.;

You may also see a similar message in your Event Logs.

This error is the result of mismatched SKUs between the Report Server and the Database Engine, as the message mentions. You can also look at two entries in the RS server logs to see what it thought it hit when it performed the SKU check.

resourceutilities!WindowsService_0!e44!02/19/2013-08:58:28:: i INFO: Reporting Services starting SKU: Enterprise
library!WindowsService_0!e54!02/19/2013-08:58:34:: i INFO: Catalog SQL Server Edition = Enterprise

Where this has caused confusion is when the Catalog SQL Server Edition SKU shows the following:

resourceutilities!WindowsService_0!e44!02/19/2013-08:58:28:: i INFO: Reporting Services starting SKU: Enterprise
library!WindowsService_0!e54!02/19/2013-08:58:34:: i INFO: Catalog SQL Server Edition = Developer

This will cause the above error. From a usability perspective, Developer Edition is essentially the Enterprise product that you can use for testing. However, Reporting Services has a specific SKU check to prevent running an Enterprise version of the Report Server against a Developer or Evaluation version of the Database Engine.

It looks something like this:

case Standard:
case Enterprise:
case EnterpriseCore:
case DataCenter:
case BusinessIntelligence:
    restricted.Add(Developer);
    restricted.Add(Evaluation);
    break;

Of note, Developer edition cannot use Eval and Eval edition cannot use Developer.
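If you are unsure which edition the catalog Database Engine instance is really running, a quick check on that instance settles it before you compare against the Report Server SKU:

select serverproperty('Edition')        as Edition,
       serverproperty('ProductVersion') as ProductVersion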

So, how do we correct this if we run into this situation? You will need to uninstall and reinstall Reporting Services with the correct Product Key that matches the SKU you are looking for. If you want Developer Edition, you will need to run setup with the Product Key for Developer. The edition SKU is part of the setup process and is stamped for that instance.

For Reporting Services 2012 in SharePoint Integrated mode, it is considered a Shared Feature, but still is reliant on the SKU that you used to run SQL Setup with. The catch here is that it is not an Instance per se. You still need to make sure that you are installing it with the proper Product Key for the edition you are using.

The product will then make a WMI call to get the "EditionName" property to determine what is appropriate.

 

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

Unable to register .NET framework assembly not in the supported list


SQL Server has supported CLR usage since version 2005.  But support of .NET framework assemblies within SQL Server is limited per our support policy in KB http://support.microsoft.com/kb/922672.

 

Some users choose to use .NET Framework assemblies outside the list in KB http://support.microsoft.com/kb/922672.  This can cause various issues.   Lately we have had a few reports of the following error after upgrading to SQL Server 2012.

Msg 6544, Level 16, State 1, Line 2

CREATE ASSEMBLY for assembly '<assembly name>' failed because assembly ‘<assembly name>’ is malformed or not a pure .NET assembly.          Unverifiable PE Header/native stub.

 

A little background: when you develop your user assembly, you can reference .NET Framework assemblies.  If the referenced .NET Framework assemblies are all from the supported list, you only need to register your own user assembly using the CREATE ASSEMBLY statement.     When you use a .NET Framework assembly that is not in the supported list, the following happens:

  1. You are required to mark your assembly as UNSAFE.
  2. You are required to use the CREATE ASSEMBLY statement to register the .NET Framework assembly and any referenced assemblies not in the supported list within the SQL Server database.   In other words, the .NET Framework assembly has to physically reside in a SQL Server database, just like your own assembly (see the sketch after this list).
  3. When you do this, you are presented with a warning: “Warning: The Microsoft .Net frameworks assembly 'AssemblyName' you are registering is not fully tested in SQL Server hosted environment.”
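A minimal sketch of what step 2 looks like.  The database name, assembly choice and file path below are illustrative only; also note that an UNSAFE assembly requires the database to be TRUSTWORTHY or the assembly to be signed:

use MyDb    -- hypothetical database
go
alter database MyDb set trustworthy on   -- one way to satisfy the UNSAFE requirement
go
-- register the unsupported .NET Framework assembly itself inside the database
create assembly [System.DirectoryServices]
from 'C:\Windows\Microsoft.NET\Framework64\v4.0.30319\System.DirectoryServices.dll'
with permission_set = unsafe
go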

 

There are two types of .NET assemblies.  Pure .NET assemblies contain only MSIL instructions.   Mixed assemblies contain both unmanaged machine instructions and MSIL instructions.  Mixed assemblies are generally compiled by the C++ compiler with the /clr switch and contain machine instructions resulting from native C++ code.

 

Regardless of the version of SQL Server, CREATE ASSEMBLY only allows pure .NET assemblies to be registered.  SQL Server has always required that an assembly loaded into a SQL Server database with CREATE ASSEMBLY contain only MSIL instructions (a pure assembly).   CREATE ASSEMBLY will raise the above error if the assembly to be registered is a mixed assembly.

 

Why are we seeing this issue now more often than before?

SQL Server 2005, 2008 and 2008 R2 use CLR 2.0.   In SQL Server 2012, we upgraded the hosted CLR to 4.0.  As a result, all the .NET Framework assemblies need to be version 4.0.  If you have used a .NET Framework assembly that is not in the supported list, you must re-register the 4.0 version using the CREATE ASSEMBLY statement after upgrade.  Some .NET Framework assemblies, such as WCF, started referencing mixed-mode assemblies in 4.0.   Therefore you start experiencing the issue in SQL 2012 rather than in earlier versions.

A couple of clarifications

  1. The above error can occur with any version of the .NET Framework if the assembly you are trying to register (with CREATE ASSEMBLY) is not a pure .NET assembly.   A .NET Framework assembly is not guaranteed to be a pure .NET assembly in every version.  Additionally, a newer version of an assembly may reference a non-pure .NET assembly.  In such situations, upgrade will fail with the above error.
  2. The issue occurs only if you use unsupported .NET Framework assemblies, which require validation because CREATE ASSEMBLY is involved.  If your user assembly references only the assemblies in the list documented in KB http://support.microsoft.com/kb/922672 (which will be updated to reflect the issue documented in this blog), we ensure it will work.

 

 

Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support

Temp table caching improvement for table valued parameters in SQL Server 2012


I wanted to point out a nice performance improvement related to table valued parameters (TVPs) in SQL Server 2012.   It’s not currently covered in our online documentation, but we have had customers inquire about it.

When you use a TVP, SQL Server internally uses a temp table to store the data.  Starting with SQL Server 2005, temp tables can be cached for reuse.  Caching reduces contention, such as page latch contention on system tables, which can occur when temp tables are created and dropped at a high rate.

 

If you use a TVP with a stored procedure, the temp table for the TVP has been cached since SQL Server 2008, when TVPs were introduced. 

But if you use a TVP with parameterized queries, the temp tables for the TVP are not cached in SQL 2008 or 2008 R2.   This leads to the page latch contention on system tables mentioned earlier.

Starting with SQL Server 2012, temp tables for TVPs are cached even for parameterized queries.

Below are two perfmon results for a sample application that uses a TVP in a parameterized query.   Figure 1 shows that SQL 2008 R2 sustained a high “Temp Tables Creation Rate” until the test completed.  Figure 2 shows that SQL 2012 had just a very quick spike in “Temp Tables Creation Rate” and then it went to zero for the rest of the test.
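If you would rather sample the counter from T-SQL than perfmon, a query along these lines works (the value in the DMV is a cumulative count, so sample it twice and take the difference to get a rate; the object name prefix varies with the instance name, hence the LIKE):

select object_name, counter_name, cntr_value
from sys.dm_os_performance_counters
where counter_name = 'Temp Tables Creation Rate'
  and object_name like '%General Statistics%'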

 

Figure 1:  SQL Server 2008 R2’s Temp table Creation Rate

 

[Image: perfmon graph – SQL Server 2008 R2 sustains a high Temp Tables Creation Rate for the whole test]

 

Figure 2:  SQL Server 2012’s Temp Table Creation Rate

 

[Image: perfmon graph – SQL Server 2012 shows a brief spike in Temp Tables Creation Rate, then drops to zero]

 

 

Just as a side note, a parameterized query uses sp_executesql from SQL Server’s perspective.  From the application’s perspective, the following ADO.NET pseudo-code will generate a parameterized query:

SqlCommand cmd = ….;   // command on an open SqlConnection
cmd.CommandText = "SELECT Value FROM @TVP";
cmd.CommandType = System.Data.CommandType.Text;
DataTable tvp = new DataTable();
// add columns and rows to the DataTable here
SqlParameter tvpParam = cmd.Parameters.AddWithValue("@TVP", tvp);   // name must match @TVP in the query
tvpParam.SqlDbType = SqlDbType.Structured;
tvpParam.TypeName = "dbo.MyTableType";   // user-defined table type on the server (hypothetical name)
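The snippet assumes a user-defined table type already exists on the server; a minimal definition matching the hypothetical name used above would be:

create type dbo.MyTableType as table (Value int)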

 

 

Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support

switchoffset built-in function can cause incorrect cardinality estimate


Recently, we received a call from a customer who reported that a query was slow.  The query had a predicate that looked like this:

select * from t o where c1 > switchoffset (Convert(datetimeoffset, GETDATE()), '-04:00')

 

Upon further investigation, we discovered that it was a cardinality estimation issue.   The customer’s data is such that there were no dates beyond today; all the dates are in the past (as is the case for most scenarios).

SQL Server has many built-in/intrinsic functions.  During query compilation, the optimizer can actually ‘peek’ at the value by ‘executing’ the function to provide a better estimate.  For example, if you use getdate(), as in “select * from t where c1 > getdate()”, the optimizer is able to get the actual value of getdate() and then use the histogram to obtain an accurate estimate.

DateAdd is another intrinsic function for which the optimizer can do the same trick.

But switchoffset is not one of those intrinsic functions, so the optimizer can’t ‘peek’ at the value and utilize the histogram.

 

Just to compare the difference, the query “select * from t o where c1 > switchoffset (Convert(datetimeoffset, GETDATE()), '-04:00')” shows an incorrect estimate (74397 rows).

 

[Image: plan for the switchoffset predicate showing an incorrect estimate of 74397 rows]

 

 

But “select * from t o where c1 > convert (datetimeoffset, dateadd (dd, 0, getdate()))” shows a correct estimate.  Note that the two queries are logically equivalent; I used them only to illustrate the difference in cardinality estimates.

 

[Image: plan for the dateadd/convert predicate showing an accurate estimate]

 

Solution

When you use switchoffset together with getdate(), it’s best to ‘precompute’ the value and then plug it into your query.  Here is an example:

declare @dt datetimeoffset = switchoffset (Convert(datetimeoffset, GETDATE()), '-04:00')
select * from t  where c1 > @dt option (recompile)

Complete demo script

 


if object_id ('t') is not null
drop table t
go
create table t (c1 datetimeoffset)
go
declare @dt datetime, @now datetime
set @dt = '1900-01-01'
set @now = SYSDATETIMEOFFSET()
set nocount on
begin tran
while @dt < @now
begin
insert into t values (@dt)
insert into t values (@dt)
insert into t values (@dt)
insert into t values (@dt)
insert into t values (@dt)
insert into t values (@dt)
set @dt = dateadd (dd, 1, @dt)
end
commit tran
go
create index ix on t (c1)
go

set statistics profile on
go
--inaccurate estimate
select * from t  where c1 >switchoffset (Convert(datetimeoffset, GETDATE()), '-04:00')
--accurate estimate
select * from t  where c1 > convert (datetimeoffset, dateadd (dd, 0, getdate()))
--accurate estimate
declare @dt datetimeoffset = switchoffset (Convert(datetimeoffset, GETDATE()), '-04:00')
select * from t  where c1 > @dt option (recompile)

go
set statistics profile off

 

 

Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support

SQL Server 2012 partitioned table statistics update behavior change when rebuilding index

$
0
0

In this blog, I will talk about a couple of things related to statistics updates when rebuilding an index on a partitioned table.

 

In past versions, when you rebuild an index, you get a statistics update equivalent to FULLSCAN for free.   This is true regardless of whether the table is partitioned or not.

But SQL Server 2012 changed the behavior for partitioned tables.   If a table is partitioned, ALTER INDEX REBUILD will only update statistics for that index with the default sampling rate; in other words, it is no longer a FULLSCAN.  This is documented in http://technet.microsoft.com/en-us/library/ms188388.aspx, but many users have not realized it.  If you want a fullscan, you need to run UPDATE STATISTICS WITH FULLSCAN.   This change was made because we started to support a large number of partitions, up to 15000, by default.  Previous versions did support 15000 partitions, but it was not on by default.  Supporting a large number of partitions would cause high memory consumption if we tracked the stats with the old behavior (FULLSCAN).  With a partitioned table, ALTER INDEX REBUILD first rebuilds the index and then does a sampled scan to update stats in order to reduce memory consumption.
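A minimal sketch of the extra step (the object and index names here are hypothetical):

-- on a partitioned table in SQL Server 2012 the rebuild refreshes stats with default sampling only
alter index ix_MyIndex on dbo.MyPartitionedTable rebuild
go
-- follow up explicitly if you need FULLSCAN-quality statistics for that index
update statistics dbo.MyPartitionedTable (ix_MyIndex) with fullscan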

 

Another behavior change is actually a bug.  In SQL 2012, ALTER INDEX REBUILD doesn’t preserve the norecompute property for partitioned tables.   In other words, if you specify norecompute on an index, it will be gone after you run ALTER INDEX REBUILD on SQL 2012.   We have corrected this issue in the newly released CU3 of SQL Server 2012 SP1.  Here is the KB: http://support.microsoft.com/kb/2814780 

 

 

Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support
