Channel: CSS SQL Server Engineers

How It Works: Spinlock Of Type LPE_BATCH


A question arose: "What does the LPE_BATCH spinlock type represent?"

You can see the LPE_* series of spinlocks using the DMV query "select * from sys.dm_os_spinlock_stats"

A spinlock is a lightweight, user-mode synchronization object used to protect a specific structure. The goal of the spinlock is to be fast and efficient, so the ultimate goal is to see 0 collisions and 0 spins. SQL Server 2008 contains 150+ spinlock objects to protect the various internal structures during multi-threaded access.
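To see whether LPE_* spinlocks are accumulating collisions on your instance, a query along these lines (a quick sketch, not from the original post) narrows the DMV output down and sorts the busiest first:

    -- List the LPE_* spinlocks, ordered by collisions
    select name, collisions, spins, spins_per_collision, sleep_time, backoffs
    from sys.dm_os_spinlock_stats
    where name like 'LPE%'
    order by collisions desc;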

It may sound redundant, but the LPE_BATCH spinlock protects the batch. For example:

select * from authors -- Batch #1

go

select * from titles -- Batch #2

go

 

I looked at a few places where LPE_BATCH is used in the code.

 

1. The deadlock monitor can retrieve the task and needs synchronized access to the batch's 'Task'

2. Possible state changes on the batch – for example RUNNING, IDLE, ENQUEUED

3. Entering and exiting transaction states such as DTC or MARS

 

If you are seeing collisions or spins on the LPE_BATCH spinlock, I suggest you look at the aforementioned areas.

Bob Dorr - Principal SQL Server Escalation Engineer


SQLIOSim - Is Error: Unable to get disk cache info really an error?


 

The short answer is that it is not an error and the message should be a WARNING.

 

[Screenshot: SQLIOSim error "Unable to get disk cache info"]

 

I looked at the latest internal code base, and it has already been changed to a WARNING.

 

if (!DeviceIoControl (volume,
                      IOCTL_DISK_GET_CACHE_INFORMATION,
                      NULL,
                      0,
                      &m_diskCacheInfo,
                      sizeof (m_diskCacheInfo),
                      &dwBytes,
                      NULL) || sizeof (m_diskCacheInfo) != dwBytes)
{
    Logger(TYPE_WARN, _T(__FILE__), __LINE__, _T(__FUNCTION__), HRESULT_FROM_WIN32 (GetLastError()),
           _T("Unable to get disk cache info for %s"), mountPoint);
}

 

 

The only place the returned value is used in the code is to display the possible cache settings in the SQLIOSim log. 

 

Logger(TYPE_INFO, _T(__FILE__), __LINE__, _T(__FUNCTION__), 0,
       _T("DRIVE LEVEL: ")
       _T("Read cache enabled = %ls, ")
       _T("Write cache enabled = %ls"),
       m_diskCacheInfo.ReadCacheEnabled ? L"Yes" : L"No",
       m_diskCacheInfo.WriteCacheEnabled ? L"Yes" : L"No");

 

The SQLIOSim testing attempt will continue without any impact from this error/warning.

 

Bob Dorr -  Principal SQL Server Escalation Engineer

How It Works: SqlDataReader::RecordsAffected and Why it Returns -1


I encountered another interesting research issue to share with you. The documentation states that RecordsAffected applies to INSERT, UPDATE, and DELETE, but it might fool you.

------------------------------------------------------------------------------------

Sent: Monday, June 07, 2010 9:06 PM
Subject: "set nocount off"

 

I am trying to figure out the logic behind "set nocount". I want to get the row count when the procedure is executed so I can display a progress bar based on the returned rows (using SqlDataAdapter to read the data – so I need to know the count in advance). Since I need a temp table, I set "nocount on" at first, then I set "nocount off" before I return the record set. I always get "-1" in RecordsAffected. I also tried using "Print" or "RaiseError" with InfoMessage, but the event is never raised. Any ideas on how to accomplish this task? Or why I get "-1" (i.e. what are the conditions for "-1" when "set nocount off" is in effect?). Thanks.

------------------------------------------------------------------------------------

From: Robert Dorr
Subject: RE: "set nocount off"

 

The count is always returned at the end of the result set, so your idea about a progress bar based on the count won't work with nocount logic. If you want this, you need to do a select count(…) query, read that result set, and then run the query that returns the actual rows. This generally means you have to run the query 2 times, so I am not sure this is really what you want.

 

Most of the time I have seen folks just build a progress bar with a count of, say, 1000; as they fetch rows they increment the progress counter and let it wrap around to show that work is being done, while displaying a row count (text) to show that rows are being fetched.
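For illustration only (the table and filter here are hypothetical, not from the thread), the count-first pattern described above looks like this:

    -- First query: get the count so the client can size the progress bar
    select count(*) from authors where state = 'CA';

    -- Second query: fetch the actual rows (same filter, so the work is done twice)
    select * from authors where state = 'CA';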

------------------------------------------------------------------------------------

Sent: Tuesday, June 08, 2010 9:24 AM
Subject: RE: "set nocount off"

 

The sample below gives me an accurate count of the "rowcount/RecordsAffected" up front. It works in one case but not in others, so I am trying to figure out what is affecting the "RecordsAffected" property – so I can explain why in some cases I am getting "-1" and in others I am getting an accurate count. Perhaps this is a question for the ADO.NET alias; I just want to make sure that I understand the server behavior (for example: what happens if I use "set nocount on/off" multiple times in a single proc). Thanks for replying.

 

SqlCommand cmd = new SqlCommand("sp_Test", sqlConnection);

SqlDataReader dr = cmd.ExecuteReader();

int i = dr.RecordsAffected;

if (i > -1) ShowProgressBar...

create procedure sp_Test

as

set nocount off

create table #t (ID int)

declare @i int

set @i = 1000

while @i > 0

begin

insert #t select @i

set @i = @i - 1

end

select * from #t

drop table #t

------------------------------------------------------------------------------------
From: Robert Dorr
Sent: Tuesday, June 08, 2010 10:50 AM
Subject: RE: "set nocount off"

 

I am suspicious that the answer is that RecordsAffected is for INSERT, UPDATE and DELETE, and not SELECT, but I need to do a bit more digging.

 

RecordsAffected

Gets the number of rows changed, inserted, or deleted by execution of the Transact-SQL statement. (Overrides DbDataReader.RecordsAffected.)

 

------------------------------------------------------------------------------------

Sent: Tuesday, June 08, 2010 9:58 AM
Subject: RE: "set nocount off"

 

In classic ADO, there used to be an Errors collection on the connection object – where I could retrieve "print" messages. In ADO.NET, they created an InfoMessage event, but it's never fired in my tests – I was hoping to simply "print" the @@rowcount and retrieve it via the event…

 

For the RecordsAffected, it's definitely affected by the "select" – it's just that there is no doc on how it's affected. For example, I use "set nocount on" to avoid messages from previous statements, and then set "set nocount off" for the statement in question; then RecordsAffected becomes "-1" (whether it's before or after I retrieve the recordset). It's a very nice feature, but the lack of docs is making it unusable. :) Thanks for looking into this.

 

------------------------------------------------------------------------------------

From: Robert Dorr
Sent: Tuesday, June 08, 2010 11:15 AM
Subject: RE: "set nocount off"

 

I can repro the behavior with a simple "select 1" command returning -1 for RecordsAffected.

 

I followed the SqlCommand::ExecuteReader code down to the actual TDS invocation to see what it is doing.

 

When the SqlDataReader is created, RecordsAffected is set to -1.

 

SqlDataReader(…)
{
    this._recordsAffected = -1;
    …

 

Then I followed the logic into the TDS parser, into the ProcessDone logic (the set nocount setting controls generation of the done and done_in_proc tokens).

 

It is similar to what I thought. The reader does not fetch all rows up front; you have to read the rows to get to the done token for the select statement, so the data reader does not keep track of a count for the select – by that point you would already have fetched all the rows anyway.

 

Then I tried the following, and I get back 100,000 for the rows affected. Correct – that is what the batch affected (it inserted 100,000 rows).

 

    SqlCommand cmd = new SqlCommand("declare @i int; create table #tmp (iID int); set @i=0; while(@i < 100000) begin insert into #tmp values(@i) set @i = @i +1 end; select * from #tmp", sqlConn);

            SqlDataReader dr = cmd.ExecuteReader();

            int i = dr.RecordsAffected;

 

So I changed the query to use TOP and I still get 100,000 – expected, since the select(s) don't set RecordsAffected.

 

    SqlCommand cmd = new SqlCommand("declare @i int; create table #tmp (iID int); set @i=0; while(@i < 100000) begin insert into #tmp values(@i) set @i = @i +1 end; select TOP 10 * from #tmp", sqlConn);

       

I then used variants of set nocount, and you can see that you can change the rows-affected number that is returned by where you place the set statement. In this example I insert 200,000 rows in total, but because set nocount on is in effect around the first 100,000 inserts, RecordsAffected returns 100,000 instead of 200,000.

 

    SqlCommand cmd = new SqlCommand("set nocount on; declare @i int; create table #tmp (iID int); set @i=0; while(@i < 100000) begin insert into #tmp values(@i) set @i = @i +1 end; set @i=0; set nocount off; while(@i < 100000) begin insert into #tmp values(@i) set @i = @i +1 end; select TOP 10 * from #tmp", sqlConn);

            SqlDataReader dr = cmd.ExecuteReader();

            int i = dr.RecordsAffected;

 

The only reliable way to get the count of a specific select is NOT the RecordsAffected but to count the rows as you read them.

 

Back to what you are trying to do – determining the number of rows that the select will return so you can show progress – it won't come from RecordsAffected. This goes back to the age-old documentation and patterns I have worked with since 1994: you must process all of your results with Read and NextResult loops or you can end up with unexpected behaviors. I have seen customers execute a stored procedure that returns lots of results (NextResult) that take more than one TDS packet. They only fetch the first result and dispose the command. It used to be DBLIB and dbexec, or ODBC SQLExecDirect, and then dbcancel/SQLCancel/SQLFreeStmt to dump the rest of the results. The problem is that this sends the attention/cancel to the SQL Server, so if the rest of the procedure/batch has not run yet, the execution is cancelled. So the user thought they ran the entire batch/procedure when only part of it executed.

 

You mentioned you are using an event to print @@ROWCOUNT but that has the same issue as I described.  The event won’t fire until all the rows from the select have been processed and you progress to the NextResult.   The following code shows that behavior.   If you want the row count before the select in your sample you need to print/select the number of rows before the select and that is generally performance prohibitive as you end up running the query twice.

 

using System;
using System.Data.SqlClient;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void InfoMessageHandler(object sender, SqlInfoMessageEventArgs e)
        {
            Console.WriteLine(e.ToString());
        }

        static void Main(string[] args)
        {
            SqlConnection sqlConn = new SqlConnection(@"Data Source=.\SQL2008;Integrated Security=SSPI;");

            sqlConn.Open();

            sqlConn.InfoMessage += new SqlInfoMessageEventHandler(InfoMessageHandler);

            SqlCommand cmd = new SqlCommand("print 'Hello'; set nocount on; declare @i int; create table #tmp (iID int); set @i=0; while(@i < 100000) begin insert into #tmp values(@i) set @i = @i +1 end; set @i=0; set nocount off; while(@i < 100000) begin insert into #tmp values(@i) set @i = @i +1 end; select TOP 10 * from #tmp; print @@ROWCOUNT", sqlConn);
            SqlDataReader dr = cmd.ExecuteReader();
            int i = dr.RecordsAffected;

            // Call Read before accessing data.
            do
            {
                while (dr.Read())
                {
                    Console.WriteLine(String.Format("{0}", dr[0]));
                }

                i = dr.RecordsAffected;
            }
            while (dr.NextResult());

            sqlConn.Close();
        }
    }
}

 

Where did the SQL Server Instance disappear? The clue may be in the WMI logs!


We recently worked with a customer who ran into an interesting situation. This problem deals with SQL Server 2005 Service Pack 3 setup.

Normally, when you launch the SQL 2005 SP3 setup and you reach the screen which shows the components for which you can apply the service pack, you will get a list of all the product components. For a server with one default instance of SQL Server Database Services installed, the list will appear as shown below.

[Screenshot: SP3 feature selection showing all installed components]

 

In this customer’s scenario, there were 2 servers which did not list all the components. Their setup screen looked like the following:

[Screenshot: SP3 feature selection with three components missing]

Notice that the 3 components highlighted in the previous screen are missing in this one. Because of this, they could not apply SQL 2005 SP3 to these 3 components on these servers. The components [Database Services, Integration Services and the client components] were working properly; only when setup attempted to enumerate the installed components was it unable to get the complete list.

We started looking at their setup logs and did not find any errors or warnings that would indicate a problem. Next we started looking at how the setup enumerates the installed components and qualifies them for the upgrade to this service pack. We verified that the instance is listed properly in the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\Instance Names\SQL. If this registry key does not contain the correct information, you can encounter this problem, but that was not the issue for them.

When parsing through the Event logs we noticed the following entries appearing at regular intervals:

[Screenshot: recurring WMI-related entries in the event log]

Even though this event log entry did not have direct correlation to the setup attempts or timing, it gave a vital clue. Something was wrong with the WMI infrastructure on this machine. We know that setup uses WMI heavily to perform the discovery and enumeration of the installed instances and components. So we turned our attention to WMI logging.

Here is the relevant snippet from wbemcore.log that showed the results of the WMI calls from the SQL Server setup program: 

(Mon May 29 14:31:01 2010.759152890) : GetUserDefaultLCID failed, restorting to system verion(Mon May 29 14:31:01 2010.759153015) : CALL CWbemNamespace::OpenNamespace

   BSTR NsPath = cimv2

   long lFlags = 0

   IWbemContext* pCtx = 0x0

   IWbemServices **pNewContext = 0x270F058

(Mon May 29 14:31:01 2010.759153062) : STARTING a main queue thread 2592 for a total of 1

(Mon May 29 14:31:02 2010.759153484) : GetUserDefaultLCID failed, restorting to system verion(Mon May 29 14:31:02 2010.759153500) : CALL CWbemNamespace::OpenNamespace

   BSTR NsPath = default

   long lFlags = 0

   IWbemContext* pCtx = 0x0

   IWbemServices **pNewContext = 0x279F058

(Mon May 29 14:31:02 2010.759153531) : Error 80041002 occured executing request for root\default

 

This snippet tells us that the connection to the root\cimv2 namespace was successful, but we encountered a failure when connecting to the root\default namespace.

Next we used the WBEMTEST.EXE tool [located at C:\WINDOWS\SYSTEM32\WBEM\] to confirm that this was purely a WMI problem. When we attempted to connect to the root\default namespace, we got the same error we had observed in the WMI logs.

[Screenshot: WBEMTEST error connecting to root\default]

Basically this error code corresponds to WBEM_E_NOT_FOUND ("Object cannot be found"). Why does SQL setup use the WMI namespace root\default? Setup uses the StdRegProv class in WMI. The StdRegProv class provides the EnumValues method to query values from the registry, and the StdRegProv class is available in the root\default namespace.

So the next step was to rebuild the corrupted WMI namespace. We worked with our Windows support team and used the following commands to rebuild the namespace:

In c:\windows\system32\wbem

"for /f %s in ('dir /b /s *.dll') do regsvr32 /s %s"

Then from the root of the drive, run

"for /f %s in ('dir /s /b *.mof *.mfl') do mofcomp %s"

After this the customer needed to perform a reboot; then we were able to connect to the WMI namespace, and setup was able to enumerate all the components and apply the service pack. If you are doing this procedure on your own, it would be a good idea to perform a system backup first to make sure you can restore system components in case of a problem.

During this investigation we also found out that in recent operating systems, WMI logging is done in a much different way. For more details, refer to Tracing WMI Activity.

Thanks

Suresh B. Kandoth

Senior Escalation Engineer – SQL Server

 

Introducing the SQL Server 2008 R2 Best Practices Analyzer (BPA)…


Some of you may have noticed I haven’t posted a blog in some time. Well, I’ve been a bit busy working behind the scenes on a new tool we released this weekend, the SQL Server 2008 R2 Best Practices Analyzer (BPA). You may remember that I announced this new tool back in April at the PASS Europe Summit on this post.

This past week final development and testing were completed on the tool and it is now available for you to download at the following location:

http://www.microsoft.com/downloads/details.aspx?displaylang=en&FamilyID=0fd439d7-4bff-4df7-a52f-9a1be8725591

I will spend more time this summer posting a series of blogs on various aspects of the tool, the rules contained in it, and possible usages and benefits for you. While the tool was developed and produced by the product team, the CSS team was a major part of the project. The CSS SQL Escalation team provided much of the rule logic, design, and guidance about how the BPA tool would scan for known configuration settings or look for critical events to alert DBAs. In a way, running this tool is like getting CSS SQL Escalation advice without having to call CSS.

For now let me give you a few facts to help you get started:

  • Even though called SQL Server 2008 R2 BPA, it supports both SQL Server 2008 and SQL Server 2008 R2
  • Supports Windows Server 2003, Vista, Windows Server 2008, Windows Server 2008 R2, and Windows 7
  • Requires you to first install the Microsoft Baseline Configuration Analyzer 2.0 and Powershell 2.0 (only on WS03, Vista, or WS08).
  • Comes with about 130 rules including applicable rules from SQL Server 2005 BPA ported to the new tool plus many other new rules covering Engine, Security, Replication, Reporting Services, Analysis Services, Setup (Servicing), and SSIS.
  • Supports remote scanning through MBCA or through remote powershell (the help file explains all of this)

We have also created a series of KB articles that covers every rule in the tool with reference information in the article pointing back to the rule. The articles can be found as you use the help file or click on the More Information Link in the tool results.

However, you can also find our articles with some keyword searching on the internet:

Go to http://support.microsoft.com and search on “SQL Server 2008 R2 BPA”. You will find a bunch of results like this:

[Screenshot: search results for "SQL Server 2008 R2 BPA" on support.microsoft.com]

Here is an example of one of these articles that talks about a rule to check for a recent “clean” CHECKDB:

[Screenshot: example KB article for the CHECKDB rule]

I invite you to download our new BPA tool and give it a spin. While formal feedback is done through Microsoft Connect (Use the “Send Feedback” link in the help file), I’m happy to take on comments and questions on this blog post once you download and try it out.

 

Thank you,

Bob Ward, Microsoft

Known issues installing SQL 2008 R2 BPA relating to Remoting


After getting through the Pre-Reqs for BPA (PowerShell 2.0, MBCA, .NET Framework), you may hit one of two scenarios when installing BPA.

In all of the cases of an install failure, you will see the following error:

[Screenshot: Windows Installer error 1722 dialog]

“There is a problem with this Windows Installer package.  A program run as part of the setup did not finish as expected.  Contact your support personnel or package vendor.”

In your Application Event Log, for both of these scenarios, you will also see the following entry:

Log Name:      Application
Source:        MsiInstaller
Date:          6/10/2010 8:38:18 AM
Event ID:      11722
Task Category: None
Level:         Error
Keywords:      Classic
User:          <Username>
Computer:      <Machine name>
Description:
Product: Microsoft SQL Server 2008 R2 BPA -- Error 1722. There is a problem with this Windows Installer package. A program run as part of the setup did not finish as expected. Contact your support personnel or package vendor.  Action EnablePSRemoting, location: powershell.exe, command: -NoLogo -NoProfile -Command Enable-PSRemoting -force

Workgroup (aka Non-Domain) Machine:

In this scenario, the Enable-PSRemoting command should execute fine from a PowerShell prompt.  The actual error coming back from the PowerShell command within the Installer is “Access Denied”. 

To work around the issue, you can do the following:

  1. Open a command prompt with Administrative Privileges
  2. Change to the directory where the .msi file resides
  3. Type msiexec /i <MSI Name> SKIPCA=1
    1. MSI Name will either be SQL2008R2BPA_Setup32.msi or SQL2008R2BPA_Setup64.msi depending on your platform
  4. Once BPA is installed, open a PowerShell prompt with Administrative Privileges
  5. Execute the following commands
    1. Enable-PSRemoting
    2. winrm set winrm/config/winrs `@`{MaxShellsPerUser=`"10`"`}

This should allow BPA to be successfully installed in the workgroup scenario.

Kerberos Failure:

The second scenario is that you are failing with the above due to a Kerberos issue.  This particular issue could actually show up after you have installed BPA depending on how you have configured your environment.

The issue stems from the fact that the Windows Remoting service runs under the Network Service account. Windows Remoting also uses SOAP calls over HTTP and defaults to Kerberos. As a result, it will use the HOST Service Principal Name (SPN) on the machine account, since it is running under that context. You may have an HTTP SPN that resides on a different account for that host name – for example, if you are running an IIS web application such as SharePoint, or if you are using Reporting Services and the service account is set to a domain user account instead of Network Service or Local System. If the URL of your application matches the machine name, your HTTP SPN will be the same, and that's where this problem comes in. WinRM will stop working at that point and give you a message similar to the following.

Set-WSManQuickConfig : WinRM cannot process the request. The following error occured while using Negotiate authentication: An unknown security error occurred.
Possible causes are:
  -The user name or password specified are invalid.
  -Kerberos is used when no authentication method and no user name are specified.
  -Kerberos accepts domain user names, but not local user names.
  -The Service Principal Name (SPN) for the remote computer name and port does not exist.
  -The client and remote computers are in different domains and there is no trust between the two domains.
After checking for the above issues, try the following:
  -Check the Event Viewer for events related to authentication.
  -Change the authentication method; add the destination computer to the WinRM TrustedHosts configuration setting or use HTTPS transport.
Note that computers in the TrustedHosts list might not be authenticated.
   -For more information about WinRM configuration, run the following command: winrm help config.
At line:50 char:33
+             Set-WSManQuickConfig <<<<  -force
    + CategoryInfo          : InvalidOperation: (:) [Set-WSManQuickConfig], InvalidOperationException
    + FullyQualifiedErrorId : WsManError,Microsoft.WSMan.Management.SetWSManQuickConfigCommand

You can get this type of error from WinRM for multiple reasons. The one that we saw in our testing was the HTTP SPN scenario.

If you do have an HTTP SPN defined on a Domain Account that is using the name of your machine, you have some options.  First you can follow the steps mentioned above to get BPA installed.  The Enable-PSRemoting command will give you the above error.  You can temporarily remove the HTTP SPN to get remoting enabled and then re-add the HTTP SPN.

Once BPA is setup, you will still not be able to run BPA if you put the HTTP SPN back in place.  You will see the following when you attempt to perform a scan:

[Screenshot: BPA scan error]

This will occur regardless of which component you try to scan.  It could be the Engine, Setup, RS, etc…

The only way to perform the scan successfully is to temporarily remove the HTTP SPN again, run the scan, and then put the HTTP SPN back in place. Another option, but one that will probably require further testing on your application's end, would be to run the application under a host header; then your HTTP SPN would not include the machine name, allowing BPA to run without issue. It's not ideal though.

Unfortunately, the above options for the Kerberos issues are not really all that great.  Hopefully this will not be that common.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

My Kerberos Checklist…


I’ve had numerous questions regarding Kerberos, both internally within Microsoft and from customers. It continues to be a complicated topic, and the documentation that is out there can be less than straightforward. Based on some previous items I’ve worked on, I wanted to share my experience in this area.

Let me start by looking at two scenarios for reference.  One that is basic and the other that is complex.

[Diagram: basic scenario]

[Diagram: complex scenario]

As you’ll find, once we figure out how to configure the basic scenario, the complex scenario ends up being very similar.

Data Collection:

The first thing when you try to tackle a Kerberos issue is to understand your environment.  I find that a lot of the Kerberos issues that I troubleshoot all come down to gathering the right information to make an informed analysis and identify the problem point.  The following data points relate to all servers involved.  We will circle back on the Client after we talk about the Servers.

  1. Know your topology
  2. What is the Service Account being used for the application in question?
  3. What Service Principal Name (SPN) does your app require?
  4. What SPNs are defined for that service account?
  5. What are the delegation settings for the service account?
  6. Local Policy related information
  7. Additional application specific information

 

Consistent vs. Intermittent Kerberos Issues

The data collection points above should allow you to get Kerberos working in most cases. I say most cases because the above refers specifically to configuration. I typically break issues down into consistent vs. intermittent. If the issue is reproducible every time, it is a configuration issue. If it is intermittent, then it is usually not a configuration issue – if it were, it would happen all the time. Intermittent means it works most of the time, and in order to work at all, it has to be configured correctly. The exception to this would be if you are in a farm-type situation and the configuration is not the same on every box in the farm. Sometimes you may hit Server A, which is configured properly, and another time you may hit Server B, which is not and causes an error. Which brings us to the first data collection point…

Know your topology

Before you begin, you should know what servers are involved in your application as a whole. If we are talking about a single web application, you probably have at least two servers to consider and know about – the web server and the backend (SQL for our purposes). They both play a part. This becomes even more important in a distributed environment where you may have 3+ servers.

As you’ll see, with the data collection items, we basically will walk the line down your servers to check them one by one.

What is the Service Account?

For the particular server you are looking at, what is the service account that the application is using?  This is important, because this will tell us where the SPN needs to go.  It also plays a part in Delegation.  Not every service will be a Windows Service, so this could be dependent on the application in question.  Here are some examples:

SharePoint

IIS – not a windows service

[Screenshot: SharePoint application pool identity in IIS]

Reporting Services

Windows Service

[Screenshot: Reporting Services service account]

SQL Server

Windows Service

[Screenshot: SQL Server service account]

For windows services, you can also look in the Services MMC to get the information.  Again, you need to know what your application is doing:

[Screenshot: service accounts in the Services MMC]

What SPN does your app require?

We can look at all sorts of SPN listings, but before you do, we need to know what it is we are looking for.  I think this is one of the more complicated parts of Kerb configuration because the SPN is dependent on the application you are using.  The format of the SPN is consistent between applications, but what is required is dependent on the application, or from an SPN point of view, the service.  It is a Service Principal Name after all!

The SPN has the following format:  <service>/<host>:<port/name>

The port/name piece of this is optional and dependent on what the service will accept.

HTTP – For a default configuration, the port is never used for an HTTP SPN.  SPN’s are unique and if you add an HTTP SPN with a port on it, it will be ignored as it is not correct.  IIS and Internet Explorer do not affix the port number to the SPN request when they look for it.  From an Internet Explorer perspective, you can alter this behavior via a registry key to where it will, but I have yet to see anyone do that.  Most people aren’t aware of it from what I can tell.  From my experience, I would stay away from adding a port to an HTTP SPN.

MSSQLSvc – you can look at the following blog post to read more about how SQL determines the SPN needed.  http://blogs.msdn.com/b/psssql/archive/2010/03/09/what-spn-do-i-use-and-how-does-it-get-there.aspx
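As a side note for the SQL Server hop, once the MSSQLSvc SPN is in place you can confirm whether a given connection actually negotiated Kerberos rather than falling back to NTLM (a quick check, not part of the original checklist):

    -- auth_scheme returns KERBEROS, NTLM, or SQL for the current connection
    select session_id, auth_scheme
    from sys.dm_exec_connections
    where session_id = @@SPID;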

For the next couple of items, we will use the SharePoint service account as the example – spservice. In this case it is a web application, so we know it will use the HTTP service from an SPN perspective. The host piece is dependent on how we are connecting to the web server. This is true for any application really: from an HTTP perspective it is the URL, for SQL it is the connection string. Another thing to know is that both IIS and SQL will resolve a NetBIOS name to the fully qualified domain name if they can. For example, http://passsp will be resolved to passsp.pass.local.

For our spservice example with a url of http://passsp, our SPN turns out to be http/passsp.pass.local and it is placed on the spservice account.

Another special note about HTTP SPNs.  If for example my SharePoint AppPool (service) was using Network Service, this is considered the machine context so the SPN would go on the machine account (PASSSP).  However, HTTP is considered a covered service for a special service type called HOST.  Every Machine account has a HOST entry for the FQDN as well as the NetBIOS name.  You don’t need to add an HTTP SPN on the machine account as long as your URL matches the machine name.

When adding an SPN, I also always recommend that you add both the FQDN SPN (i.e. http/passsp.pass.local) as well as the NetBIOS SPN (i.e. http/passsp).  The NetBIOS SPN is a safety measure in case the DNS resolution fails and it just submits the NetBIOS SPN request.

What SPN is defined?

Now that we know the service account and what our SPN should be, we can look at the SPNs that are defined on that account.  We can use SetSPN to do this, although there are other tools that can help get this information for you (ADSIEdit, LDAP queries, etc…).  SetSPN is nice though as it ships with the Operating System starting with Windows 2008.  Lets have a look at our SharePoint Service account – spservice:

[Screenshot: setspn output for the spservice account]

Based on what we came up with above, we can see that the passsp SPN’s are in place.  You’ll also notice another SPN present, which means this Service Account is hosting two HTTP Services (could be two AppPools on the one server, or on two separate servers). 

You could run into a situation where the SPN is defined on another account as well.  This may be a misplaced or a duplicate SPN.  Both will cause an issue for you.  Usually when I grab SPN information from an environment, I grab all SPN’s defined in the Domain so that I can look for misplaced or duplicate SPNs.  The SetSPN tool that comes with Windows 2008 and later (and can be downloaded for Windows 2003), contains a new switch that will look for Duplicates for you.  It is the –X switch.

[Screenshot: setspn -X output showing duplicate SPNs]

In the above, you can see two accounts that had the http/passsp.pass.local SPN.  You can then decide which one really needs to be there based on the Service Account being used. 

What are the delegation settings?

Delegation only comes into play if you want the Client’s Windows credentials forwarded to another service.  For example, SharePoint to Reporting Services, Reporting Services to SQL, or even SQL to SQL in a Linked Server scenario.  NTLM does not allow for the forwarding of credentials.  This is accomplished through the process of delegation as part of the Kerberos Protocol. There are two main types of Delegation – Full Trust or Constrained Delegation.  Of note, you will not see the Delegation Tab on the Account within Active Directory unless an SPN has been assigned to that account.

Full Trust

This means that the given service can forward the client's credentials to any service – you are indiscriminate in whom you communicate with. This is the less secure option of the two, but it is the easier to configure (which I would expect, being less secure – secure always means complicated, right?).

[Screenshot: Delegation tab set to trust the account for delegation to any service]

Constrained Delegation

Constrained means that you are going to specify which services you can actually delegate to; the services are represented by SPNs. This is the more secure approach but has some drawbacks. As mentioned before, it is more complicated. The reason is that you have to know exactly what your application is trying to delegate to – it may not be just the service you are interested in. For example, you may be configuring SharePoint for delegation to Reporting Services, and then realize that you just broke a connection to SQL, or maybe a connection to some web service that you are trying to hit that requires Kerberos. It's not really that bad as long as you understand everything your application is going to reach out to that would require passing on the client's credentials.

The other drawback to Constrained Delegation is that you lose the ability to cross a domain boundary.  Meaning a cross domain scenario will fail from a delegation perspective.  Users from another Domain can hit your application, but all of the services that you are communicating to need to be in the same domain.  For example, SharePoint (Domain A) cannot delegate to SQL (Domain B).  Under constrained delegation, that will fail.

In the image below, the 3rd radio button means that you want to use Constrained Delegation. The sub radio buttons define whether you want to use Kerberos only, or whether you want to enable protocol transitioning. I'm not going to get into protocol transitioning in this blog post as it is big enough already, but you will have to deal with protocol transitioning if you are using the Claims to Windows Token Service. That comes into the picture if you are doing anything with Excel Services in SharePoint or PowerPivot.

[Screenshot: Delegation tab configured for constrained delegation]

 

You will need to go back to your application's topology to determine if enabling delegation is required. If we look at our double-hop example from above, Reporting Services would need to have delegation enabled for its service account, but SQL would not, as SQL isn't going out to anything using the client's credentials.

Local Policy Settings

There is at least one Local Policy setting you’ll need to pay attention to when trying to delegate.  That is the “Impersonate a client after authentication” policy.

[Screenshot: the "Impersonate a client after authentication" policy in Local Security Policy]

If your middle server is a web server, you can take advantage of a built-in group that has this permission. For Windows 2003, the group is called IIS_WPG. For Windows 2008 and later it is the IIS_IUSRS group. By default, SharePoint and RS should place themselves in that group, so you usually don't have to worry about it; I'm just mentioning it here as a step in the checklist. I rarely see this as the issue unless you have a custom application with a domain user account for the service account.

Client

Let's circle back to the client. You may be asking: all this is great for the application, but is there anything special I need to do for the user account coming from the client? Not really. By default you should be good to go from the client's user account. However, there is an account setting you should be aware of within Active Directory: the "Account is sensitive and cannot be delegated" setting. If that is checked, you will have issues with that specific user. To this date, I have yet to see a customer actually have that checked. That doesn't mean people don't do it; I just haven't seen it.

[Screenshot: the "Account is sensitive and cannot be delegated" option on a user account]

Application Specific Settings

When I started getting into Kerberos, I found that almost all of the issues were based on the Active Directory settings (SPN, delegation, etc.). Not to say that that has lessened, but I've also seen a shift in the complexity of getting specific applications up and running. As applications become more complex, you should be aware of what settings within that app could affect Kerberos. If you have gone through everything above and it all looks good, chances are that there is an application-specific setting that is interfering.

There is a lot to mention in this area, so I will spin up another blog post to discuss application specific settings to touch on IIS, SharePoint, Excel Services, PowerPivot and Reporting Services.  SQL doesn’t really have any Kerb specific settings as long as the SPN and delegation settings (if needed) are in place.

Tying it together…

So, we've looked at what my checklist is, but it was really focused on one service. What I've found is that it is as simple as that: all I do is repeat the checklist on each server that plays a part in the application (topology). Think of it as wash, rinse, repeat. When I help customers get Kerberos configured, I just walk the line down each server to make sure everything lines up, and I have been fairly successful with that approach. As I've gained more experience with it (I usually deal with it every day), I can often target a specific segment depending on where the error is coming from. Other times it may not be that straightforward; even when I target a specific area, if that doesn't pan out, I just start from the beginning and apply the checklist to each server/service that is playing a part.

Once you approach it that way, it really doesn't matter how many hops there are or what services are involved; you just follow the checklist one more time. The point where complications usually come into play is when Constrained Delegation is implemented and we didn't account for everything, or when you hit up against an app-specific issue. Outside of that, it is usually straightforward based on the above. Just find out what the SPN needs to be and where it needs to go and you are 80% there.

I realize I’m making it sound simple when it can be very frustrating and complicated, but the above has worked well for me in the past. Hopefully the above is helpful to you as you try to implement Kerberos within your environment. 

There is definitely way more to cover on this topic and I will continue to blog about those items.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

Revisiting an old SSRS performance topic again…


Almost three years ago, I posted about why Reporting Services takes so long to respond to the first request if it has been sitting idle for a long time (like overnight).  The original post can be found at http://blogs.msdn.com/b/sqlblog/archive/2007/11/09/reporting-services-staaaarrrrrtttt-up.aspx.  For those of you who don’t want to read it, the core conclusion was that (at least on my hardware), SSRS was taking about 30 seconds to start up after the application domain had been recycled.  The post was in response to the question I was getting (and still get) about why SSRS takes so long to start up.  The answer (for those of you who don’t want to bother reading the old article), was that SSRS has to do a ton of work to load all of the required dependencies, plus read the configuration files and decrypt the data in the catalog database.

I recently set up a new installation of SSRS 2008 R2, so I thought I would do the same comparison. I had expected to see similar results, so you can imagine my surprise when the numbers averaged about 15 seconds! I attribute some of that to a better machine, but my primary machine is not too much better than my server machine was back in those days (quad-core vs. dual-core, but similar RAM and clock speeds). In fact, some quick benchmarks confirmed my thoughts – the new machine is better at higher workloads, but at the relatively low level of work I am doing for these tests, they are about the same.

Here is the updated code:

 

    Sub Main(ByVal args() As String)
        Dim sr As New IO.StreamWriter(IO.Path.Combine(Environment.CurrentDirectory, "ReportingServicesMetrics.txt"))

        'let's loop through 5 times
        For i As Integer = 0 To 4
            Dim ts As DateTime
            Dim te As DateTime

            If args(0).ToUpper = "IIS" Then
                'let's reset IIS to make sure everything has to start from scratch
                sr.WriteLine("Doing an IISReset - Loop #" & (i + 1).ToString)
                Dim proc As New Diagnostics.Process
                proc.StartInfo.FileName = "iisreset"
                proc.Start()
                proc.WaitForExit()
                sr.WriteLine("IIS has been reset")

                'dummy web service
                sr.WriteLine("Getting ready to instantiate the dummy web service that just returns an empty DataSet")
                ts = Now()
                Dim dws As New DummyWebService.Service
                dws.ReturnBlankDataSet()
                sr.WriteLine("Instantiated the dummy web service")
                te = Now()
                sr.WriteLine("The dummy web service took " & te.Subtract(ts).TotalMilliseconds.ToString)
            End If

            If args(0).ToUpper = "SSRS" Then
                sr.WriteLine("Doing an SSRS restart - Loop #" & (i + 1).ToString)
                Dim srv As New System.ServiceProcess.ServiceController("SQL Server Reporting Services (R2)")
                srv.Stop()
                Do While srv.Status <> ServiceProcess.ServiceControllerStatus.Stopped
                    Console.WriteLine(srv.Status)
                    Threading.Thread.Sleep(5000)
                    srv.Refresh()
                Loop
                srv.Start()
                Do While srv.Status <> ServiceProcess.ServiceControllerStatus.Running
                    Console.WriteLine(srv.Status)
                    Threading.Thread.Sleep(5000)
                    srv.Refresh()
                Loop
                sr.WriteLine("SSRS has been reset")

                'RS
                sr.WriteLine("Getting ready to instantiate the RS web service")
                ts = Now()
                Dim rs As New RS2005.ReportingService2005
                Dim creds As New Net.NetworkCredential
                creds = Net.CredentialCache.DefaultCredentials
                rs.Credentials = creds
                rs.Url = "http://localhost/reportserver_r2/reportservice2005.asmx"
                rs.ListChildren("/", False)
                sr.WriteLine("Instantiated the RS web service")
                te = Now()
                sr.WriteLine("RS took " & te.Subtract(ts).TotalMilliseconds.ToString)
            End If

        Next
        sr.Close()
    End Sub

In summary, even though I would love to see some improvement in this area, my own testing shows that SSRS 2008 R2 starts up from a cold start in about half the time it took SSRS 2005.

Your mileage and results will vary depending on your hardware configuration, so don’t take my results to use in a proposal to your management.  If you do, I will disavow any knowledge of the above testing!!

Do your own testing, but please share your results.  I would love to hear if other people see similar startup improvements between the two versions.


Why doesn't SQL Server use statistics and index on this computed column?


In this post, I talked about how to use computed columns to improve performance. By creating an index on a computed column, you get two benefits: a better cardinality estimate for the expression in your query, and your query may also use that index to do seeks or scans.

Lately, I have been helping a customer. They have a view which does an aggregate (group by) like this: select ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')

Then the view is joined with many other tables. The problem is that the expression ISNULL(c1,'0') + ISNULL (c2, '0') ends up producing distinct results. What this means is that the group by doesn't reduce the number of rows for that part of the query. But since there are no statistics on the expression (because it is a dynamically computed column), the estimate is much lower than the actual number of rows, so it produced a very poor plan.

Naturally, I wanted to help by creating a persisted computed column and adding an index to it. To my surprise, I discovered that SQL Server still didn't use the index or the statistics from that index. In other words, SQL Server continued to do a table scan and continued to estimate a low number of rows for that particular aggregate.

It turns out that ISNULL is the problem. After examining the table structure, I found that some columns involved in the computed expression are non-nullable, but ISNULL is used on these columns. When a column is not nullable, ISNULL is not really necessary, so the optimizer simplified the input 'tree'. But the computed column I created continues to use the expression that has ISNULL, so the optimizer can't match the two; therefore it can't use the index or statistics on the computed column.

So the solution is this: if your expression involves ISNULL, make sure you don't apply it to a column that is non-nullable. If you follow this rule, the index and statistics of the computed column can be used.

Let me demonstrate this by the following example:

 

--setting up data

use tempdb
go
drop table t
go
create table t (c1 nvarchar(20) not null, c2 nvarchar(20) null, c3 nvarchar(20))
go

set nocount on

declare @i int
set @i = 0
begin tran
while @i < 100000
begin
 declare @ch nvarchar(max) = cast(@i as nvarchar(max))
 insert into t (c1, c2) values (@ch, 'test')
 set @i += 1
end
commit tran

go
set statistics profile on
go
--note that this query has a bad cardinality estimate for the aggregate
select  ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')

/*
 EstimateRows  Rows                 Executes             StmtText                                                                                                                   
 ------------- -------------------- -------------------- ---------------------------------------------------------------------------------------------------------------------------
 316.2278      100000               1                    select  ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')      
 316.2278        0                    0                      |--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1008],0)))                                                
 316.2278      100000               1                           |--Hash Match(Aggregate, HASH:([Expr1004]), RESIDUAL:([Expr1004] = [Expr1004]) DEFINE:([Expr1008]=COUNT(*)))        
 100000                0                    0                                |--Compute Scalar(DEFINE:([Expr1004]=[tempdb].[dbo].[t].[c1]+isnull([tempdb].[dbo].[t].[c2],N'0')))            
 100000        100000               1                                     |--Table Scan(OBJECT:([tempdb].[dbo].[t]))                                                                
*/


go
set statistics profile off

go
alter table t add compare_str as ISNULL(c1,'0') + ISNULL (c2, '0') persisted
go
create index ix1 on t (compare_str)
go
--note that this query continues to have incorrect estimate even after creating a computed column to match the expression and an index on that computed column
set statistics profile on
go
select  ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')
go
set statistics profile off
/*
EstimateRows  Rows                 Executes             StmtText                                                                                                                
------------- -------------------- -------------------- -------------------------------------------------------------------------------------------------------------------------
316.2278      100000               1                    select  ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')   
316.2278      0                    0                      |--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1008],0)))                                             
316.2278      100000               1                           |--Hash Match(Aggregate, HASH:([Expr1004]), RESIDUAL:([Expr1004] = [Expr1004]) DEFINE:([Expr1008]=COUNT(*)))     
100000               0                    0                                |--Compute Scalar(DEFINE:([Expr1004]=[tempdb].[dbo].[t].[c1]+isnull([tempdb].[dbo].[t].[c2],N'0')))         
100000        100000               1                                     |--Table Scan(OBJECT:([tempdb].[dbo].[t]))                                                             
*/


--now let's change the expression a bit  (not using ISNULL for non-nullable column) when creating the computed column

drop index t.ix1
go
alter table t drop column compare_str
go

alter table t add compare_str2  as c1 + ISNULL (c2, '0') persisted
go
create index ix2 on t (compare_str2)

go

--this time, the computed column is used
--and cardinality estimate is accurate and the index is used
set statistics profile on
go
select  ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')
go
set statistics profile off

/*
 EstimateRows  Rows                 Executes             StmtText                                                                                                                
 ------------- -------------------- -------------------- -------------------------------------------------------------------------------------------------------------------------
 100000        100000               1                    select  ISNULL(c1,'0') + ISNULL (c2, '0') as compare_str, COUNT (*) from t group by ISNULL(c1,'0') + ISNULL (c2, '0')   
 100000        0                    0                      |--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1008],0)))                                             
 100000        100000               1                           |--Stream Aggregate(GROUP BY:([Expr1004]) DEFINE:([Expr1008]=Count(*)))                                          
 100000              0                    0                                |--Compute Scalar(DEFINE:([Expr1004]=[tempdb].[dbo].[t].[compare_str2]))                                    
 100000        100000               1                                     |--Index Scan(OBJECT:([tempdb].[dbo].[t].[ix2]), ORDERED FORWARD)                                      
*/

 

Jack Li  |  Senior Escalation Engineer  | Microsoft SQL Server Support

Why is my SQL Clustered Instance changing authentication modes?


We get our fair share of cases related to SQL Server running (or not running) on a Windows Cluster. I had one of them recently where the customer was seeing different authentication modes for SQL Server depending on which node of the 2-node cluster it was online on. The Errorlogs document this behavior clearly, as follows.

We see that when SQL came online on node P1, it was in Mixed Mode:

2010-05-08 05:17:41.13 Server      Authentication mode is MIXED.
2010-05-08 05:17:41.70 spid4s      The NETBIOS name of the local node that is running the server is 'P1'.
2010-05-08 23:26:05.69 spid4s      SQL Trace was stopped due to server shutdown. Trace ID = '1'.

After a failover, SQL came online on node P2 in Windows Authentication mode:

2010-05-08 23:26:52.82 Server      Authentication mode is WINDOWS-ONLY.
2010-05-08 23:26:53.60 spid5s      The NETBIOS name of the local node that is running the server is 'P2'.
2010-05-09 10:16:18.55 spid5s      SQL Trace was stopped due to server shutdown. Trace ID = '1'

Finally, when they failed back to P1, SQL came online in Mixed Mode:

2010-05-09 10:16:34.50 Server      Authentication mode is MIXED.
2010-05-09 10:16:45.42 spid4s      The NETBIOS name of the local node that is running the server is 'P1’

The only recent change to their SQL Server environment was that they applied Service Pack 3 a couple of weeks back but they had not tested failover till this weekend – which is when they ran into this issue.

The behavior clearly indicates that there is something amiss between the two nodes of the cluster. I had just started to compare the registry keys between the nodes, when my colleague Adam Saxton suggested that this might be a problem with checkpoint not getting applied.

After some additional research I found that Service Pack 3 relies on the checkpoint being applied to the passive node on the first failover – hence making this a non-issue in most cases. In this case I needed to first determine whether the checkpoint file was even there to be applied to the passive node.

OK, so let us see how to determine where the checkpoint files for a SQL Server instance are located in a cluster. We start by finding the GUID under HKEY_LOCAL_MACHINE\Cluster\Resources that corresponds to our SQL Server instance, i.e. where Name = "SQL Server".

[Screenshot: SQL Server resource GUID under HKEY_LOCAL_MACHINE\Cluster\Resources]

Now expand the GUID folder and click on the RegSync key under this GUID; you should see seven values in it, from 00000001 through 00000007, as shown below.

[Screenshot: RegSync key showing values 00000001 through 00000007]

If you are curious as to where these files actually reside, they are under the Quorum drive in the folder Q:\MSCS\<SQL Server GUID>

[Screenshot: checkpoint files under Q:\MSCS\<SQL Server GUID>]

In this customer's case we only had one value: 00000007. Since the customer was clearly missing the checkpoint entries, we had to recreate them by following these steps:

a. Bring SQL Server online on working node with Mixed mode authentication

b. Take SQL Server resource group offline from within Cluster Administrator

c. Open regedit and add the following string values under HKEY_LOCAL_MACHINE\Cluster\Resources\<SQL Server GUID>\RegSync (via right-click -> New -> String Value in the right-hand pane)

Value name Value Data
---------  -----------
00000001   Software\Microsoft\Microsoft SQL Server\MSSQL.1\Replication
00000002   Software\Microsoft\Microsoft SQL Server\MSSQL.1\SQLserverAgent
00000003   Software\Microsoft\Microsoft SQL Server\MSSQL.1\Cluster
00000004   Software\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLSERVER
00000005   Software\Microsoft\Microsoft SQL Server\MSSQL.1\PROVIDERS
00000006   Software\Microsoft\Microsoft SQL Server\MSSQL.1\SQLServerSCP

d. Bring SQL Server resource group online from within Cluster Administrator

At this point you should see some new .CPT files created under Q:\MSCS\<GUID>\, indicating that checkpointing is now working as expected. The next step is to attempt a failover to the passive node, during which the checkpoint file will get applied there, and SQL Server will then come up with Mixed Mode authentication on that node as well.


Rohit Nayak | Microsoft SQL Server Support

Sampling can produce less accurate statistics if the data is not evenly distributed


 

Recently I worked with a very knowledgeable customer who called in wanting to know more about statistics.  He had noticed that his query would get an inaccurate cardinality estimate due to an inaccurate histogram.    Specifically, he questioned why AVG_RANGE_ROWS would be very high when he did 10% sampling, but became very low (almost distinct) when he used 100% sampling.

In order to illustrate the issue, let me create a fake table and populate data using this script:

create database dbStats
go
go
alter database dbstats set recovery simple
go
use dbStats
go
create table t(c1 uniqueidentifier)

go
set nocount on
begin tran
declare @i int = 0
declare @id1 uniqueidentifier = newid()
while @i< 8000000
begin

    declare @id uniqueidentifier
    if @i % 100 = 0
        set @id = NEWID()
    insert into t values (NEWID())
    insert into t values (@id)
    if @i < 2000000
        insert into t values (@id1)
    set @i +=1
    if (@i % 100000 = 0)
    begin
        commit tran

        begin tran
    end
end
commit tran
go

create index ix on t(c1)
go

 

If you update the statistics with 10% sampling, you will get histogram 1 (below).  But if you update statistics with 100% sampling, you will get histogram 2.  Note that one major difference is that AVG_RANGE_ROWS is much higher in histogram 1 than in histogram 2.
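
If you want to reproduce the two histograms yourself, a minimal sketch (using the table and index names from the script above) looks like this:

update statistics t ix with sample 10 percent      -- produces a histogram like Histogram 1
dbcc show_statistics ('t', 'ix') with histogram
go
update statistics t ix with fullscan                -- produces a histogram like Histogram 2
dbcc show_statistics ('t', 'ix') with histogram
go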

In fact, as you increase the sampling rate from 10% to a larger number, AVG_RANGE_ROWS will gradually decrease.  First of all, AVG_RANGE_ROWS basically answers: for any value within a histogram step, how many duplicates are there?  If AVG_RANGE_ROWS is 2, it means any given value within the histogram step is expected to have 2 duplicates.   The SQL Server optimizer uses this to do the cardinality estimate for values falling within a histogram step.

In order to explain what's going on, let's take a look at the data first.   The data is constructed in a way that is not evenly distributed.   The column has 8 million distinct values that each appear only once.  There is one value that appears 2 million times.  Then there are 80,000 values that each appear 100 times within the table.

So overall, the data is very selective.  Out of 18 million rows, there are more than 8 million distinct values.   The customer's argument is that for any given value that falls within a histogram step, SQL Server really should estimate fewer than 2 rows; in other words, AVG_RANGE_ROWS should be less than 2.

When you do 100% sampling, AVG_RANGE_ROWS is about 1.8 to 2, so it's accurate.  But with 10% sampling, most AVG_RANGE_ROWS values are around 15 to 18 (much higher).

The reason is that the data is not evenly distributed.   If the data were truly evenly distributed, 10% sampling and 100% sampling would produce similar results.   But if some values appear far more often than others, sampling produces less accurate results.  This is because the more frequent values have a higher chance of being selected when computing the statistics. At the final stage, the sampled values are scaled up to produce 'inflated' statistics.   This eventually ends up with a statistics histogram that tells SQL Server the data is less selective than it really is.

What’s the solution?

There are a couple of things you can do. If you can afford a full scan (100%) or a higher sampling rate, do that.   If you can't, you may have to rely on index hints for some queries.
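
For example, a minimal sketch of an index hint against the test table above (the GUID literal is just an illustrative value taken from the histogram) would be:

select *
from t with (index(ix))
where c1 = 'ADA7D301-B912-4927-BC84-0000730876EE'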

 

Histogram 1 (10% sampling)

RANGE_HI_KEY                         RANGE_ROWS    EQ_ROWS       DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------------------------------ ------------- ------------- -------------------- --------------
ADA7D301-B912-4927-BC84-0000730876EE 0             1             0                    1
1D4D7A35-9E7B-46E7-BF87-01B486B2166B 113919.4      1005.627      6442                 17.6839
2F22F6DE-A893-4826-A8B8-04E5A31C5819 187018.4      1005.627      11932                15.67406
2602AD30-5365-4DE6-BEFB-07E79CB221F0 193601.4      1005.627      11077                17.47708
ADC703C1-2CD6-4107-9177-09268149C7C6 73038.45      1005.627      4597                 15.88914
968BE242-C378-4FDA-9A6F-0B883FCBAB52 169844.2      1005.627      9031                 18.80737
87B47694-B775-4414-B638-103D45D87C84 303451.8      1005.627      17569                17.27227
7DE3FC0A-2A74-4C90-B0E3-12177F27B7F2 110587.5      1005.627      6711                 16.47944
7B9DB839-7F96-46A6-821D-130BC4345A90 66637.23      1005.627      3668                 18.16596
BFC1684B-EB83-4D20-B0DF-150805AC5A8E 109729.3      1005.627      7341                 14.9474

Histogram 2 (100% sampling)

RANGE_HI_KEY                         RANGE_ROWS    EQ_ROWS       DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------------------------------ ------------- ------------- -------------------- --------------
7972667E-CC57-42F4-A414-00000190A98E 0             1             0                    1
CFF7C764-41AA-4E89-B802-00AA81D1D512 39129         100           21012                1.862222
473C80D5-A9CC-4186-B4B3-093CCCEAD064 538744        100           270949               1.988359
61E3899D-74CD-4090-9E76-0A031A54EFD6 44973         100           24480                1.837132
E857ABD0-DA6D-4367-92C7-1D98186AF634 1224175       100           617701               1.981825
8863C50F-160D-4DD3-A1C4-1E24D9DE79D3 31413         100           17454                1.799759
3AF1D87E-805E-49B6-9870-301658472C85 1109208       100           565698               1.960778
20BEE350-A654-43F3-BB84-30A978964057 33231         100           17985                1.847706
443AC49E-FE4B-4E32-9FF9-47A667FF1E0F 1451375       100           725309               2.001044
5BAFE28B-6AF7-42D8-BB9E-48C3D1FB3F7B 67121         100           35342                1.899185
FE43EE54-4D31-4CCF-B231-4D6E02B3F852 295111        100           147502               2.000726
8EC2F5D1-50BC-4452-8F9A-4DFD046D3E5B 32713         100           17566                1.862291
C636084E-7EC5-48B6-A273-5451C9EE8269 393640        100           199600               1.972144
75DE7AB9-EEE9-4426-A907-54E4F00BA1C1 33538         100           18292                1.833479
5F5A4E68-3573-4EB3-8ABC-5ACA7ABE7C73 361499        100           185873               1.944871
48BC6F49-D712-4B7B-A91D-5BC822179B2F 65751         100           31497                2.087532
B22C40FE-1F99-47F8-872D-ACFB95246E52 5078911       100           2564212              1.980691

 

Jack Li |  Senior Escalation Engineer | Microsoft SQL Server Support

Why does PREEMPTIVE_OS_GETPROCADDRESS Show a Large Accumulation?


There is a bug in SQL Server 2008 that causes PREEMPTIVE_OS_GETPROCADDRESS to include and accumulate the execution time of the extended stored procedure (XPROC). The following is an example showing the increase in the GetProcAddress wait time.

select * from sys.dm_os_wait_stats where wait_type = 'PREEMPTIVE_OS_GETPROCADDRESS' or wait_type = 'MSQL_XP'
exec master..xp_dirtree 'f:\'
select * from sys.dm_os_wait_stats where wait_type = 'PREEMPTIVE_OS_GETPROCADDRESS' or wait_type = 'MSQL_XP'

GetProcAddress is used to load the entry point in the DLL (XPROC) and should complete quickly, but due to the accumulation bug the wait time is inflated.   To get a better idea (ballpark) of how long GetProcAddress really takes, you can use the following query.

declare @WaitTime bigint
select @WaitTime = wait_time_ms from sys.dm_os_wait_stats where wait_type = 'MSQL_XP'
select @WaitTime - wait_time_ms from sys.dm_os_wait_stats where wait_type = 'PREEMPTIVE_OS_GETPROCADDRESS'

Bob Dorr - Principal SQL Server Escalation Engineer

Installing SQL Integration Services after SQL Cluster Setup has Completed


Today I ran into an issue where SQL Server 2008 SP1 was installed on a Windows 2008 cluster and was working just fine, but we wanted to install SQL Server Integration Services (SSIS) on the two nodes of the cluster. Since SSIS is not cluster aware, we thought it would be just a simple process of adding features to an existing instance of SQL Server. Unfortunately, it was not that intuitive.

In setup's SQL Server Installation Center you naturally select "New SQL Server stand-alone installation or add features to an existing installation", because you already have an installed instance of SQL Server on the cluster.

image

In the "Installation Type" step of the setup, selecting the second radio button, "Add features to an existing instance of SQL Server", is what runs you into problems in this situation.

 image

If you select the “Integration Services” option only during the “Feature Selection” step, then proceed with the SSIS installation you will encounter the following error if you already have a clustered SQL instance:

---------------------------
Rule "Existing clustered or cluster-prepared instance" failed.

The instance selected for installation is already installed and clustered on computer SQLClustInstName. To continue, select a different instance to cluster.
---------------------------
OK  
---------------------------

Of course, you don't want to cluster another instance, but here is the key: you do want to "Perform a new installation of SQL Server 2008", which is the first radio button in the Installation Type step of the setup (screen shot above). This will allow you to select any feature you desire to put on the system in the Feature Selection step:

image

Here you would select only "Integration Services" (unfortunately my screen shot shows I already have SSIS installed). By choosing just "Integration Services" you can successfully install SSIS on each node of the cluster. You must run this setup on every node of the cluster where you want the SSIS service installed.

Note: Don't forget to alter the <ServerName>.\SQL2008</ServerName> property in the MsDtsSrvr.xml file to correctly point to your SQL virtual server name rather than the default ".\InstanceName" that is currently there.

Have a great day!

Eric Burgess

SQL Server Escalation Services

SQL 2008 FileStream Fails to Enable After Setup on Cluster that uses Veritas Mountpoints


Recently a customer ran into an issue where they had successfully installed SQL Server 2008 SP1 on a 2-node Windows 2008 cluster. When they went to restore a database that was given to them, they found out the database was created with the new SQL 2008 FileStream feature and could not be restored on their newly installed SQL 2008 SP1 clustered instance. So they went to enable FileStream through SQL Server Configuration Manager. After checking the boxes "Enable FILESTREAM for Transact-SQL access" and "Enable FILESTREAM for file I/O streaming access" and attempting to apply their selections, they encountered a message in the dialog.

image

The message appears below the last checkbox (my system doesn’t have the error, but I wanted to show the dialog). The actual message:

"A previous filestream configuration attempt was incomplete. Filestream may be in an inconsistent state until re-configured"

In addition, if you look in the Application Event log you will see an Access Violation was raised:

Faulting application wmiprvse.exe, version 6.0.6002.18005, time stamp 0x49e0274f, faulting module CLUSAPI.dll, version 6.0.6001.18000, time stamp 0x4791acce, exception code 0xc0000005, fault offset 0x000000000001df73, process id 0xca4, application start time 0x01cb28f4bcfeed65.

Cause

This is a bug in the SQL Server 2008 post-setup FileStream enablement code. We assume the cluster resources are physical disks and not mount points. This causes a NULL value to be returned for the hResult, which is then passed to the Cluster API and causes an AV. We have only seen this problem when using Veritas Storage Foundation (in this case, 5.1 SP1 of VSF) mount points.

Workaround

1. Uninstall and Reinstall SQL Server 2008 SP1 and enable FileStream as part of the setup. It will succeed. Make sure you backup all system and user databases before uninstalling and reinstalling SQL Server.

2. Change storage so that it doesn’t use Veritas mountpoints

3. This problem will be fixed in CU10 for SQL 2008 SP1 due out in September 2010. This will allow FileStream enablement post-setup when using mountpoints.

Thanks

Eric Burgess

SQL Server Escalation Services.

How It Works: Error 18056 - The client was unable to reuse a session with SPID ##, which had been reset for connection pooling


This message has come across my desk a couple of times in the last week and when that happens I like to produce blog content.  

The error occurs when you are trying to use a pooled connection and the reset of the connection state encounters an error.   Additional details are often logged in the SQL Server error log, but the 'failure ID' is the key to understanding where to go next.

Event ID:           18056

Description:     The client was unable to reuse a session with SPID 157, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

Map the failure ID to the following (SQL 2008 and SQL 2008 R2 failure id states)

 

        Default = 1,

        GetLogin1,                    2

        UnprotectMem1,                3

        UnprotectMem2,                4

        GetLogin2,                    5

        LoginType,                    6

        LoginDisabled,                7

        PasswordNotMatch,             8

        BadPassword,                  9

        BadResult,                    10

        CheckSrvAccess1,              11

        CheckSrvAccess2,              12

 

        LoginSrvPaused,                  13

        LoginType,                       14

        LoginSwitchDb,                   15

        LoginSessDb,                     16            

        LoginSessLang,                   17

        LoginChangePwd,                  18

        LoginUnprotectMem,               19

 

        RedoLoginTrace,                  20

        RedoLoginPause,                  21

        RedoLoginInitSec,                22

        RedoLoginAccessCheck,            23

        RedoLoginSwitchDb,               24

        RedoLoginUserInst,               25

        RedoLoginAttachDb,               26

        RedoLoginSessDb,                 27     

        RedoLoginSessLang,               28

        RedoLoginException,              29             (Kind of generic but you can use dm_os_ring_buffers to help track down the source and perhaps -y)

 

        ReauthLoginTrace,                30

        ReauthLoginPause,                31

        ReauthLoginInitSec,              32

        ReauthLoginAccessCheck,          33

        ReauthLoginSwitchDb,             34

        ReauthLoginException,            35

                           Login assignments from master

        LoginSessDb_GetDbNameAndSetItemDomain,           36

        LoginSessDb_IsNonShareLoginAllowed,              37

        LoginSessDb_UseDbExplicit,                       38

        LoginSessDb_GetDbNameFromPath,                   39

        LoginSessDb_UseDbImplicit,                       40      (I can cause this by changing the default database for the login at the server)

        LoginSessDb_StoreDbColl,                         41

        LoginSessDb_SameDbColl,                          42

        LoginSessDb_SendLogShippingEnvChange,            43

 

                                Connection string values

 

        RedoLoginSessDb_GetDbNameAndSetItemDomain,       44

        RedoLoginSessDb_IsNonShareLoginAllowed,          45

        RedoLoginSessDb_UseDbExplicit,                   46      (Data specificed in the connection string Database=XYX no longer exists)

        RedoLoginSessDb_GetDbNameFromPath,               47

        RedoLoginSessDb_UseDbImplicit,                   48

        RedoLoginSessDb_StoreDbColl,                     49

        RedoLoginSessDb_SameDbColl,                      50

        RedoLoginSessDb_SendLogShippingEnvChange,        51  

  

                                Common Windows API calls

 

        ImpersonateClient,                            52

        RevertToSelf,                                 53

        GetTokenInfo,                                 54

        DuplicateToken,                               55

        RetryProcessToken,                            56

        inChangePwdErr,                               57

        WinAuthOnlyErr,                               58
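
For the more generic states, such as 29 (RedoLoginException), the error log alone may not pinpoint the cause.  A minimal sketch of digging further with the ring buffers mentioned above (the ring buffer names shown here are assumptions; the set available varies by build) is:

select ring_buffer_type, timestamp, record
from sys.dm_os_ring_buffers
where ring_buffer_type in (N'RING_BUFFER_CONNECTIVITY', N'RING_BUFFER_SECURITY_ERROR')
order by timestamp desc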

 

Error: 18056  Severity: 20  State: 46.

The client was unable to reuse a session with SPID 1971  which had been reset for connection pooling. The failure ID is 46. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

State 46 = x_elfRedoLoginSessDb_UseDbExplicit = 0n46

 

There is only one place in the code that sets this state (we are simply trying to execute a USE db and getting a failure), and it is after we have printed message 4060 to the client stating that we could not open the database or the user does not have permissions to the database.    Since there are no messages about a database going offline or being recovered, and this connection was already established: "Would there have been any permission changes at this time to prevent this login from accessing the database?"

 

I tried this with a test application.

 

Connection pool using database dbTest

User RDORRTest with default database dbTest

 

When I drop the user in the database dbTest the client starts getting the errors as I expected to see.

 

07/28/10 07:56:45.391 [0x00001E5C] SQLState: 28000, Native Error: 18456 [Microsoft][SQL Server Native Client 10.0][SQL Server]Login failed for user 'RDORRTest'.

07/28/10 07:56:45.410 [0x00001E5C] SQLState: 42000, Native Error: 4064 [Microsoft][SQL Server Native Client 10.0][SQL Server]Cannot open user default database. Login failed.

 

My SQL Server error log shows

 

2010-07-28 08:02:40.41 Logon       Error: 18456, Severity: 14, State: 50.

2010-07-28 08:02:40.41 Logon       Login failed for user 'RDORRTest'. Reason: Current collation did not match the database's collation during connection reset.

2010-07-28 08:02:40.41 spid53      Error: 18056, Severity: 20, State: 50.

2010-07-28 08:02:40.41 spid53      The client was unable to reuse a session with SPID 53, which had been reset for connection pooling. The failure ID is 50. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

 

A password change for the login at the server generated state 8.

If I rename the database I don’t get any information about the rename in the error log and I start getting connection failures.

 

All my attempts so far had been with the login set up with a default database.  However, to get to the 46 condition I had to specify the DATABASE in the connection string.
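
For example, a connection string along these lines (the server name is hypothetical; the key part is the explicit Database= keyword, and pooling is enabled by the client application or provider):

Driver={SQL Server Native Client 10.0};Server=MyServer;Database=dbTest;Trusted_Connection=yes;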

 

Now all I had to do was drop the user from the database and I get state 46.

 

2010-07-28 08:29:51.61 Logon       Error: 18456, Severity: 14, State: 46.

2010-07-28 08:29:51.61 Logon       Login failed for user 'RDORRTest'. Reason: Failed to open the database configured in the login object while revalidating the login on the connection. [CLIENT: 65.53.66.207]

 

Added the user back and I no longer get the error and the connections continue their work.

 

Bob Dorr - Principal SQL Server Escalation Engineer


When a full dump isn’t really a full dump…


I was working on a customer issue which involved debugging a dump.  The dump was generated via SQLDumper within Reporting Services.  So, the name of the dump was similar to SQLDmpr0001.mdmp.  When I opened the dump I saw the following:

Loading Dump File [C:\temp\SQLDmpr0001.mdmp]
User Mini Dump File with Full Memory: Only application data is available

Which tells me we actually have a full dump.  Well, that and the fact that the dump was almost 8GB.

Through the course of debugging, I had a need to run !handle to get some handle information.  This is what the output should look like for a specific handle:

0:000> !handle 50 f
Handle 0000000000000050
  Type             Event
  Attributes       0
  GrantedAccess    0x1f0003:
         Delete,ReadControl,WriteDac,WriteOwner,Synch
         QueryState,ModifyState
  HandleCount      2
  PointerCount     4
  Name             <none>
  No object specific information available

However, this is what I got when I tried running it on this particular dump:

0:050> !handle 510 f
ERROR: !handle: extension exception 0x80004002.
    "Unable to read handle information"

So, maybe that individual handle was bad, but running !handle also had the same issue.

0:050> !handle
ERROR: !handle: extension exception 0x80004002.
    "Unable to read handle information"

To be honest, this is the first time I’ve had to look at handle information in an RS Dump as most of the time I’m looking at the managed side of things, not the native side.  My thought at that point was that the dump collection didn’t actually grab handle related information.  This dump was collected with the following setting within rsreportserver.config

<!--  <Add Key="WatsonFlags" Value="0x0430" /> full dump-->
<!--  <Add Key="WatsonFlags" Value="0x0428" /> minidump -->
<!--  <Add Key="WatsonFlags" Value="0x0002" /> no dump-->
<Add Key="WatsonFlags" Value="0x0430"/>

0x0430 showed as the "full dump" value in rsreportserver.config, but let's see what this actually means to SQLDumper.

C:\Program Files\Microsoft SQL Server\100\Shared>sqldumper /?
Usage: sqldumper [ProcessID [ThreadId [Flags[:MiniDumpFlags] [SqlInfoPtr [DumpDir [ExceptionRecordPtr [ContextPtr [ExtraFile]]]]]]]] [-I<InstanceName>] [-S<ServiceName>][-remoteservers:[print|dump|freeze|resume|remote:guid\dumporigin\signature\localId\port\operationType]]
  Flags:
    dbgbreak            = 0x0001
    nominidump          = 0x0002
    validate_image      = 0x0004
    referenced_memory   = 0x0008
    all_memory          = 0x0010
    dump_all_threads    = 0x0020
    match_file_name     = 0x0040
    no_longer_used_flag = 0x0080
    verbose             = 0x0100
    wait_at_exit        = 0x0200
    send_to_watson      = 0x0400
    defaultflags        = 0x0800
    maximumdump         = 0x1000
    mini_and_maxdump    = 0x2000
    force_send_to_watson= 0x4000
    full_filtered_dump  = 0x8000

  MiniDumpFlags:
    Normal                           = 0x0000
    WithDataSegs                     = 0x0001
    WithFullMemory                   = 0x0002
    WithHandleData                   = 0x0004
    FilterMemory                     = 0x0008
    ScanMemory                       = 0x0010
    WithUnloadedModules              = 0x0020
    WithIndirectlyReferencedMemory   = 0x0040
    FilterModulePaths                = 0x0080
    WithProcessThreadData            = 0x0100
    WithPrivateReadWriteMemory       = 0x0200
    WithoutOptionalData              = 0x0400
    WithFullMemoryInfo               = 0x0800
    WithThreadInfo                   = 0x1000

So, that gets me all_memory, dump_all_threads, and send_to_watson.  Apparently, all_memory doesn't include the handle table.  As a test I ran SQLDumper with the flag 0x1430, with 0x1000 being maximumdump.  The full dump collected with that got me the handle information I was looking for.
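
Breaking those flag values down against the table above (a quick sanity check of the arithmetic):

0x0430 = 0x0400 (send_to_watson) + 0x0020 (dump_all_threads) + 0x0010 (all_memory)
0x1430 = 0x1000 (maximumdump) + 0x0400 (send_to_watson) + 0x0020 (dump_all_threads) + 0x0010 (all_memory)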

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

Replay Result Set Event (Replay * Events)


From: Robert Dorr
Sent: Friday, August 13, 2010 8:58 AM
Subject: RE: SQL Server Trace Replay - "Replay Result Set Event"

 

The Result Set event is one of several client side replay events.

 

Here are some of the common Result Set event columns:

Text          The returned values.  For example:

              DECLARE @edition sysname; SET @edition = cast(SERVERPROPERTY(N'EDITION') as sysname); select case when @edition = N'SQL Azure' then 1 else 0 end as 'IsCloud'

              - RETURNS -

              IsCloud      as the column name in the result set event
              IsCloud = 0  as the value in the result row event

SPID          Session Id from the ORIGINAL trace
IntegerData   SPID used during REPLAY
LoginName     Database user
DBID          Database Id
DBNAME        Database name
Handle        Handle of the prepare / cursor, if used
BatchId       MARS batch Id

 

Sent: Friday, August 13, 2010 5:41 AM
Subject: SQL Server Trace Replay - "Replay Result Set Event"

 

Does anyone have a description of what comes back in the SQL Trace "Replay Result Set Event"? A description of the returned columns would be fine; I am especially interested in the column "IntegerData" - what does this returned number mean?

Bob Dorr -  Principal SQL Server Escalation Engineer

How It Works: Enumeration of sys.messages


I ran into an issue which has some aspects I found interesting and thought you might as well.

When you select against the sys.messages virtual table, the data is retrieved from the resource files (.RLL) stored under the <<Instance>>\Binn\Resource directory.   This is done by loading the RLL library, retrieving the resource string information, and then materializing the row in the result set.

What I found interesting is that by default SQL Server will materialize a row for each of the installed resource languages.  For example the following select will return 11 rows …

select * from sys.messages where message_id = 605

image

… matching the 11 installed resource RLLs.

image

Internally, all entries from syslanguages are searched.   On my SQL Server 2008 installation syslanguages holds 33 rows, so SQL Server searches for 22 more rows per message that have no matching RLL files.

To make your sys.messages queries faster, add a 'language_id = ####' predicate to the WHERE clause.   SQL Server will then only look for the specific language and query performance will be significantly improved.

select * from sys.messages where language_id = 1033
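
If you are not sure which language_id to use, a minimal sketch of looking it up (us_english shown here; msglangid is the value sys.messages expects) is:

select msglangid, name
from sys.syslanguages
where name = N'us_english'    -- returns 1033 on my installation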

Bob Dorr - Principal SQL Server Escalation Engineer

How It Works: Timer Outputs in SQL Server 2008 R2 - Invariant TSC


 

Stopwatch

I would love nothing more than to take you back to my high-school days running the 440 yard hurdles (yes yards not meters) where timing was done with a stop watch (you know the old, moving dial style) but timers on the PC don't allow that simplicity.

I have discussed the timing behavior in SQL Server in previous blogs and it is time to discuss timing behaviors again, as SQL Server 2008 R2 has been updated.

How It Works: SQL Server Timings and Timer Output (GetTickCount, timeGetTime, QueryPerformanceCounter, RDTSC, …): http://blogs.msdn.com/b/psssql/archive/2009/05/29/how-it-works-sql-server-timings-and-timer-output-gettickcount-timegettime-queryperformancecounter-rdtsc.aspx

SQL Server 2005 - RDTSC Truths and Myths Discussed: http://blogs.msdn.com/b/psssql/archive/2007/08/19/sql-server-2005-rdtsc-truths-and-myths-discussed.aspx 

I normally don't reference material outside the Microsoft website but this link is one of the best I have found that briefly explains the entire history of PC timers.  Click Here

If you are like me it is a bit confusing as to what timer you are looking at and Windows Server 2008 R2 and SQL Server 2008 R2 took steps to eliminate the confusion once and for all.

SQL Server 2008 R2 attempts to detect when the system supports the invariant-TSC counter.   Newer processors are being designed to tick the RDTSC value at a constant rate no matter what the power settings or state might be.  This means there is a high resolution counter, with constant tick rate available to the system.   The invariant TSC does not exhibit the issues I have outlined about the variant TSC (SQL Server 2005 RTM) in previous blog posts.

In Windows 2008 R2 (Windows 7 and newer OS versions*) QueryPerformanceCounter may be based on the invariant TSC counter, making a call to QueryPerformanceCounter lightweight and accurate.   Windows does not currently have an API to indicate which source QueryPerformanceCounter is using, so at startup SQL Server 2008 R2 times the invocation of QueryPerformanceCounter (10 times); when it exhibits repeated, small cycle behavior (< 600 cycles), QueryPerformanceCounter is used to accept invariant TSC timings instead of the GetTickCount/timeBeginPeriod interrupt timing behavior.   If the timing exceeds 600 cycles, the multi-media timer behavior of SQL Server is used instead.

Note:  Windows is exposing the timer source API for QueryPerformanceCounter in a Windows 2008 R2 based fix so the timing source determination by SQL Server will be replaced with the API call in a future build.  Target releases are the SP1 of Windows 2008 R2 and SQL Server 2008 R2.

When I express the previous information to others I get inquisitive looks and the 'what does this mean to me?' question, and rightfully so.  What it means is that Windows 2008 R2 and SQL Server 2008 R2 try to select the counter that will be accurate with the best precision.   If the invariant TSC is not present, then the interrupt timer (multi-media) at a granularity of 1ms, instead of micro-seconds, will be used.   When the invariant TSC is available, accurate micro-second timings are possible.

image

Using a query against sys.dm_os_sys_info you can see the timer source in use by the instance of SQL Server.   Here is an example from a SQL Server 2008 R2 instance I am running.  If the Query Performance Counter is not used, the value will be MULTIMEDIA_TIMER and a time source value of 1 (one).

The following table refactors the table from the previous blog post to include the SQL Server 2008 R2 and Windows 2008 R2 behaviors.

Windows 2003 / Windows 2003 R2 / Windows 2008 (all platforms)
    SQL 2000         Standard interrupt timer, granularity ~12ms (GetTickCount)
    SQL 2005         RDTSC
    SQL 2005 SP3     Multi-Media
    SQL 2008         Multi-Media**
    SQL 2008 R2      Multi-Media**

Windows 2008 R2 (all platforms)
    SQL 2008         Multi-Media**
    SQL 2008 R2
        1. QueryPerformanceCounter < 600 cycles              Use QueryPerformanceCounter
        2. QueryPerformanceCounter > 600 cycles              Multi-Media Timer
           Call timeBeginPeriod to establish the smallest interrupt tick granularity (generally 1 ms).   The dm_exec_*_stats DMVs are based on interrupt timings, as are GetDate and other functionality.   Trace *:Completed events are always a combination of GetDate and QueryPerformanceCounter functionality.
        3. QueryPerformanceCounter > 600 cycles and -T8049   Force Multi-Media Timer
        4. QueryPerformanceCounter > 600 cycles and -T8038   SQL 2000 behavior
           Uses default GetTickCount granularity with remaining behavior as described in row #2

 * If QueryPerformanceFrequency returns 0 the QueryPerformanceCounter will NOT be used.  (Treated like > 600 cycles)

 ** Selection of Multi-Media timer can be altered using trace flags 8049 or 8038 as described in the SQL Server 2008 R2 section above.

Older systems showing QUERY_PERFORMANCE_COUNTER as the timer source on Windows 2008 R2

The invariant TSC is implemented in newer CPUs but it is possible for Windows 2008 R2 to detect that the CPU is not registered to drop into deep sleep states.  When this is the case the TSC can be considered invariant and Windows will use the RDTSC as a valid source for QueryPerformanceCounter timings.   

The QueryPerformanceCounter source is determined at boot time so when the SQL Server service starts the timer source has been established.   The only time this is not the rule is when running on a machine that can switch from AC to DC power (a laptop).  When this occurs the power scheme can change and allow deep sleep states.  The QueryPerformanceCounter source will dynamically change to a safer timer (Hpet, PMTimer, …) but users of the QueryPerformanceCounter API won't be directly impacted as the proper QueryPerformanceFrequency and such metric outputs are encapsulated by the API.   The only side effect may be a slight decrease in performance when calling the QueryPerformanceCounter API.

Multi-Media Timer in a Virtual Machine

When you are running SQL Server in a virtual machine (Hyper-V, VMware, …), a call to QueryPerformanceCounter vs. the interrupt timer (multi-media timer: GetTickCount, timeGetTime, …) can yield different results.   I found this while working with a customer: entries in sys.dm_exec_procedure_stats showed last_elapsed_time values noticeably smaller than the Duration of the corresponding RPC:Completed events.    There are several reasons for the variance, but one of them is that the interrupt timer is virtualized (it can queue and lag) while QueryPerformanceCounter generally works with the hardware in a more direct way.
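
A minimal sketch of pulling the DMV side of that comparison, to line up against the Duration column of the RPC:Completed events for the same procedures (the elapsed and worker times are reported in microseconds):

select top 10
    object_name(object_id, database_id) as proc_name,
    last_execution_time,
    last_elapsed_time,
    last_worker_time
from sys.dm_exec_procedure_stats
order by last_execution_time desc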

I built a console application that would grab the GetTickCount (timeGetTime), QueryPerformanceCounter and GetSystemTime on regular intervals.   Under heavy CPU stress the interrupt based timer (GetTickCount, …) would lag ~30ms behind the reported QueryPerformanceCounter timings.   Some of this is noise due to common context switching and some could be the virtual interrupt queuing.

Here is my snippet from the response I sent to the customer.

 

·         sys.dm_exec_query_stats only shows statement level events that are cached and not all statements ever run on the server.  (Ex: create index will not appear in the DMV)

·         sys.dm_exec_procedure_stats shows execution time of the procedure but does not include the batch cleanup time, time for streaming the final OUTPUT parameters, result status and such activity.  Larger output parameters could influence the delta between the last_elapsed_time and the RPC:Completed event.

·         Interrupt timing may not be completely accurate in a virtual environment

·         Profiler duration is closer to the time it takes to submit and complete the command (including the cleanup activity, output parameter streaming and such activity).  The RPC and Batch completed events are really what you want to see in a DMV but we don’t have that DMV.  You want a sys.dm_exec_batch_stats that shows arrival to final result send duration and statistics.

·         The *worker* time values in the stats DMVs are the potential SOS scheduler time and can be loosely equated to CPU, but that is not always the case and they do not directly align with some of the other CPU outputs that come from GetThreadTimes API output

·         The Profiler timer is QueryPerformanceCounter based and the stats DMVs are commonly interrupt timer based.   SQL Server 2008 R2 may use QPC for stats tables if the system supports an invariant RDTSC timer.  

·         GetDate uses a 1 sec sync with the OS and inside that second the interrupt timer movement is used to quickly calculate the offset within the second.   

Bob Dorr - Principal SQL Server Escalation Engineer

Why use SQL Server 2008 R2 BPA? Case 1: Missing Updates…..


In June I introduced you to a new Best Practices Analyzer for SQL Server, SQL Server 2008 R2 BPA:

http://blogs.msdn.com/b/psssql/archive/2010/06/20/introducing-the-sql-server-2008-r2-best-practices-analyzer-bpa.aspx

I’ve seen some mixed reaction to this tool. But when I verbally talk to some about the type of knowledge CSS has put into the rules, they see a greater value. Therefore, I thought I would put together a series of blog posts with some examples of this knowledge.

The first is something I call “Missing Updates”. While Microsoft Update does provide a great service to our customers, one thing it does not do is proactively provide advice to customers on updates that might affect their operation of our product based on the knowledge of CSS customer experiences.

Consider the following scenario. You upgrade to SQL Server 2008 and now a query that ran fine in SQL Server 2005 fails with the following error:

Msg 605, Level 21, State 3, Line 1
Attempt to fetch logical page (1:225) in database 2 failed. It belongs to allocation unit 281474980315136 not to 504403158513025024.

Yikes. Corruption in tempdb? You are not sure exactly what the problem is and never run CHECKDB against tempdb (I don't blame you; I wouldn't either). You decide to go ahead and run CHECKDB against tempdb, but alas, no errors are found.

You decide to do a Bing search for “605 and tempdb” and this article is one of the top hits:

image

This looks promising. You see this problem is fixed in Cumulative Update 3 for SQL Server 2008 RTM and realize you do not have this build installed. Great. You found the answer and didn't have to ask anyone for help. So you download the build, set up a time to install it during the evening, and the next morning you will be the hero as no one will complain anymore about these new 605 errors.

But when you come in the next morning you get all types of complaints that "this new 605 error is still happening". Sigh. You thought it was resolved and had an article that explained the problem, but the fix from Microsoft didn't work. You place a call to Microsoft CSS for help with the problem. The first question you get after describing the situation is "Did you turn on the trace flag?" What trace flag? I never saw anything about a trace flag! You go back to the article and read the fine print.

image

So you still may be a hero after all, but it was not fun having to call Microsoft to learn you needed a trace flag to enable this fix. You find out the trace flag was needed because the fix involved the Query Optimizer (QP), and it has become standard practice for SQL Server to enable QP fixes only via a trace flag.

While you now have the solution, it sure would have been nice if:

  • I could have run something on my SQL Server 2008 server after upgrading to “let me know about important updates from CSS”
  • I could have run something when the original fix didn’t seem to work to “check if I did it right”

If you look carefully down towards the end of this KB article, you will find the answer to these wishes:

image

The SQL Server 2008 R2 BPA tool has a rule to help detect if you are running a version of SQL Server where this fix is not applied. But the rule just doesn’t check the version. It also ensures the trace flag has been applied.

Let’s say that on this server you run SQL Server 2008 R2 BPA after installing CU3 for SQL Server 2008 RTM but had not applied the trace flag. You might find a Warning from the tool like this:

image

Notice the comments under the Issue:

[ TF 4135 and 4199 not enabled. We also detected occurrences of Event ID 605 or 824 which might be related to the problem…. ]

Now look at the Resolution:

Resolution: Enable trace flag 4135 or 4199

Trace flag 4199 is a more "general" trace flag that enables multiple QP fixes, so either one will work (more on that later).
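
A minimal sketch of checking and enabling the flag from T-SQL (adding -T4135 as a startup parameter is what makes it survive a service restart):

dbcc tracestatus (4135, 4199, -1)   -- is either flag enabled globally?
dbcc traceon (4135, -1)             -- enable 4135 globally until the next restart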

So BPA is giving you explicit instructions to enable the trace flag. If you didn't have the required version applied, it would also tell you that. If you select "More Information" you will be directed to the article you first discovered on the web, 960770, which ties this all back together. As a bonus, it also looks for any occurrences of 605 or 824 (824 can also be a symptom) involving tempdb in the event log. This gives you a stronger indication that you are actually hitting this problem.

Behind the scenes is a fairly complex rule that checks versions and trace flags. Why? Because, for example, trace flag 4199 is only applicable in certain versions (it was not available until a specific build of SQL Server 2008 and SQL Server 2008 R2) and we have to account for the proper versions of this fix for both SQL Server 2008 and SQL Server 2008 R2.

How did CSS come up with this recommended fix for BPA? We actually polled our engineers internally and looked at customer case experiences. We found that some customers didn't know about the fix, and even those who did almost always forgot the trace flag.

This is one example of how BPA can be used to help you proactively address common problems that come up during the maintenance and operation of SQL Server 2008 and SQL Server 2008 R2.

Our next case involves the mystery of the missing installer cache file. Stay tuned.

 

Bob Ward
Microsoft
