Channel: CSS SQL Server Engineers

SETSPN -A with Windows 2012 does a duplicate check upfront


If you have followed my posts, or caught my sessions at PASS, you may have figured out that Kerberos is one of my strength areas.  I recently setup a Windows 2012 server to just see how SharePoint Integration with Reporting Services would work out. 

As I was doing that, I knew I would need the HTTP SPN configured for my SharePoint server.  As I created the SPN, I saw something very interesting.

image

The “Checking domain” piece made me assume that this was actually checking whether the SPN already existed - in other words, making sure it wouldn’t create a duplicate.  Then I decided to validate that assumption.

I have a bogus SPN sitting on my Claims Service account to allow me to set up delegation.  I’m going to use that for the test.  It is just “my/spn”.

image

So, let’s try adding that to another account.

image

That’s awesome!

I also found this documentation on TechNet discussing what is new with Kerberos in Windows 2012.

What's New in Kerberos Authentication (Windows 2012/Windows 8)
http://technet.microsoft.com/en-us/library/hh831747.aspx

Of note, this functionality already existed in the Windows 2008/R2 SetSPN as the –S switch.  With the Windows 2012 version, –A now behaves the same as –S, which is a welcome change.

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton


SharePoint Adventures : Reporting Services, Claims and One Way Trusts


I had an interesting case that was presented to me by our friends in the SharePoint support team.  The issue was that when they went to use a Data Source for Reporting Services that is set to use Windows Authentication, they saw the following:

image

Within the SharePoint ULS logs we see the following:

Throwing Microsoft.ReportingServices.Diagnostics.Utilities.ClaimsToWindowsTokenException: , Microsoft.ReportingServices.Diagnostics.Utilities.ClaimsToWindowsTokenException: Cannot convert claims identity to windows token. ---> System.InvalidOperationException: Could not retrieve a valid Windows identity. ---> System.ServiceModel.Security.SecurityAccessDeniedException: Access is denied.    Server stack trace:      at System.ServiceModel.Channels.ServiceChannel.ThrowIfFaultUnderstood(Message reply, MessageFault fault, String action, MessageVersion version, FaultConverter faultConverter)     at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc)     at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOpera...    25d0478e-1e06-4a1d-bf57-2f5a675805ad

Throwing Microsoft.ReportingServices.ReportProcessing.ReportProcessingException: , Microsoft.ReportingServices.ReportProcessing.ReportProcessingException: Cannot impersonate user for data source 'AdventureWorks2012.rsds'. ---> Microsoft.ReportingServices.Diagnostics.Utilities.ClaimsToWindowsTokenException: Cannot convert claims identity to windows token. ---> System.InvalidOperationException: Could not retrieve a valid Windows identity. ---> System.ServiceModel.Security.SecurityAccessDeniedException: Access is denied.    Server stack trace:      at System.ServiceModel.Channels.ServiceChannel.ThrowIfFaultUnderstood(Message reply, MessageFault fault, String action, MessageVersion version, FaultConverter faultConverter)     at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOpera...    25d0478e-1e06-4a1d-bf57-2f5a675805ad


Claims Authentication             bz7l    Medium      SPSecurityContext.WindowsIdentity: Could not retrieve a valid windows identity for NTName='CYLONS\numberone', UPN='NumberOne@cylons.local'. UPN is required when Kerberos constrained delegation is used. Exception: System.ServiceModel.Security.SecurityAccessDeniedException: Access is denied.    Server stack trace:      at System.ServiceModel.Channels.ServiceChannel.ThrowIfFaultUnderstood(Message reply, MessageFault fault, String action, MessageVersion version, FaultConverter faultConverter)     at System.ServiceModel.Channels.ServiceChannel.HandleReply(ProxyOperationRuntime operation, ProxyRpc& rpc)     at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)     at System.ServiceModel.C...    25d0478e-1e06-4a1d-bf57-2f5a675805ad

Of note, we have a One Way Transitive Trust set up between the BATTLESTAR domain, where the Windows services reside, and the CYLONS domain, where the users reside.

My initial reaction was that this would not work.  My thinking was that the Claims to Windows Token Service (C2WTS) would be making the call to log on the user to get the Windows credential, and that because of Constrained Delegation that call would not succeed.

To help validate that, I enabled Kerberos Event Logging, did an IISReset and restarted the C2WTS Windows Service.  I then saw the following in the Event Log:

A Kerberos Error Message was received:
on logon session BATTLESTAR.LOCAL\claimsservice
Client Time:
Server Time: 17:59:46.0000 8/20/2012 Z
Error Code: 0x44 KDC_ERR_WRONG_REALM
Extended Error:
Client Realm: CYLONS.LOCAL
Client Name:
Server Realm: BATTLESTAR.LOCAL
Server Name: krbtgt/BATTLESTAR.LOCAL
Target Name: krbtgt/BATTLESTAR.LOCAL@BATTLESTAR.LOCAL
Error Text:
File: e
Line: 9fe
Error Data is in record data.

A Kerberos Error Message was received:
on logon session
Client Time:
Server Time: 17:59:46.0000 8/20/2012 Z
Error Code: 0x7  KDC_ERR_S_PRINCIPAL_UNKNOWN
Extended Error:
Client Realm:
Client Name:
Server Realm: BATTLESTAR.LOCAL
Server Name: krbtgt/CYLONS.LOCAL
Target Name: krbtgt/CYLONS.LOCAL@BATTLESTAR.LOCAL
Error Text:
File: 9
Line: f09
Error Data is in record data.

The KDC_ERR_WRONG_REALM indicates that we failed when trying to call the Client Realm (CYLONS.LOCAL).  Then the overall login failed with KDC_ERR_S_PRINCIPAL_UNKNOWN.

One question that came up was whether this would work with a Two Way Transitive Trust or not.  Based on my original assumption, my thought was no.  But I wanted to validate that as well.  So, I reconfigured my domains for a Two Way Transitive Trust and what do you know?  It worked!

image

So, this wasn’t really a case of Constrained Delegation.  It was just a Trust relationship issue, and for Kerberos to work properly in this scenario, we have to have a Two Way Transitive Trust.

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

Unexpected ALTER DATABASE commands causing availability problems in Windows Azure SQL Database


Along with the expected challenges of the services world (rapidly changing features and quick adoption of new services), there are some unexpected ones.  One of the most recent ones I have dealt with involved some unexpected ALTER DATABASE operations coming through on customers’ Windows Azure SQL databases.  As it turns out, it was a side effect of having a beta portal associated with some production Windows Azure SQL databases.

It all started a couple of weeks ago when we had a customer complaining that some of their load tests were getting interrupted by failures like this:

8/2/2012 12:56:06 PM: ** The ALTER DATABASE command is in process.  Please wait at least five minutes before logging into database 'myDB', in order for the command to complete.  Some system catalogs may be out of date until the command completes.  If you have altered the database name, use the NEW database name for future activity.
Login failed for user 'myAdmin'.
This session has been assigned a tracing ID of 'c4a5942b-7141-446b-9227-bbf65ac00df8'.  Provide this tracing ID to customer support when you need assistance.

8/6/2012 1:36:22 AM: ** Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning()
   at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
   at System.Data.SqlClient.TdsParserStateObject.ReadSni(DbAsyncResult asyncResult, TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket()
   at System.Data.SqlClient.TdsParserStateObject.ReadByte()
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
   at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(DbAsyncResult result, String methodName, Boolean sendToPipe)
   at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at System.Data.Linq.SqlClient.SqlProvider.Execute(Expression query, QueryInfo queryInfo, IObjectReaderFactory factory, Object[] parentArgs, Object[] userArgs, ICompiledSubQuery[] subQueries, Object lastResult)
   at System.Data.Linq.SqlClient.SqlProvider.ExecuteAll(Expression query, QueryInfo[] queryInfos, IObjectReaderFactory factory, Object[] userArguments, ICompiledSubQuery[] subQueries)
   at System.Data.Linq.SqlClient.SqlProvider.System.Data.Linq.Provider.IProvider.Execute(Expression query)

The mystifying thing was that the customer was very adamant about the fact that they weren’t issuing these ALTER DATABASE commands.  We dug into the service logs and verified that the ALTER commands weren’t issued from within the service as some sort of scheduled operation done by the infrastructure.  This matched my overall understanding of our infrastructure, so at this point we went back to the customer to try to figure out where in their test they were scripting the ALTER DATABASE command.  We suspected a script to some degree because the ALTER DATABASE command wasn’t changing any database properties.  It was just applying the properties that already existed on the database.

While still working with the customer on where in their load test they had scripted the ALTER DATABASE command, I heard from some of my colleagues that they also had customers reporting unexpected ALTER DATABASE commands.  Now, at this point, I began to strongly suspect something within the Azure world.    I didn’t have any strong proof because there didn’t seem to be any correlation between the customers reporting the behavior, but it was just too coincidental to not be related.

Next, we went back to our counterparts in the SQL Development team, but to no avail.  Nobody was aware of any scheduled activity that could trigger this.  If the SQL Development team wasn’t doing it, where else could it be coming from and still hitting multiple customers?!?!

Normally, I would start suspecting the customer was telling us what they thought was the truth (even if it wasn’t) but that we would find all of these customers had an ALTER DATABASE embedded in their deployment scripts somewhere.  I didn’t really believe it since we have never had this problem in all my years of supporting Windows Azure SQL Database, but I just couldn’t come up with any other explanation.  At this point, I have to admit that part of me was starting to think this was all a conspiracy to make me go crazy!

Then, we had a huge breakthrough in the investigation.  One of my colleagues was working with his customer and noticed that in the middle of the test the customer went out to the Windows Azure Preview Portal and increased the number of worker roles to handle the increased load.  Right after that change, he saw the ALTER DATABASE hit the customer’s server!  Being the suspicious type, my colleague repeated the exercise with his customer and saw the behavior again.  Aha – now we were on to something!

At this point, I strongly suspected that the customer had something in his OnStart() event for the Windows Azure role because (and sadly I quote myself here) “…there is absolutely no connection between Windows Azure role properties and Windows Azure SQL databases…”.  Luckily, my colleague who was working with this particular customer didn’t trust me (see, sometimes a little doubt is a good thing!) and he worked up his own Windows Azure deployment with linked resources (going on the assumption that linking them was the only way to make the deployment somehow “aware” of the database).  Lo and behold – updating the properties of his worker roles triggered an ALTER DATABASE!  (Thanks for doubting me, John!)

At this point, we had a pretty good idea of what was happening.  The key combination is having a linked database resource and issuing a Windows Azure role property change command in the new Windows Azure Preview Portal.

Here’s how you set up a linked database.  First, go into your Windows Azure deployment from within the Preview Portal and then click on Linked Resources.  This will give you a dialog that looks like this:

image

Once you select a database, you will then see the database listed as a linked resource as shown below:

image

     

The core problem here is that the Save command triggers an execution of the ALTER DATABASE even if there aren’t any changes to the database properties!!!  In hindsight, I can see what someone was trying to do when they created this Save dialog.  They were trying to allow you to scale up on the fly inside the Windows Azure part of the dashboard without forcing you to go to the SQL Azure part of the dashboard to do the same.

The trouble is that the ALTER DATABASE command doesn’t inherently know there are no changes.  Because it takes a database-level lock, the command ends up blocking any incoming connection attempts for the length of the ALTER.  If for some reason the ALTER takes a non-trivial amount of time (and it shouldn’t, but unfortunately does once in a while), it then causes failures for subsequent requests.
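For illustration, the statement the portal issues is roughly of this shape (a sketch only; the database name, edition and size here are made up, and the exact statement the portal generates may differ):

-- A no-op ALTER DATABASE: it re-applies properties the database already has,
-- yet it still takes the database-level lock for the duration of the command.
ALTER DATABASE myDB
MODIFY (EDITION = 'Business', MAXSIZE = 10 GB);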

The solution is to only issue the ALTER DATABASE command if the database properties are actually changed, and this is indeed what we are going to implement.  This change should be live in the next couple of weeks.  In the interim, if you run into this problem you can either refrain from making role changes via the Preview Portal (just use the original Windows Azure portal) or unlink your Windows Azure SQL database from your Windows Azure deployment.

Strange Sch-S / Sch-M Deadlock on Machines with 16 or More Schedulers


Since it took me several days to track down this bug, and I did learn a couple of new things along the way, I thought I would share some of my work.

16 or More CPUs

When a system presents SQL Server with 16 or more CPUs, and you are using a high-end SQL Server SKU, SQL Server will enable lock partitioning.   (Lock partitioning can be disabled with the startup trace flag -T1229.)

Lock Partitioning

Lock partitioning optimizes locking structures by adding additional, per-scheduler structures and actions.   This design has similarities to Sub/Super Latching (http://blogs.msdn.com/b/psssql/archive/2009/01/28/hot-it-works-sql-server-superlatch-ing-sub-latches.aspx).

As a quick overview, if the query needs to obtain a Shared lock it only needs to acquire the shared lock on the local partition.  For an exclusive lock the query acquires the lock on each partition, always progressing from partition 0 to n to avoid deadlocks.   This allows the SQL Server to utilize the local partition when appropriate and improves scalability on larger systems.
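If you want to observe partitioned locks yourself, sys.dm_tran_locks exposes the partition on each lock row; a minimal sketch (run it while a session is holding object locks in the current database):

-- resource_lock_partition shows which partition (0..n) each OBJECT lock row belongs to
-- on a lock-partitioned system (it stays 0 when lock partitioning is not enabled).
-- Modes such as Sch-S typically appear on a single partition, while Sch-M appears on every partition.
SELECT resource_type,
       resource_associated_entity_id,
       resource_lock_partition,
       request_mode,
       request_status
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT'
  AND resource_database_id = DB_ID();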

Deadlock from Shared Lock on a Different Partition - What?

The problem I was presented with was the following deadlock output.   (This was from trace flags 1222 and 3605, which add deadlock information to the error log.  You could get similar information using the trace events.)

objectlock lockPartition=8 objid=1765581328 subresource=FULL dbid=8 objectname=Test id=lock47b821a00 mode=Sch-M associatedObjectId=1765581328

Notice the partition is 8 and the mode held is Sch-M.

owner-list

owner id=process46c276188 mode=Sch-M

The process is the task address of the lock owner, which can be mapped to sys.dm_os_tasks.

waiter-list

waiter id=process47b07dc38 mode=Sch-S requestType=wait

This is the close of the deadlock cycle by the second process.

Note: The waiter list is usually printed in ascending order based on how the victims will be selected; usually work investment based.

objectlock lockPartition=13 objid=1765581328 subresource=FULL dbid=8 objectname=Test id=lock47b821f80 mode=Sch-S associatedObjectId=1765581328

Partition 13 is showing the process that already holds the same Sch-S and is attempting a new acquire on partition 8.

owner-list

owner id=process47b07dc38 mode=Sch-S

Owner of the Sch-S lock.

waiter-list

waiter id=process46c276188 mode=Sch-M requestType=wait

Blocked process attempting to acquire the Sch-M lock.  This is expected as the Sch-M is attempting to acquire the lock on all partitions.
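If you want to capture the same deadlock detail on your own system, the trace flags mentioned above can be enabled globally without a restart; a sketch (they can also be added as -T1222 and -T3605 startup parameters):

-- 1222 writes detailed deadlock information; 3605 sends the trace output to the error log.
DBCC TRACEON (1222, -1);
DBCC TRACEON (3605, -1);

-- Confirm which trace flags are enabled globally.
DBCC TRACESTATUS (-1);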

 

Under a rare condition SQL Server may not associate the proper lock partition with the lock request, leading to additional locking overhead or possible deadlocks.   This bug does not expose any locking problems that would lead to data integrity issues.  The window is very small: it occurs during compile, before a user transaction is started.

The problem is that when using lock partitioning the Sch-S lock should be acquired on the local partition associated with the transaction.  However, the same process is attempting to acquire the Sch-S lock on 2 different partitions, leading to the deadlock.  Why?

  • The lock partition hint is stored with the connection object (sys.dm_exec_sessions - physical connection internal object to be more precise.)  
  • SQL Server assigns new batches to one of the active schedulers on the same NUMA node based on active task load for the schedulers.

In this case the login took place on scheduler 8 and the lock partition hint was cached.  When the batch is processed it is assigned to scheduler 13 and the second partition becomes involved, triggering the unexpected behavior.

Bob Dorr - Principal SQL Server Escalation Engineer

Installing SQL Server on Windows 8


One of the things I often try to do is make sure I understand the experience of customers when installing new versions of SQL Server or existing versions of SQL Server on new Windows Operating Systems.

So I took a recent tour of the experience installing SQL Server on Windows 8. I thought you might benefit by reading through my thoughts about this experience before you do this yourself.

Let’s start with SQL Server 2005. First, it is not supported on Windows 8 but I wanted to see if we did a good job telling you this. When you run setup.exe for SQL Server 2005, you are presented with this dialog box

image

I selected the “Get help online” option and was presented with this dialog box

image

OK, this is not bad. We say here that this version of SQL Server 2005 is not compatible with this version of Windows. Unfortunately, we still allow you to run the setup program from the previous dialog box, but at least there is some warning that this is not compatible.

Now for SQL Server 2008 R2 (this is the same experience as SQL 2008). First, let me stop and tell you the support compatibility story:

  • SQL Server 2005 and any previous version is NOT supported on Windows 8/Windows Server 2012
  • SQL Server 2008 is supported on Windows 8/Windows Server 2012 but requires a minimum of Service Pack 3
  • SQL Server 2008 R2 is supported on Windows 8/Windows Server 2012 but requires a minimum of Service Pack 1
  • SQL Server 2012 RTM is supported on Windows 8/Windows Server 2012

We have an article that talks about this and I’ll show you that article shortly.

On to the SQL Server 2008 R2 experience. Because I know SP1 is required, I expect to get a similar dialog box as SQL Server 2005  but perhaps with more information that I can proceed and then install SP1 afterwards.

image

Yes, it looks similar to SQL Server 2005. When I select “Get help online” I again get a dialog box similar to the SQL Server 2005 one

image

This time I select the link that says “Tap or click to go online….”. This will bring up the following KB article

image

 

This article lists the specific requirements for SQL Server versions that I’ve listed above in this post. And the article includes some of the screenshots I’ll show you in the rest of this blog post. Don’t mind the comment in the article about “Release Candidate..”. Since Windows 8 has been released, we will change that.

Back to the dialog box. If you go to the lower right corner you will see a link “View all problems this applies to”. Select this and you will get a new window that looks like this (you might have multiple rows if you have run setup more than once)

image

If you double-click on this entry you are now presented with this window, which provides clearer instructions that it is OK to proceed with the installation provided you install the necessary service packs afterwards, depending on which release of SQL Server you are installing

image

One thing that may confuse you is that if you select “View Solution” on this page, you will be brought back to one of the original screens I’ve already shown you in this post. The reason for this is that these “Problems” are stored in your Action Center History so if you were to look at any of these Problems in the Action Center History you would see the right solution. Through the setup process you are in a way shown the solution before the problem.

If you select OK out of this screen and the two previous ones, you are left with this dialog box

image

Since there is no option for “Run the program I’ve already looked at the help”<g>, the only option to pick is “Run the program without getting help”. You can now proceed with the installation of SQL Server. One other thing though that may cause some confusion: if you have never installed SQL Server on this machine, you may get the above dialog box several times before the Installation Center appears. This is because we may need to run our setup.exe several times, and this is the program that is associated with the compatibility dialog box.

One other related experience to installing SQL Server on Windows 8 is an in-place upgrade to Windows 8 from Windows 7 when SQL Server is already installed. I didn't actually go through the entire in-place upgrade experience myself. But I did want to see what the Windows 8 upgrade would say if I had SQL Server already installed. My thanks to my colleague Robert Dorr for this one. He did an in-place upgrade of his laptop at work and was confused by the Compatibility Report message about SQL Server. He asked me to look into it and the result of that investigation follows.

I setup a VM running Windows 7 with SQL Server 2005, SQL Server 2008 RTM, and SQL Server 2008 R2 SP1 installed all side by side. I then chose to upgrade the VM with Windows 8.

At the very beginning of the install process, Windows 8 does a compatibility check for apps on your machine. For mine, it showed the following report

image

 

I understand the reason SQL Server 2005 is listed based on its unsupported status I’ve already talked about. Notice the recommendation for any of these applications is to uninstall them before you continue. For SQL Server 2005, this may be a wise choice. As I’ve stated, we don’t support SQL Server 2005 on Windows 8 and quite frankly we didn’t test what the effects might be on the upgrade. I can’t say your upgrade will have problems but all bets are off here. My recommendation is to uninstall it or upgrade it to a supported SQL Server release before continuing with the Windows upgrade.

The SQL Server 2008 and 2008 R2 listing is puzzling. I told you I have SQL 2008 R2 SP1 which is supported so why is it listed? This is because we bundled the message for compatibility together for both releases which is unfortunate. This means you will not be able to tell whether you have SQL Server 2008 or SQL Server 2008 R2 or if you have both, which one is causing the compatibility problem. But it does tell you one of these releases is not at a compatible version. The safest approach here is to stop the upgrade, and install the required supported service pack. However, I don’t know of any issue you would encounter by proceeding with the upgrade and then installing the required SQL service pack afterwards.

I hope these screenshots and side notes about the install experience on Windows 8 will help avoid any confusion for you and answer any questions you may have before you contact our Microsoft support teams.

There are some other interesting scenarios when installing with Windows 8/Windows Server 2012 which I’ll cover in my next blog post.

 

Bob Ward
Microsoft

How It Works: Online Index Rebuild - Can Cause Increased Fragmentation


SQL Server Books Online alludes to the fragmentation possibility but does not fully explain that an online index rebuild may increase fragmentation when it is allowed to run with the MAXDOP > 1 and ALLOW_PAGE_LOCKS = OFF directives.

The process of building an online index involves coordinating the activity of active connections with that of the online build operation(s).   This is done by updating data modification plans to maintain both indexes during the online index build.

There are a few objectives an index rebuild can accomplish:

  • Generate new statistics
  • Change the number of pages used by the index (fill factor)
  • Reorganize the data on database pages in relative proximity to each other

ALTER INDEX MAXDOP Option

The MAXDOP option caps the number of workers which can participate in the alter action.  For example, the following tells SQL Server to use no more than 8 workers during the alter index operation.  You can't force the MAXDOP; it is only a suggested cap directive.

ALTER INDEX ALL ON test2 REBUILD WITH (ONLINE = ON, MAXDOP = 8)

 

MAXDOP = 1  (Serial)

The following FIGURE depicts a serialized alter index operation.   The new rowset maintains an allocation cache that is used when allocating the new pages to move data onto.   When the cache is empty 1 or more extents are allocated, as close to the last extent allocated as possible. 

 

image

This process allows the data to be packed onto pages near each other, reducing the number of pages (if the fill factor so indicates) and placing the rows in sorted order near one another.

MAXDOP > 1 (Parallel) using ALLOW_PAGE_LOCKS = OFF

When running in parallel a decision is made as to how the allocation cache will be utilized.  In the case of ALLOW_PAGE_LOCKS = OFF the logic is to share a single allocation cache. 

Take special note: The logic can use statistical operations to divide the workload among the workers.

image

 

This can lead to a leap frog style of allocation and increase fragmentation.   The pages of the index may be very contiguous allocations … 100, 101, 102, 103, … but the data on the pages is 100 (from 1st partition), 101 (from 2nd partition), 102 (1st partition) so when scanning the IAM in page order the page fragmentation level can climb.

image

Actions such as fill factor adjustments and statistics gathering proceed as expected.

MAXDOP > 1 (Parallel) using ALLOW_PAGE_LOCKS = ON   (Default is ON for ALTER INDEX COMMAND)

When ALTER INDEX is able to use page or table (rowset level) locking the allocation patterns are optimized for bulk operations.  Without attempting to write a novel about how this works I have drawn a very high level picture in the figure shown below.

 

image

 

When bulk operations are enabled, an additional caching layer is instituted for each of the workers to use.   The Bulk Allocation Cache is sized based on the work load expected for the given partition, etc...   This allows each partition to allocate 1 or more extents at a time and then use those pages to store the data they are processing.   This provides a critical level of separation necessary to reduce the leap frogging effect and reduces fragmentation by at least a factor of 8 pages per extent.

Note: The fragmentation level will not be reduced as much as a MAXDOP=1 alteration, but it can reduce the fragmentation within percentage points of MAXDOP=1 in many instances.
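To see the impact on your own index, you can compare fragmentation after rebuilding with different settings; a minimal sketch (test2 is just the example table used above):

-- Logical fragmentation for every index on the table.
SELECT i.name AS index_name,
       ps.index_level,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('test2'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id AND i.index_id = ps.index_id;

-- A serial rebuild gives the least fragmentation; parallel with page locks allowed
-- (the default) usually stays within a few percentage points of the serial result.
ALTER INDEX ALL ON test2 REBUILD WITH (ONLINE = ON, MAXDOP = 1);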

Recap

  • MAXDOP is a key factor for determining the amount of work each worker is targeted to perform.
  • The type of allocation caching used determines the possible fragmentation impact
  • None of these options controls the fill factor maintenance
  • None of these options controls the statistics gathering

Bob Dorr - Principal SQL Server Escalation Engineer

AdomdConnectionException using PerformancePoint hitting Analysis Services


I was working with a customer who was encountering problems trying to use a PerformancePoint Dashboard against an Analysis Services instance. The issue came down to the Claims to Windows Token Service (C2WTS) configuration.  This service is used to take the Claims context and convert it to a Windows token for use against backend servers.

When trying to create a Data Source within PerformancePoint Dashboard Designer, using the Unattended Service Account, the test succeeds.  If we switch that over to Per-user Identity, we see the following:

image

Within the Event Logs for the SharePoint App Server, we see the following from PerformancePoint:

Log Name:      Application
Source:        Microsoft-SharePoint Products-PerformancePoint Service
Date:          9/6/2012 11:59:57 AM
Event ID:      37
Task Category: PerformancePoint Services
Level:         Error
Keywords:     
User:          BATTLESTAR\spservice
Computer:      AdmAdama.battlestar.local
Description:
The following data source cannot be used because PerformancePoint Services is not configured correctly.

Data source location: http://admadama:82/Data Connections for PerformancePoint/5_.000
Data source name: New Data Source 3

Monitoring Service was unable to retrieve a Windows identity for "BATTLESTAR\asaxton".  Verify that the web application authentication provider in SharePoint Central Administration is the default windows Negotiate or Kerberos provider.  If the user does not have a valid active directory account the data source will need to be configured to use the unattended service account for the user to access this data.

Exception details:
System.InvalidOperationException: Could not retrieve a valid Windows identity. ---> System.ArgumentException: Token cannot be zero.
   at System.Security.Principal.WindowsIdentity.CreateFromToken(IntPtr userToken)
   at System.Security.Principal.WindowsIdentity..ctor(IntPtr userToken, String authType, Int32 isAuthenticated)
   at System.Security.Principal.WindowsIdentity..ctor(IntPtr userToken)
   at Microsoft.IdentityModel.WindowsTokenService.S4UClient.CallService(Func`2 contractOperation)
   at Microsoft.SharePoint.SPSecurityContext.GetWindowsIdentity()
   --- End of inner exception stack trace ---
   at Microsoft.SharePoint.SPSecurityContext.GetWindowsIdentity()
   at Microsoft.PerformancePoint.Scorecards.ServerCommon.ConnectionContextHelper.SetContext(ConnectionContext connectionContext, ICredentialProvider credentials)

This error indicates that the C2WTS service failed to get the Windows credential.  The S4UClient call is the key indicator.  We reviewed the C2WTS settings, which aren’t many, and the one thing I remembered is that if you are using a Domain User account for the C2WTS Windows Service, you have to add it to the Local Administrators group on the box that is trying to invoke it.  In our case, it is the server hosting the PerformancePoint Service App.  You don’t have to do this step if you leave the C2WTS service as LocalSystem.

Once that is done, we need to recycle the C2WTS Windows Service and try it again.  We were then presented with a different error:

image

Log Name:      Application
Source:        Microsoft-SharePoint Products-PerformancePoint Service
Date:          9/6/2012 12:09:42 PM
Event ID:      9
Task Category: PerformancePoint Services
Level:         Warning
Keywords:     
User:          BATTLESTAR\spservice
Computer:      AdmAdama.battlestar.local
Description:
The user "BATTLESTAR\asaxton" does not have access to the following data source server.

Data source location: http://admadama:82/Data Connections for PerformancePoint/5_.000
Data source name: New Data Source 3
Server name: bspegasus\kjssas

Exception details:
Microsoft.AnalysisServices.AdomdClient.AdomdConnectionException: A connection cannot be made to redirector. Ensure that 'SQL Browser' service is running. ---> System.Net.Sockets.SocketException: The requested name is valid, but no data of the requested type was found
   at System.Net.Sockets.TcpClient..ctor(String hostname, Int32 port)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.GetTcpClient(ConnectionInfo connectionInfo)
   --- End of inner exception stack trace ---
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.GetTcpClient(ConnectionInfo connectionInfo)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.OpenTcpConnection(ConnectionInfo connectionInfo)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.Connect(ConnectionInfo connectionInfo, Boolean beginSession)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.GetInstancePort(ConnectionInfo connectionInfo)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.GetTcpClient(ConnectionInfo connectionInfo)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.OpenTcpConnection(ConnectionInfo connectionInfo)
   at Microsoft.AnalysisServices.AdomdClient.XmlaClient.Connect(ConnectionInfo connectionInfo, Boolean beginSession)
   at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.XmlaClientProvider.Connect(Boolean toIXMLA)
   at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.ConnectToXMLA(Boolean createSession, Boolean isHTTP)
   at Microsoft.AnalysisServices.AdomdClient.AdomdConnection.Open()
   at Microsoft.PerformancePoint.Scorecards.DataSourceProviders.AdomdConnectionPool`1.GetConnection(String connectionString, ConnectionContext connectionCtx, String effectiveUserName, CultureInfo culture, NewConnectionHandler newConnectionHandler, TestConnectionHandler testConnectionHandler)

At first I thought that this may be because of SQL Browser, based on the error message.  And, I know that for SQL Browser, when we have a Named Instance, you have to add the DISCO SPNs per the following KB article:

An SPN for the SQL Server Browser service is required when you establish a connection to a named instance of SQL Server Analysis Services or of SQL Server
http://support.microsoft.com/kb/950599

My thought was that I had to add delegation rights for the Claims and PerformancePoint services over to the DISCO service.  This actually turned out not to be needed at all based on my testing.  I have this working with those SPNs in place and without the Claims/PerformancePoint service accounts having Constrained Delegation rights to it.

After playing around with this a little more, I remembered that I had been told a while back that the Claims Service account needs to have the “Act as part of the operating system” right in order to work correctly.  My mindset was that if the account was a local admin, this wouldn’t be needed.  With that right missing, I was able to reproduce the 2nd error that the customer was hitting.  This is actually listed on page 126 of the following whitepaper.

Configuring Kerberos Authentication for Microsoft SharePoint 2010 Products
http://www.microsoft.com/en-us/download/details.aspx?id=23176

Of note, you get the “Impersonate a client after authentication” right that it lists for free, because the Claims Service account will be a member of WSS_WPG, which is a member of the IIS_IUSRS group because of SharePoint.

The C2WTS Service Account will be automatically added to the “Log on as a service” right when you start the C2WTS Service from Central Admin in the “Manage services on server” area.

The lesson learned here is that the Claims to Windows Token Service Account needs to be in the Local Administrators group and has to have the “Act as part of the operating system” right that you can assign within Local policies.

image

 

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

How It Works: SQL Server AlwaysOn Lease Timeout


The lease is used between the SQL Server resource DLL and the SQL Server instance to prevent split-brain from occurring for the availability group (AG).

The lease is a standard signaling mechanism between the SQL Server resource DLL and the SQL Server availability group.  The figure below depicts the general flow of the lease.

image

The lease is only present on the primary replica, making sure the SQL Server and Windows cluster state for the availability group remain synchronized. 

The Windows cluster components poll to determine if the resource IsAlive or LooksAlive on regular intervals.  The resource dll must report the state of the resource to the Windows clustering components.  For those familiar with the older SQL Server failover cluster instances (FCIs), this was accomplished with a generic query execution every ## seconds to see if the server 'looks alive.'

The new lease design removes all the connectivity components and problems associated with that additional overhead and provides a streamlined design to determine if the SQL Server 'looks alive.'  The resource dll and the SQL Server instance use named memory objects, in shared memory, to communicate.    The objects are signaled and checked at regular intervals.

The default signaling interval is 1/3 of the configured 'Health Check Timeout' of the availability group.

image

If the HealthCheckTimeout is exceeded without the signal exchange, the lease is declared 'expired' and the SQL Server resource dll reports that the SQL Server availability group no longer 'looks alive' to the Windows cluster manager.   The cluster manager undertakes the configured corrective actions.  SQL Server prevents further data modifications (avoiding split-brain issues) on the current primary.  The cluster manager activity helps select the proper primary location and attempts to bring the availability group online.
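The health check timeout that drives the lease signaling is part of the availability group definition, so it can be inspected or changed with T-SQL; a minimal sketch using the 'MyAG' group from the example below (run the ALTER on the primary replica):

-- Current setting in milliseconds; with the default of 30000, the 1/3 rule above
-- means the lease objects are signaled roughly every 10 seconds.
SELECT name, health_check_timeout, failure_condition_level
FROM sys.availability_groups;

-- Raise the timeout for availability group 'MyAG'.
ALTER AVAILABILITY GROUP [MyAG] SET (HEALTH_CHECK_TIMEOUT = 60000);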

The following is a sample message from the SQL Server error log when the lease has expired.

Error: 19407, Severity: 16, State: 1.

The lease between availability group 'MyAG' and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster.

 

AlwaysOn: The local replica of availability group 'MyAG' is going offline because either the lease expired or lease renewal failed. This is an informational message

Looking at the …\MSSQL\LOG\*DIAG*.XEL file for this issue you can see failures reported.  Notice the 'Resource Alive result: 0' - The SQL Server resource dll is going to report to the cluster manager that the availability group DOES NOT LOOK ALIVE.

Note:  SQL Server Management Studio may adjust time values based on your client's time zone settings.

image

The matching cluster log can output similar information as well.  Notice you have to adjust for UTC time.

000015ec.00002a64::2012/09/06-05:34:56.019 INFO  [RES] SQL Server Availability Group: [hadrag] SQL Server component 'query_processing' health state has been changed from 'warning' to 'clean' at 2012-09-06 06:34:56.017

000015ec.00002a64::2012/09/06-05:35:36.050 WARN  [RES] SQL Server Availability Group: [hadrag] Failed to retrieve data column. Return code -1

000015ec.00001a04::2012/09/06-05:35:36.050 ERR   [RES] SQL Server Availability Group: [hadrag] Failure detected, diagnostics heartbeat is lost

The million dollar question is still why?

The answer is that 'it depends.'   In this instance SQL Server encountered a system-level problem and was stuck attempting to allocate memory and generating dump files.  Looking at the stacks in the dump, hundreds of threads are stalled attempting to allocate memory, indicating a memory stall or problem on the overall system at some level that prevented SQL Server from processing work.

Bob Dorr - Principal SQL Server Escalation Engineer


Bulk Insert and Kerberos


I recently worked on two Bulk Insert cases that dealt with Kerberos. My favorite pastime! In both cases, the customers were hitting the following error:

Msg 4861, Level 16, State 1, Line 1
Cannot bulk load because the file "<file name>" could not be opened. Operating system error code 5(Access is denied.).

This issue came down to Kerberos Delegation. In one case they wanted to use Full Trust delegation, but there was some confusion about the CIFS principal. We don't need to add a CIFS principal; you should just be able to enable delegation for the SQL service account and it should work.

On the Constrained Delegation side of things, it turned out that we had to enable Constrained Delegation on the machine account of the SQL Server as well as the SQL service account. This is due to how SMB2 works: the request will not always carry the context of the user and may instead run in the context of the System account.

For more details about both, keep reading…

Techie Details

In the example I'm going to walk through, I'm just using a simple text file (ALongTimeAgo.txt) that contains the following:

Darth,Vader,TheDarth@galacticempire.com
Luke,Skywalker,FarmBoy@rebelsRus.com
Han,Solo,KesselRunner@rentasmuggler.com

Nothing fancy. The Bulk Insert is just going to try and load this data into a table I defined to hold a FirstName, LastName and Email.

clip_image001
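For reference, the table and statement behind that screenshot look roughly like this (a sketch; the column sizes and UNC path are assumptions, not the exact ones used):

-- Target table for the three comma-separated fields.
CREATE TABLE dbo.Characters
(
    FirstName nvarchar(50),
    LastName  nvarchar(50),
    Email     nvarchar(128)
);

-- The load that needs to reach the remote share; this is the call that requires
-- the Windows identity to flow to the file server, which is where delegation comes in.
BULK INSERT dbo.Characters
FROM '\\fileserver\share\ALongTimeAgo.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');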

Here is a look at what the environment looks like.

clip_image002

The file is sitting on a different server from the one SQL Server is running on. Therein lies the problem. If we look at Process Monitor, which is a free SysInternals tool, we can see the Access Denied. This was run on the SQL Server, because that is where the CreateFile API call is made.

clip_image003

We can see that the request is trying to impersonate me instead of using the service account to access the file. We can look at dm_exec_connections to see that I'm connected to SQL using Kerberos.

clip_image004
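The check behind that screenshot is essentially the query below; auth_scheme reports KERBEROS for a Kerberos connection and NTLM otherwise:

-- Confirm the current connection authenticated with Kerberos before chasing delegation issues.
SELECT session_id, auth_scheme, net_transport
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;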

The issue here is really about delegation. The SQL Server needs to be trusted to delegate my credential to another server/service. We see issues like this crop up because, typically, SQL Server is the back end server and the last stop on the journey of a connection/credential. So, in most cases, SQL Server will not be trusted for delegation. It is usually the Web Server or Application Server that is trusted for delegation because they want to get to SQL Server.

If I look at the delegation settings for my SQL Server's service account (BATTLESTAR\sqlservice), I see the following:

clip_image005

I have two options here. The first option is "Trust this user for delegation to any service (Kerberos Only)" which I refer to as "Full Trust". The other option, "Trust this user for delegation to specified services only", is Constrained Delegation and is more secure because you are explicitly allowing delegation for certain services and not a blanket pass.

Let's give the Full Trust option a try to see what happens. I'll need to restart the SQL Service after the change is made to clear any cache from an LSA perspective. LSA will cache failures.

clip_image006

I've had mixed results with restarting the SQL Service vs restarting the whole box (see this blog post); you may get away with just restarting the service, or you may need to reboot the box. After I restarted, I saw the following:

clip_image007

That took care of one issue I was looking at. But, I was presented with another one that indicated they wanted to do Constrained Delegation. Initially their setup was not correct. When we go to use Constrained Delegation, we have to be specific about what service we specify.

Because we are hitting a file server, we are interested in the CIFS service. One thing I've seen people do is go to create the CIFS SPN because when they go to look at the Machine Account for the file server, they don't see it.

clip_image008

However, CIFS is covered by the HOST entries, similar to HTTP, so we do not need to add a CIFS SPN. With Constrained Delegation, though, we do need to add the CIFS service there. It should show up because the HOST entry is present on the Machine Account.

clip_image009

clip_image010

I found that I had to pick "Use any authentication protocol". I actually didn't expect that, but that is what I found through my testing.

So, with that set, I reboot the SQL Server again and give it a try.

clip_image011

We know Full Trust worked! So, why didn't Constrained Delegation work? I had enabled Kerberos Event logging earlier to catch items. So, when I look at the System Event Log on the SQL Server, I see the following:

Log Name:      System
Source:        Microsoft-Windows-Security-Kerberos
Date:          9/7/2012 1:49:41 PM
Event ID:      3
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      CaptThrace.battlestar.local
Description:
A Kerberos Error Message was received:
on logon session
Client Time:
Server Time: 18:49:41.0000 9/7/2012 Z
Error Code: 0xd KDC_ERR_BADOPTION
Extended Error: 0xc00000bb KLIN(0)
Client Realm:
Client Name:
Server Realm: BATTLESTAR.LOCAL
Server Name: captthrace$@BATTLESTAR.LOCAL <-- This should not be the Machine Account Context
Target Name: captthrace$@BATTLESTAR.LOCAL@BATTLESTAR.LOCAL
Error Text:
File: 9
Line: f09
Error Data is in record data.

NOTE:  Be aware that with Kerberos Event Logging, failures may be cached and you may not see anything.  You may have to recycle the service or reboot the box to actually see the failure.

It is showing the machine name, not the Service Account or my user account. That is not what I would have expected. Because we are seeing the Machine Account in this respect, that would explain why it failed: I haven't set up any Delegation settings for the Machine Account, only the SQL Service Account. Let's see what happens when I set the Delegation settings on the Machine Account.

clip_image012

And reboot again.

clip_image013

It works! So, what happened?

The real issue here is due to the use of SMB2 and the redirector that I used. Due to the code path that we end up coming down for Constrained Delegation within LSASS, we do not have the context of the user. Instead, we have the context of the System Account. This is why we saw captthrace$ in the Kerb Event Log entry when it wasn't expected.

SMB2 is more asynchronous to maximize performance and causes you to run into this issue with Constrained Delegation. You could actually hit this with SMB1 as well, but it isn't likely as most requests will come from a thread that has the context of the user.

So, your options to get this working if you went down the path of Constrained Delegation are the following:

  1. Enable Constrained Delegation for the Machine account (this would be the Machine account that the SMB redirector worker threads run from - in our case the SQL Server machine account)
  2. Disable SMB2 - this is not recommended as you could introduce performance issues
  3. Use Full Trust instead of Constrained Delegation - This is also not recommended as it is a less secure option.
  4. Use SQL Authentication instead of Windows Authentication

 

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

Revisiting Inside Tempdb….


As I prepare for my next PASS Summit talk for the upcoming 2012 Summit, Inside SQLOS 2012, I was reviewing my talk from last year, Inside Tempdb, and some of the questions and feedback I received. They say it is never too late to provide all the facts, so I realized I neglected to post some more details about some of the questions I received from my talk, along with some answers.

Worktables have negative objectids

When I made this statement at my talk, someone came up after the session and said they discovered that user-defined temporary tables have an object_id < 0 on SQL Server 2012. This person wanted to know if I had seen this and how this relates to my statement about negative objectid values for worktables.

I finally sat down and researched this question and have the answer. In SQL Server 2008R2 and previous versions, we generated objectids for user-defined temporary tables just like we do for any user-defined table (the details I’ll not discuss here). But in SQL Server 2012, we made a conscious change to the algorithm so that objectids for user-defined temporary tables would be a particular range of values. Most of the time we use hex arithmetic to define these ranges and for this new algorithm these hex values spill into a specific set of negative numbers for object_id, which is a signed integer or LONG type. So in SQL Server 2012, you will now always see object_id values < 0 for user-defined temp tables when looking at a catalog view like sys.objects.
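A quick way to see the new behavior for yourself on a SQL Server 2012 instance:

-- User-defined temporary tables now land in the negative object_id range.
CREATE TABLE #t (c int);
SELECT OBJECT_ID('tempdb..#t') AS temp_table_object_id;  -- a value < 0 on SQL Server 2012
DROP TABLE #t;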

What about worktables then? My statement in the presentation is still true and worktable objectids remain < 0. But there is a special signature to how we generate this objectid so that the engine knows the pages that are allocated belong to a worktable (because the page type for these is DATA pages) and don’t conflict with the negative id value range for user-defined temp tables. The objectid of a worktable will always be a combination of the fixed value 0X80000000 and a counter value we increment each time we create a new worktable. So an example worktable objectid would be 0x80000001. Convert this to decimal as a LONG integer and you get –2147483647. You might wonder how I found a worktable page since allocations of these are not logged and there is no record of them in system tables in tempdb. Since these are data pages, they have hashed buffers, so you can see these pages in sys.dm_os_buffer_descriptors. A quick way to see this: run a DBCC CHECKDB and look for DATA pages in sys.dm_os_buffer_descriptors in tempdb. This is easier to find if you don’t have any user-defined temp table activity that could also have pages allocated. When you do this you might run into a page header that looks like this from DBCC PAGE. Note the negative objid value for this page yet m_type = 1 (which is a DATA page).

image
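Here is a sketch of the steps just described; the database_id of 2 is tempdb, and the file/page values passed to DBCC PAGE are placeholders you would replace with a row from the buffer descriptor query:

-- The worktable objectid pattern: 0x80000000 plus a counter, read as a signed 32-bit value.
SELECT CAST(0x80000001 AS int) AS example_worktable_objectid;  -- returns -2147483647

-- Find hashed DATA pages in tempdb (for example, right after running DBCC CHECKDB).
SELECT file_id, page_id, page_type
FROM sys.dm_os_buffer_descriptors
WHERE database_id = 2
  AND page_type = 'DATA_PAGE';

-- Dump one of those pages and check the objid in the header.
DBCC TRACEON (3604);
DBCC PAGE (2, 1, 288, 0);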

One thing I did not mention at the session is the object id for sort pages (m_type = 7). These will always appear with an objid = 0.

Active transactions in tempdb

On one of my slides I mentioned that the transaction log for tempdb may appear to grow out of control because of an active transaction, but a transaction involving user-defined temporary tables was the only scenario I had seen cause this. Someone from the audience (thank you to whoever you were; I don’t remember your name) mentioned that sorts may also cause this. And that information is definitely correct.

If you execute a query that requires a sort operation which requires a “spill”, then the engine must write sort pages to disk and that requires a transaction (technically it requires several transactions but one outer transaction keeps it all active) which will remain active until the sort operation is complete. If you run into a scenario where an active transaction prevents log truncation and you see output like this from DBCC OPENTRAN

Transaction information for database 'tempdb'.

Oldest active transaction:
    SPID (server process ID): 51
    UID (user ID) : -1
    Name          : sort_init
    LSN           : (50:376:631)
    Start time    : Sep  8 2012 11:35:09:983PM
    SID           : 0x010500000000000515000000271a6c07352f372aad20fa5b36110000
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

then this indicates a long-running sort is preventing the log from truncating and it will grow until the sort is complete.
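To check for this yourself, run DBCC OPENTRAN against tempdb and then look at what the reported SPID is doing; a minimal sketch (51 is the SPID from the output above):

-- Report the oldest active transaction in tempdb.
DBCC OPENTRAN ('tempdb');

-- See what that session is running - for the sort_init case it will be a query with a large sort spill.
SELECT r.session_id, r.status, r.command, r.wait_type, t.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id = 51;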

These were two topics that were follow-up from my Inside Tempdb talk. If you have seen this talk and have other topics that you have questions about, post them as comments to this blog and I’ll respond to them. If they are large enough topics I’ll edit this blog post with the details as I find the answers.

 

Bob Ward
Microsoft

SQL Server: Correlating Timestamps From Various Data Points


This week I was looking at data from a customer in a different time zone (UTC+1) from mine, involving SQL Server AlwaysOn (HADRON), and found that lining up the timestamps in the various logs was challenging.   Some times are local to the SQL Server instance, others are UTC, and yet other utilities attempt to adjust the UTC time using your current client settings.

I found it helpful to build a table for what each log captures to help me understand the flow of the issue.

For this table assume local time is UTC+1 and my client is UTC-6.

Log/Location | Units | Example
Cluster Log (Line Header) - Yellow Below | UTC | 05:34:56.019
Cluster Log (AlwaysOn Message Text with embedded time) - Green Below | Local time from GetDate() of the SQL Server instance | 06:34:56.019
XEL Timestamp Column | UTC adjusted for your client's time zone (in my case UTC-6 with Daylight Saving Time (DST) enabled) | 00:34:56.019
XEL Create Time AlwaysOn - Event Data Column | Local time from GetDate() of the SQL Server instance | 06:34:56.019
SQL Server Error Log | Local time from GetDate() of the SQL Server instance | 06:34:56.019
Windows Event Logs (Captured on system to TXT) | Local time of the customer system | 06:34:56.019
Performance Monitor Capture | Windows 8 client - UTC; Windows 2008 R2 client - UTC | 05:34:56.019
Captured Mini/Dump File | Current client time zone setting | 00:34:56.019
SQL Agent Log | Local time of the customer system | 06:34:56.019
SQL Profiler Event Display | Local time from GetDate() of the SQL Server instance | 06:34:56.019
sys.fn_xe_file_target_read_file | Event data timestamp is UTC | 05:34:56.019
Reporting Services - Trace Log | Local server time | 06:34:56.019
Reporting Services - HTTP Log | UTC | 05:34:56.019
Reporting Services - Execution Log | Local server time | 06:34:56.019
Analysis Services - Profile Trace | Local server time | 06:34:56.019
Analysis Services - Server Log | Local server time | 06:34:56.019
SharePoint ULS Log | Local server time | 06:34:56.019

Note: Based on SQL Server 2008 R2 and SQL Server 2012, Windows 2008 R2 and Windows 8

Example of the cluster log, AlwaysOn entry

000015ec.00002a64::2012/09/06-05:34:56.019 INFO  [RES] SQL Server Availability Group: [hadrag] SQL Server component 'query_processing' health state has been changed from 'warning' to 'clean' at 2012-09-06 06:34:56.017

Example setting with my client time zone set to the same time zone as the customer's (UTC+1) and restarting SQL Server Management Studio (SSMS) to show the matching timestamp adjustment. Without the client adjustment the XEL display shows me 00:34:56.019 for my UTC-6 client settings.

image
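If you want the unadjusted UTC values instead of the SSMS display, reading the .XEL file with T-SQL returns the raw event data; a minimal sketch (the file path is hypothetical):

-- The timestamp attribute inside the event XML is UTC, matching the table above.
SELECT object_name,
       CAST(event_data AS xml).value('(event/@timestamp)[1]', 'datetime2(3)') AS event_time_utc
FROM sys.fn_xe_file_target_read_file('C:\MSSQL\LOG\AlwaysOn_health*.xel', NULL, NULL, NULL);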

Example output when I open the dump file on my client using UTC-6 setting with DST enabled.

Debug session time: Thu Sep  6 00:34:56.019 2012 (UTC - 5:00)

Example Output of a SQL Error log entry and matching .TRC entry

2012-09-06 06:35:37.05 spid478     The client was unable to reuse a session with SPID 478, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

image

 

As you might imagine, I have filed several work items to consolidate the times across this matrix of data points to make it easier to troubleshoot issues.

Bob Dorr - Principal SQL Server Escalation Engineer
assisted by Adam Saxton for (RS, AS and SharePoint)

T-SQL Update Takes Much Longer Than The Matching Select Statement


I realize the title is generic and that is because the problem is generic.   On the surface it would not surprise me that an update takes longer than a select. (A little bit anyway.)

There are logging, updates to index rows, triggers, replication needs, AlwaysOn needs, perhaps page splits and even re-ordering when the key values are changed that can take place.  All of this adds work and elongates the processing.

However, I am talking about a situation where the update takes significantly longer than the select, and it seems excessive when you start looking at the actual number of I/O operations and such that are taking place.

-------------------------------------------------------------------------------

For example: the select query runs in 22 minutes but the update takes 140 minutes.  When I first looked at this and similar issues, I found it hard to believe the additional update-related processing added that much time.

The customer had uncovered that if they set the SQL Server's max server memory setting to 2GB the update ran in ~45 minutes instead of 145 minutes.  - Keep this in mind.

First Step 

I started looking at the plan differences between the update and the matching select.  I didn't see anything that popped out as significantly different for obtaining the data on the select side, and the update side of the plan looked reasonable.

I then looked at the plan differences between the 2GB server memory setting and a larger memory setting - No differences, the same plan was being used. - Hmmm - interesting.

Second Step

I looked at the statistics time and I/O outputs to see if something significant could be uncovered.  The I/O was about the same but there was a significant difference in the CPU usage between the update and the select.

Third Step

Back to the plan for the update.  I was looking to see if it was possible the update portion of the plan could drive CPU if we had fetched the pages into memory.   Clearly I can come up with an update that touches a small number of pages, gets them locked into buffer pool memory and then updates the same rows many times; driving CPU and not physical I/O. - This was not the case for the scenario presented.  I had to update millions of rows to reproduce the problem.

Fourth Step

I started tracing the activity to see what else was going on.  What I saw was lock escalation taking place when the query ran faster, under the 2GB SQL Server max server memory setting.

Locking

Now I had a pretty good idea that locking played a role in all of this.  I then enabled a trace flag (-T1211 - use with caution) to disable lock escalation, and I could cause the same issue on the installation with SQL Server's max server memory set to 2GB.
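
If you want to experiment with this on a test system, disabling lock escalation instance-wide looks like the sketch below (again, treat the trace flag with caution and keep it off production unless directed by Microsoft):

-- Test system only: disable lock escalation instance wide (trace flag 1211)
DBCC TRACEON (1211, -1);

-- ... run the repro here ...

-- Restore normal lock escalation behavior
DBCC TRACEOFF (1211, -1);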

Fifth Step

Using debugging tools I captured the execution of the scenario and looked at the code paths using the most CPU resources.  What I found was a code path related to the creation and destruction of a lock class (unfortunately there are no XEvents or Trace events in this area).

Note: I did file work items with the development team to expose this activity.

Lock Class

At a high level, a lock class is a container of locks for a given part of a plan.  It is often used to protect a plan from changing data as the data passes from one portion of the plan to the next (think hash, sort, spool … type operations).

Let's discuss the following example:

update t   set t.strData = CAST(t.iID as varchar(10))   from tblTest t   join tblTest t2 on t2.iID = t.iID

 

  |--Table Update(OBJECT:([tempdb].[dbo].[tblTest] AS [t]), SET:([tempdb].[dbo].[tblTest].[strData] as [t].[strData] = [Expr1006]))
       |--Table Spool
          |--Top(ROWCOUNT est 0)
                 |--Compute Scalar(DEFINE:([Expr1006]=CONVERT(varchar(10),[tempdb].[dbo].[tblTest].[iID] as [t].[iID],0)))
                      |--Nested Loops(Left Semi Join, WHERE:([tempdb].[dbo].[tblTest].[iID] as [t].[iID]=[tempdb].[dbo].[tblTest].[iID] as [t2].[iID]))
                           |--Table Scan(OBJECT:([tempdb].[dbo].[tblTest] AS [t]))
                           |--Table Scan(OBJECT:([tempdb].[dbo].[tblTest] AS [t2]))

I have highlighted the table spool because this is where a lock class can appear (which you can't see).   What the SQL Server does is look at the locking strategy established by the session and the plan and if necessary upgrade the strategy to 'repeatable read' during portions of the plan.

In this case the rows for the update are being fed via a table spool.   This means SQL Server does not want to release the lock on the rows flowing through the spool until it has completed the proper update(s).  If the lock was not held through the table spool to the update level, the data could change as it is held in the spool. 

The problem is not the lock class itself.  Even if the upgrade to 'repeatable read' isolation during that window results in reduced concurrency, that in and of itself won't lead to increased CPU usage.

The CPU is coming from the release activity associated with the lock class.   Once the lock class is no longer needed the SQL Server releases it and in turn the references to the appropriate locks are released.

Clearly lock escalation will reduce the number of locks and reduce this work.  That is one workaround, but probably not what most folks want to do in a production environment.

My first thought was: how many locks is the update acquiring that would cause us to do a lot of work during release?  In studying the lock acquired and lock released events for the update statement I found that it was only a handful.  So again, why the large CPU burn?  The problem was that the update was inside a cursor loop that executed millions of times and all of this was under a single transaction.

Another workaround I found was to use smaller transactions, but more of them.  I found this odd, as I am still doing the same amount of work, just in smaller chunks.  Smaller chunks would help avoid lock escalation, so based on my previous testing I expected it to make things worse.
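
As an illustration only (the predicate and batch size below are made up for the sketch), the smaller-transaction approach looks something like this:

DECLARE @rows int = 1;

WHILE (@rows > 0)
BEGIN
    BEGIN TRANSACTION;

    -- Update the next batch of rows; the WHERE clause is a placeholder for
    -- whatever identifies the rows that still need to be processed
    UPDATE TOP (50000) t
       SET t.strData = CAST(t.iID AS varchar(10))
      FROM tblTest AS t
     WHERE t.strData IS NULL;

    SET @rows = @@ROWCOUNT;

    COMMIT TRANSACTION;   -- each commit releases the transaction's lock list
END;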

The Problem

What is happening is that the lock class release has a bug.  It is not properly releasing just the locks it acquired; it is running the entire lock list for the transaction.  Because the locks acquired before the update don't have a lock class association, there is nothing to do for those locks.  The locks are properly maintained; SQL Server is just running the entire lock list instead of the portion associated with the lock class.

In this reproduction case there are 400,000 locks acquired before the cursor execution driving the updates.  This means that each update will walk ALL 400,000 lock structures and find nothing to do.  As the next update occurs, SQL Server does it all over again, burning unnecessary CPU.  Based on this behavior, when I dropped the size of the transaction I reduced the number of locks and as such the number of CPU cycles.

Note: Bug filed with the SQL Server development team.  Internal testing, with the bug fix, shows the query taking 140 minutes consistently runs in 32 minutes without any T-SQL code changes.   This is scheduled to be released in an upcoming release (~ Nov 2012).

----------------------------------------------------------------------------------------
Is This My Problem? Am I a Candidate for the Fix?

There are some things you can do to see if your statements are encountering this problem.

The easiest way to test is to 'set transaction isolation level repeatable read' at the session level and then run the set of queries in question.    Repeatable read may use more lock memory but it also acts as a broad 'lock class' for this test.   I have also used snapshot isolation with near repeatable read results.   
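
A minimal sketch of that test, reusing the earlier update example (substitute your own statement):

SET STATISTICS TIME ON;

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;   -- acts as a broad 'lock class' for the test

BEGIN TRANSACTION;

    UPDATE t
       SET t.strData = CAST(t.iID AS varchar(10))
      FROM tblTest AS t
      JOIN tblTest AS t2 ON t2.iID = t.iID;

COMMIT TRANSACTION;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;    -- back to the default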

If the query has some of the aforementioned operations, runs significantly faster, and uses far less CPU, there is a good chance it is a candidate for the fix.

Without the fix you can use the transaction isolation levels, smaller transactions, or even locking hints to help control the behavior.

 

Bob Dorr - Principal SQL Server Escalation Engineer
assistance provided by Keith Elmore - Principal SQL Server Escalation Engineer

Worker thread governance coming to Azure SQL Database


Starting with the service update that went out recently, soft throttling on worker threads is changing. Over the next few months, soft throttling will eventually be replaced by worker thread governance. In the meantime, users may see requests failing due to throttling on worker threads (error 40501) or worker thread governance (errors 10928 and 10929). The retry logic in your application should be modified to handle these errors. Please see http://go.microsoft.com/fwlink/?LinkId=267637 for more information on this topic.

While we roll out the new worker thread governance mechanism in all datacenters, users may see requests failing for either of two reasons – throttling on worker threads (40501) or worker thread governance (new error codes: 10928, 10929; see the summary below). During this time, it is recommended that the retry logic in your application is suitably modified to handle both the throttling error code (40501) and the governance error codes (10928, 10929) for worker threads.
Please go through the information below and modify your applications as required. Eventually, once worker thread governance is fully rolled out in all datacenters and soft throttling for worker threads has been disabled, we will notify users.


Please note that 40501 errors seen due to hard throttling on worker threads and due to throttling on other resources will continue to be seen as before. Please ensure your error catching logic continues to handle these 40501s as before.

Current mechanism: Worker thread throttling

Description: When the soft throttling limit for worker threads on a machine is exceeded, the database with the highest requests per second is throttled. Existing connections to that database are terminated if new requests are made on those connections, and new connections to the database are denied until the number of workers drops below the soft throttling limit. The soft throttling limit per back-end machine is currently 305 worker threads.

Error returned: 40501: The service is currently busy. Retry the request after 10 seconds. Incident ID: <ID>. Code: <code>.

Recommendation: Back off and retry the request after 10 seconds; see best practices.

New mechanism: Worker thread governance (coming soon)

Description: Every database will have a maximum worker thread concurrency limit. *Please note this limit is only a maximum cap and there is no guarantee that a database will get threads up to this limit if the system is too busy.* Requests can be denied for existing connections in the following cases:
1. If the maximum worker thread concurrency limit for the database is reached, the user will receive error code 10928.
2. If the system is too busy, it is possible that even fewer workers are available for the database and the user will receive error code 10929. This is expected to be a rare occurrence.

Errors returned:
10928: Resource ID: %d. The %s limit for the database is %d and has been reached. See http://go.microsoft.com/fwlink/?LinkId=267637 for assistance.
10929: Resource ID: %d. The %s minimum guarantee is %d, maximum limit is %d and the current usage for the database is %d. However, the server is currently too busy to support requests greater than %d for this database. See http://go.microsoft.com/fwlink/?LinkId=267637 for assistance. Otherwise, please try again later.
The Resource ID in both error messages indicates the resource for which the limit has been reached. For worker threads, Resource ID = 1.

Recommendation:
10928: Check dm_exec_requests to view which user requests are currently executing.
10929: Back off and retry the request after 10 seconds; see best practices.
Note: The hard throttling mechanism for worker threads is not being changed and will continue to return a 40501 error to user applications.
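
For the 10928 recommendation above, a simple way to see which user requests are currently consuming workers is a query along these lines (a sketch; add whatever filtering your application needs):

SELECT r.session_id,
       r.status,
       r.command,
       r.wait_type,
       r.total_elapsed_time,
       t.text AS sql_text
  FROM sys.dm_exec_requests AS r
 CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
 ORDER BY r.total_elapsed_time DESC;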

Microsoft CSS @ PASS Summit 2012


This is a little later than we normally do it, but better late than never!  During November 6th-9th, the US PASS Summit 2012 will be held in Seattle, Washington at the Seattle Convention Center.  The Microsoft CSS team has a long history with PASS.  We have been speaking and working at PASS since 2003 with this year shaping up to be the best yet!  Here is a quick look at what we will be doing.

 

Pre-Conference Seminar

We will have one Pre-Con this year at PASS.  Bob Ward and Adam Saxton are teaming up to bring you Customer Stories from the Front Line!  Bob is a Principal Architect Escalation Engineer who has been supporting SQL Server since 1993!  There is a lot of history there that helps bring these customer stories some perspective.  Adam is a Sr. Escalation Engineer that focuses primarily on our Business Intelligence products, but don’t let that fool you.  He knows a thing or two about the SQL Engine as well.

This day-long seminar will feature 10 customer stories.  We will look at what the problem was and what we do to fix these types of issues.  We really want to highlight the things that you can do as users of the products to avoid having to call us.  We will also take some opportunities to dig into the product a little bit and explain how some things work under the hood.  Some of the areas we will cover:

  • Memory issues
  • Setup
  • Hangs
  • Connectivity
  • and more…

This Pre-Con is primarily SQL Engine focused.  We do touch lightly on the BI area, but that is not really the focus.  We will have lots of great demos and plenty of time for questions.  This will really be a great look at what types of issues we see in Support, along with our insights of how the product works. You can also read a Q&A with Adam Saxton that PASS posted on their website.

 

Main Conference Talks

We will have four main conference talks this year by Engineers in CSS.  These are listed by the dates/times that they will occur.  This doesn’t mean that these times won’t change due to scheduling needs at PASS, or that the rooms will be the same.  Be sure to check your schedules when you arrive at the conference for the latest information.

(DBA-500-HD) Inside SQLOS 2012 – Bob Ward

Wednesday, November 7th – 1:30pm-4:30pm – Room 618-620

Bob continues his Inside series with a look at the underpinnings of the SQL Engine and what is new with SQL 2012.  This is a “500” level talk which means it will be very deep and will focus on internals.  That being said, this will be a great talk to find out how the Engine is powered by looking at the core infrastructure for SQL.  The value in understanding SQLOS internals is that it will help you plan, design, manage, monitor and troubleshoot your database server.  The more you know about how SQL will behave, the better off you will be.  As always, there will be plenty of demos and even an appearance or two from our Windows Debugger friend!  As this will be a half-day session, there will be a lot of time to cover this deep and broad topic.

(AD-310-C) SQL Server and SharePoint: Best Frienemies – Lisa Gardner

Wednesday, November 7th – 4:45pm-6pm – Room 6E

Lisa is a Premier Field Engineer (PFE).  This means that she gets to work closely with some of our customers.  In this talk, she will cover a topic that comes up quite a bit.  Working with SharePoint Databases being hosted within SQL Server.  SharePoint deployments have been increasing and that means SQL Server is seeing a lot more SharePoint Databases.  She will cover what you need to know when working with SharePoint Databases.  If you have to deal with SharePoint databases in your environment, this is a must see talk!

(DBA-407-C) Troubleshooting SQL Server 2012 Performance with Extended Events – Rohit Nayak

Friday, November 9th – 8am-9:15am – Room 301-TCC

Rohit is one of our engineers for the SQL Engine focusing on performance issues.  In this talk, he will show some of the improvements to Extended Events that really make this feature an awesome tool in your belt.  If you have ever struggled with Extended Events in the past and gave up, or you have never heard about it, this is definitely a talk for you!  Extended Events has come a long way and is really a great feature in SQL 2012.  Come learn how you can take advantage of this feature in a great talk with some great demos!

(BIA-401-C) Working with Claims and SQL Server BI Technologies – Adam Saxton

Friday, November 9th – 8am-9:15am – Room 6E

If you aren’t familiar with what Claims is, or have struggled with it, this is a must see!  This talk will have a look at Reporting Services and PowerPivot with SQL 2012 being hosted in SharePoint 2010.  Claims will be in the picture and you need to know how it will affect you.  If you thought Kerberos was dead, think again!  This talk will get into the Kerberos aspects of this configuration as well by someone that is very knowledgeable about the Kerberos topic.  If you have followed this blog, you have seen some of Adam’s posts regarding this.  This will be a great talk to get insight into this complex area and also get a chance to ask questions that you may have.

 

image

This is one of the highlights for the SQL PASS Conference.  Imagine a room where, on one side, you have the SQLCAT team to ask “design” or “advisory” type questions to.  And, on the other side of the room, you have the SQL Support team that you can ask about an error you are getting, or how to fix something.  That is the SQL Server Clinic!  And this year, it will be bigger and better than ever! 

We have a new room this year!  We will be in 4C-3 which will be across from the Expo hall and on your way to the meal hall.  It is about 5 times bigger than the room we have had in the past, so there will be plenty of room.  There are also rumors of some much needed refreshment in the afternoon hours.

We will have a full contingent of members of the SQLCAT team.  This will be matched by the CSS Conference Speakers as well as Support Engineers and Premier Field Engineers from across the globe!  This is a unique opportunity to interact with the CSS and SQLCAT teams like no other.  We cannot guarantee this is like “getting a free case” from CSS, but we can help point you in the right direction.  The questions we get in the Clinic range from “how does this work” to “I have a crash, can you look at it?”.  In some situations in the past, we have been able to use our laptops, or the customer’s, and either demonstrate how to solve the problem, or actually fix it on the spot.  They don’t all work out that way, but it is probably fair to say that when you walk out of the room, you will have left with more than when you walked in the door.  Even if that just means you wanted to come in to meet a new face or network with some of the people from Microsoft that put out customer fires or test the limits of SQL Server.

This year at the clinic, we will be offering what we are calling the Early Bird hours before the Key notes.  This will give you an opportunity to stop by on your way to breakfast or that first cup of coffee to get your morning started.  The Clinic will be open Wednesday thru Friday.  The main hours of the Clinic will begin right after the keynotes (we want to see them too!) and will go until the end of the day with the exception of Friday.

The clinic hours this year are:

  • Wednesday
    • 7:00am-8:00am (Early Birds)
    • 10:00am – 6:15pm
  • Thursday
    • 7:00am-8:00am (Early Birds)
    • 10:00am – 6:30pm
  • Friday
    • 7:00am – 2:00pm

Be sure to stop by the Clinic at least once during the conference!  We want to hear from you and your experience with SQL Server, even if you don’t have a question.  We will be available to talk about any topic related to SQL Server, or to strike up a conversation about your favorite sports teams (although the Rangers and Cowboys may be sore topics this year).

We hope to see you there!

How Can Reference Counting Be A Leading Memory Scribbler Cause?


The concept of the memory scribbler comes up quite a bit in support.   The term can often be overused, but I ran into a specific example that commonly fools people, including support engineers.  The random nature and even the resulting behaviors are so broad that these issues often take quite a bit of troubleshooting to determine the cause.

Hint:  Repro is the fastest way to get to the solution.  


What Is A Scribbler?

A scribbler is defined as an action taken by code that results in memory that it does not own being changed.

The definition is pretty techy so I like to describe this as coloring outside the lines.   The code is supposed to run exactly as intended, but a scribbler bug causes the logic to step outside the expected behavior.

The picture I included clearly shows a drawing of a crab, but there are some locations where the color comes outside the shell - scribbled.

 


Let's look at a simple example.  You would not intentionally write code this way but it only takes 3 commands to show you what an un-owned memory scribble looks like.

BYTE * pData = new BYTE[10];    // Allocates memory

delete [] pData;  // Releases memory

// code path no longer OWNS the memory as it has been returned to the memory manager
memset(pData, 'X', 10);    // Scribbles on memory it does not own

Visualized this would look like a heap with free memory regions.

image

An allocation removes an entry from the free list and assigns it to the caller (owner of the memory.)

image

The delete returns the memory to the heap free list but the local variable still points to the memory address.   The value assigned to pData should no longer be used because the memory has been returned to the memory manager and will be reassigned or released to the operating system.  At this juncture pData should be treated as (NO ACCESS).

image

A second allocation request occurs and the memory is handed out to pDataANOTHER.  The pData was not set to NULL so we have two variables that point to the same memory address.   The pData should be considered (stale/old) and not reused.

 

image

Now the code mistakenly has logic that results in the use of pData after it was released to the memory manager (NOT OWNED by pData).  The example shows the memset scribbling (sometimes called stomping) on the memory that pDataANOTHER is the owner of.

image

Because this type of bug can scribble any number of bits at any random location, the behavior can be anything from exceptions to invalid results to unexpected behavior, just to mention a few.   This is why it is imperative to break such a problem down to a reproducible scenario, allowing the exact steps that cause the bug to be studied and corrected.


What Is A Bit Flip?

When working with scribblers the term 'bit flip' often comes up.  A bit flip is when a single bit appears to have been changed.  For example, 2 and 3 have the following binary representations.

  • 2 = 0010
  • 3 = 0011

The difference between them is a single bit change, as if a single bit is flipped from 0 to 1 or 1 to 0.  Hence the term bit flip.

Most people think of this as a stale pointer and a bitwise operation.  Going back to pData and pDataANOTHER, assume the value stored at pDataANOTHER is 0x2 and the following code was executed.

pData[0] |= 1;

This changes the value at pDataANOTHER to 0x3 (from 0010 to 0011) and appears to be a single bit flip.


Stale ~= Reference Counting

In practice I don't find the single bit flip to be as common (don't get me wrong, it can and does happen).  Instead, I find the behavior looks like a bit flip but the issue arises from a reference counting operation.

Let's advance the example a bit and build the following two classes in our code.  

MyOldClass (pOld) has a reference counter located at the start of the class and MyNewClass (pNew) has a status value at that same offset location within the class.  The AddRef and Release methods perform InterlockedIncrement (++) and InterlockedDecrement (--) operations on the reference count member.

 

imageimage

Assume the same type of activity has taken place and pOld memory was released back to the memory manager but a bug allows pOld->Release() to be called after the memory was released.  The pNew allocation has already reused the memory address so where the m_dwStatus physically resides is where pOld thinks the m_dwRefCount is also located.

The logic in ::Release() is to decrement the value located in m_dwRefCount, i.e. InterlockedDecrement(&m_dwRefCount) or m_dwRefCount--.  Since we are only subtracting (-1) it can look like a bit flip or a damaged byte in memory.

  • 0x03  (0011) and a -1 results in 0x2 (0010) --  Looking like a bit flip
  • 0x0 (0000) and a -1 can result in 0xF (1111)  -- Multiple bits are changed

What's The Big Deal?

It is obvious to you that this is not the desired behavior and it needs to be corrected to provide proper stability to the application.   However, there are other side effects that you may not have considered.

What if this is SQL Server memory management and the stale pointer ends up pointing back to a data page because the memory page was reused to support a data page when the pointer was released to the memory manager?   Now the scribble behavior can happen on the actual data stored on the page.  If this impacts the row tracking structures you can see corruption issues reported by DBCC or at runtime, but if the scribble impacts the actual data storage bytes you may not notice this until your customer complains their name is not spelled correctly or the $500 they deposited in the account has become $499 and their statement won't balance.

Security: More concerning is when the problem can be used for a security exploit.   There are various ways that the behavior might be susceptible to an exploit.  If you really are interested look up 'Heap Exploit and Heap Spraying' on Bing.   This is why every exception reported to Microsoft is checked by our security teams for exploitable possibilities.

One way is to take the class example and extend it to include a virtual method so the class contains a VTABLE.  Now the pOld overwrite can change the VTABLE pointer.  If the overwrite action can be influenced by the user, they could potentially point the VTABLE functions to code they want to execute instead of the proper code, and you have a security exploit.


How Are You Protected?

There is no such thing as 100% protection, but Microsoft products go to great lengths to make sure this does not happen.  In fact, you can read about the extended heap protection the operating system provides out of the box to help prevent exploits from any application.  (http://blogs.msdn.com/b/b8/archive/2011/09/15/protecting-you-from-malware.aspx)

Our policy is that any heap or memory manager must attempt to protect itself against such an attack.   Thus, anytime the internal structures of the memory manager are compromised it is a requirement that the process be terminated.

For SQL Server you may see the (ex_terminator) handler.   SQL Server installs this termination, structured exception handler around all memory manager activities (i.e., Alloc, Free, …).   If any exception occurs or an assertion by the code fails, the termination handler is used to capture a dump (using SQLDumper as the external process so we are not using the compromised SQLServer.exe process) and SQL Server is terminated.

You can also reference the following SQL Server protection mechanisms that can help locate a possible scribbler source.

Note: Trace flags should be used with caution and under the guidance of Microsoft.

 

Bob Dorr - Principal SQL Server Escalation Engineer


Using a Windows Azure worker role to generate reports using Azure Reporting


While trying to reproduce an issue in Azure Reporting, I found myself building a simple worker role that generated a report using the ReportViewer control in server mode. I found a couple gaps in the overall content available, so I thought I would try to post a more complete example here. 

NOTE: I am assuming that you know how to create an Azure deployment and an Azure Reporting instance, plus design and publish a basic report.

The first thing I had to do was create a basic report. The report and the datasource look like this:

image

I then published this report and the associated data source to my Azure Reporting Instance using the built-in BIDS functionality.

image

------ Deploy started: Project: Report Project1, Configuration: DebugLocal ------
Deploying to
https://igwbloe2yk.reporting.windows.net/reportserver
Deploying data source '/Data Sources/DataSource1'.
Deploying report '/SubReportRepro/BaseReport'.
Deploying report '/SubReportRepro/MasterReport2'.
Deploy complete -- 0 errors, 0 warnings

Next, I created a Windows Azure Worker Role project. Because Azure Reporting is protected by Forms Authentication, I had to add a custom class to manage the user credentials. Although I modified the code a bit so I didn’t have to hardcode the credentials, it is pretty much identical to the MSDN documentation on this class. However, because the MSDN code sample is missing the Using statements, here is the complete code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using Microsoft.Reporting.WebForms;
using System.Net;
using System.Security.Principal;
 
 
namespace WebRole1
{
/// <summary>
/// Implementation of IReportServerCredentials to supply forms credentials to SQL Reporting using GetFormsCredentials() 
/// </summary>
public class ReportServerCredentials : IReportServerCredentials
    {
private string _user;
private string _password;
private string _authority;
 
public ReportServerCredentials(string user, string password, string authority)
        {
            _user = user;
            _password = password;
            _authority = authority;
 
        }
 
 
public WindowsIdentity ImpersonationUser
        {
            get
            {
return null;
            }
        }
 
public ICredentials NetworkCredentials
        {
            get
            {
return null;
            }
        }
 
 
   
public bool GetFormsCredentials(out Cookie authCookie, out string user, out string password, out string authority)
        {
            authCookie = null;
            user = _user;
            password = _password;
            authority = _authority;
            return true;
        }
    }
 
}

Next, I had to write the worker role code. Again, this code is stock worker role code with the exception of the code inside the Run method. The ReportViewer manipulation code is stock ReportViewer code from MSDN as is the blob storage code.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Net;
using System.Threading;
using Microsoft.WindowsAzure.Diagnostics;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.Reporting.WebForms;
using System.IO;
 
 
namespace WorkerRole1
{
public class WorkerRole : RoleEntryPoint
    {
public override void Run()
        {
// This is a sample worker implementation. Replace with your logic.
            Trace.WriteLine("$projectname$ entry point called", "Information");
 
while (true)
            {
 
try
                {
                    Trace.WriteLine("Rendering a report", "Information");
 
//Instantiate an instance of the ReportViewer control
//Since all I am doing is rendering, this is much easier than doing SOAP API calls
                    Microsoft.Reporting.WebForms.ReportViewer rv = new Microsoft.Reporting.WebForms.ReportViewer();
                    rv.ProcessingMode = ProcessingMode.Remote;
                    rv.ServerReport.ReportServerUrl = new Uri(RoleEnvironment.GetConfigurationSettingValue("RSUrl"));
                    rv.ServerReport.ReportPath = RoleEnvironment.GetConfigurationSettingValue("ReportPath");
                    rv.ServerReport.ReportServerCredentials = new WebRole1.ReportServerCredentials(RoleEnvironment.GetConfigurationSettingValue("User"), RoleEnvironment.GetConfigurationSettingValue("Password"), RoleEnvironment.GetConfigurationSettingValue("RSUrl").Replace("http://", ""));
 
                    Warning[] warnings;
string[] streamids;
string mimeType;
string encoding;
string extension;
byte[] bytes = rv.ServerReport.Render(
"PDF", null, out mimeType, out encoding, out extension,
out streamids, out warnings);
 
                    Trace.WriteLine("Writing report to storage");
//first, set up the connection to blob storage
                    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(Microsoft.WindowsAzure.CloudConfigurationManager.GetSetting("TargetReportStorage"));
                    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
 
// Retrieve a reference to a container that already exists
                    CloudBlobContainer container = blobClient.GetContainerReference(RoleEnvironment.GetConfigurationSettingValue("TargetContainer"));
 
                    container.CreateIfNotExists();
 
// Retrieve reference to a blob named "myblob".
                    CloudBlockBlob blockBlob = container.GetBlockBlobReference(Guid.NewGuid().ToString() + ".pdf");
 
 
                    MemoryStream fs =
new MemoryStream();
                    fs.Write(bytes, 0, bytes.Length);
                    fs.Seek(0, SeekOrigin.Begin);
 
                    blockBlob.UploadFromStream(fs);
                }
catch (Exception ex)
                {
 
                    Trace.WriteLine(ex.Message + ex.StackTrace);
                }
 
                Thread.Sleep(Convert.ToInt32(RoleEnvironment.GetConfigurationSettingValue("BetweenReportsMS")));
 
            }
        }
 
public override bool OnStart()
        {
// Set the maximum number of concurrent connections 
            ServicePointManager.DefaultConnectionLimit = 12;
 
try
            {
// For information on handling configuration changes
// see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.
 
                DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration();
 
// Schedule a transfer period of 30 minutes.
                config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(1.0);
 
// Display information about the default configuration.
//ShowConfig(config);
 
// Apply the updated configuration to the diagnostic monitor.
// The first parameter is for the connection string configuration setting.
                DiagnosticMonitor.Start("Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);
 
            }
catch (Exception e)
            {
                Trace.WriteLine("Exception during WebRole1.OnStart: " + e.ToString());
// Take other action as needed.
            }
 
 
return base.OnStart();
        }
 
private void ShowConfig(DiagnosticMonitorConfiguration config)
        {
 
try
            {
 
if (null == config)
                {
                    Trace.WriteLine("Null configuration passed to ShowConfig");
return;
                }
 
// Display the general settings of the configuration
                Trace.WriteLine("*** General configuration settings ***");
                Trace.WriteLine("Config change poll interval: " + config.ConfigurationChangePollInterval.ToString());
                Trace.WriteLine("Overall quota in MB: " + config.OverallQuotaInMB);
 
// Display the diagnostic infrastructure logs
                Trace.WriteLine("*** Diagnostic infrastructure settings ***");
                Trace.WriteLine("DiagnosticInfrastructureLogs buffer quota in MB: " + config.DiagnosticInfrastructureLogs.BufferQuotaInMB);
                Trace.WriteLine("DiagnosticInfrastructureLogs scheduled transfer log filter: " + config.DiagnosticInfrastructureLogs.ScheduledTransferLogLevelFilter);
                Trace.WriteLine("DiagnosticInfrastructureLogs transfer period: " + config.DiagnosticInfrastructureLogs.ScheduledTransferPeriod.ToString());
 
// List the Logs info
                Trace.WriteLine("*** Logs configuration settings ***");
                Trace.WriteLine("Logs buffer quota in MB: " + config.Logs.BufferQuotaInMB);
                Trace.WriteLine("Logs scheduled transfer log level filter: " + config.Logs.ScheduledTransferLogLevelFilter);
                Trace.WriteLine("Logs transfer period: " + config.Logs.ScheduledTransferPeriod.ToString());
 
// List the Directories info
                Trace.WriteLine("*** Directories configuration settings ***");
                Trace.WriteLine("Directories buffer quota in MB: " + config.Directories.BufferQuotaInMB);
                Trace.WriteLine("Directories scheduled transfer period: " + config.Directories.ScheduledTransferPeriod.ToString());
int count = config.Directories.DataSources.Count, index;
if (0 == count)
                {
                    Trace.WriteLine("No data sources for Directories");
                }
else
                {
for (index = 0; index < count; index++)
                    {
                        Trace.WriteLine("Directories configuration data source:");
                        Trace.WriteLine("\tContainer: " + config.Directories.DataSources[index].Container);
                        Trace.WriteLine("\tDirectory quota in MB: " + config.Directories.DataSources[index].DirectoryQuotaInMB);
                        Trace.WriteLine("\tPath: " + config.Directories.DataSources[index].Path);
                        Trace.WriteLine("");
                    }
                }
 
// List the event log info
                Trace.WriteLine("*** Event log configuration settings ***");
                Trace.WriteLine("Event log buffer quota in MB: " + config.WindowsEventLog.BufferQuotaInMB);
                count = config.WindowsEventLog.DataSources.Count;
if (0 == count)
                {
                    Trace.WriteLine("No data sources for event log");
                }
else
                {
for (index = 0; index < count; index++)
                    {
                        Trace.WriteLine("Event log configuration data source:" + config.WindowsEventLog.DataSources[index]);
                    }
                }
                Trace.WriteLine("Event log scheduled transfer log level filter: " + config.WindowsEventLog.ScheduledTransferLogLevelFilter);
                Trace.WriteLine("Event log scheduled transfer period: " + config.WindowsEventLog.ScheduledTransferPeriod.ToString());
 
// List the performance counter info
                Trace.WriteLine("*** Performance counter configuration settings ***");
                Trace.WriteLine("Performance counter buffer quota in MB: " + config.PerformanceCounters.BufferQuotaInMB);
                Trace.WriteLine("Performance counter scheduled transfer period: " + config.PerformanceCounters.ScheduledTransferPeriod.ToString());
                count = config.PerformanceCounters.DataSources.Count;
if (0 == count)
                {
                    Trace.WriteLine("No data sources for PerformanceCounters");
                }
else
                {
for (index = 0; index < count; index++)
                    {
                        Trace.WriteLine("PerformanceCounters configuration data source:");
                        Trace.WriteLine("\tCounterSpecifier: " + config.PerformanceCounters.DataSources[index].CounterSpecifier);
                        Trace.WriteLine("\tSampleRate: " + config.PerformanceCounters.DataSources[index].SampleRate.ToString());
                        Trace.WriteLine("");
                    }
                }
            }
catch (Exception e)
            {
                Trace.WriteLine("Exception during ShowConfig: " + e.ToString());
// Take other action as needed.
            }
        }
 
    }
}

Those of you who are paying close attention might have noticed that I use RoleEnvironment.GetConfigurationSettingValue(“XXXXX”) for all of my passwords, connection strings, etc. This is handy because it allows me to configure those values at run time instead of design time using standard Windows Azure methods. You can edit these either via the Windows Azure portal in production or in Visual Studio during development. Here’s what the Visual Studio dialog looks like:

image

Now, here is the tricky part. Because I elected to use the ReportViewer control, I need to ensure that the ReportViewer assemblies are accessible to my Windows Azure role. They aren’t part of the standard Azure deployment so that leaves me with two choices:

  1. Add a startup task to install the ReportViewer control
  2. Upload copies of the assemblies as part of my deployment

Option 1 isn’t very difficult, but I wanted to minimize the size of my deployment package, so I elected to go with option 2. The easy part was to make sure the Copy Local setting of the Microsoft.ReportViewer.Common and Microsoft.ReportViewer.WebForms assemblies was set to True. Doing the same for Microsoft.ReportViewer.DataVisualization and Microsoft.ReportViewer.ProcessingObjectModel was a bit trickier because they live in the GAC. First, I had to manually copy them out of the GAC and into my project folder and then I had to add explicit references to the local copies of these assemblies. Lastly, just like the other ReportViewer assemblies, I had to ensure that the Copy Local property was set to True.

Now, after deploying my worker role to Azure using standard techniques I could watch my blob storage and see reports being generated from my worker role.

At this point, I want to take a minute and plug Windows Azure’s scalability. By increasing the number of instances behind my worker role to 50 (just a simple configuration change), I was able to generate more than 60K reports over the course of the next 8 hours. Then, once my testing was done, I deleted the deployment. Try configuring 50 on-premises machines and then finding a new home for them after just 8 hours. You will probably find lots of people who will take them, but good luck getting paid anywhere near the purchase price!

Ack! Where the heck did Azure Reporting and Data Sync go??!?!


As you might have noticed, we have been moving more and more of our portal functionality to the new HTML5 portal, plus all of our new features are showing up there and not in the original Silverlight portal. Unfortunately, we are stuck for some period of time where some of the features and functionality are not replicated to the HTML5 portal.

Unfortunately, Azure Reporting and Data Sync are currently in that scenario. I expect them both to move to the HTML5 portal in the near future, but in the meantime I have heard from a number of folks that they cannot find either one, so I wanted to share some screenshots to help you find them.

First, here is the new portal that you will see when you first log in:

image

At this point, you can see that there is no obvious way to get back to the Silverlight portal. I even have to admit that it took me about ten minutes of poking around the new interface to find it. The trick is to click on your login in the upper right-hand corner. This will then drop down a menu on the right-hand side of the screen that looks like this:

image

The key here is to notice the “Previous Portal” link I have highlighted. Click on that and you will end up back at the Silverlight portal where you can still see Azure Reporting.

image

 

Since we haven’t disabled any of the old functionality, you can still do all the things you used to be able to do there (edit your databases, create Azure deployments, etc.), but I would highly recommend you do the bare minimum in this portal. All of our bug fixes and improvements are targeted at the HTML5 portal and you will have a better overall experience in the new portal.

Analysis Services - Errors when trying to add a User as a Server Admin


When trying to add a user as a Server Admin to Analysis Services, you may encounter one of the following errors:

The following system error occurred:  The trust relationship between the primary domain and the trusted domain failed.
(Microsoft.AnalysisServices)

Or

The following system error occurred:  No mapping between account names and security IDs was done.
(Microsoft.AnalysisServices)

When you see this, you want to look for an entry in the Admin list that looks like a SID (Security Identifier) instead of the actual account name.  For example:

image

This means that the account that was listed as a Server Admin was probably deleted, or, if you were in a Trusted Domain scenario, the Trust may have been broken, preventing the ability to resolve the SID to the account name.

If you see this and then add a new user, when you hit the OK button at the bottom, you will see one of the two errors above.  The first error could come up if you are in that Trusted Domain scenario and the user was from a different domain.  The second error may come up if it was just a user in the same domain as the SSAS service.

To get around this issue, remove the SID account and then you should be able to add the new Admin account normally.

 

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton

How It Works: Gotcha: *VARCHAR(MAX) caused my queries to be slower


The scenario:

  • Table has a NTEXT column that the customer wanted converted to NVARCHAR(MAX)
  • Data has both small and large storage for different rows
  • Issued ALTER TABLE … ADD COLUMN …NVarCharColumn… NVARCHAR(MAX)
  • Issued update MyTable set NVarCharColumn = <<NTEXT DATA>>
  • Issued ALTER TABLE … DROP COLUMN .. <<NTEXT>>
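
As a sketch of that sequence (the table and column names are hypothetical stand-ins for the customer's schema):

ALTER TABLE dbo.MyTable ADD NVarCharColumn nvarchar(max) NULL;
GO

UPDATE dbo.MyTable
   SET NVarCharColumn = CAST(NTextColumn AS nvarchar(max));
GO

ALTER TABLE dbo.MyTable DROP COLUMN NTextColumn;
GO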

Sounds harmless enough on the surface and in many cases it is.  However, the issue I worked on last week was a table with 98 partitions and 1.2 billion rows (~7TB of data) that has a couple of gotchas you may want to avoid.

Behavior #1 – Longer Query Times For Any Data Access – Huh?  Well maybe!

Start with a simple table containing 3 columns (A = int, B = guid and C = NTEXT).  As designed the data page has a TEXT POINTER to the NTEXT data.  The TEXT POINTER takes up ~16 bytes in the row to point to the proper text chain.

image

When you run select A, B from tblTest a single page is read because column C is stored off page and we are not selecting the data for column C.

Now let's convert this example to NVARCHAR(MAX) and allow the table to inline the NVARCHAR(MAX) column.

image

From the diagram you can see that the TEXT pages are no longer present for rows 1 and 2, as the data was moved inline.   This happens when the data is small enough to be stored inline with the row data instead of off page (as shown with the Row #3 addition to the example).

Now if you run the same select A, B from tblTest you encounter 2 data page I/Os for rows 1 and 2 because the rows are not as compact when the NVARCHAR(MAX) data is stored inline.   This additional overhead can lead to slower query performance, reduced page life expectancy and increased I/O even when the column(s) you are selecting do not include the NVARCHAR(MAX) data.

The following post is a great reference on this very subject: “http://stackoverflow.com/questions/1701808/should-i-use-an-inline-varcharmax-column-or-store-it-in-a-separate-table”. The post specifically outlines the sp_tableoption ‘large value types out of row’ usage and the use of the %%physloc%% virtual column to determine your *VARCHAR(MAX) storage properties.
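
If you want the MAX data kept off-row so the in-row layout stays compact, the table option looks like the sketch below (the table name is a placeholder, and rows already stored inline are only pushed off-row when their value is updated again):

-- Store (n)varchar(max)/varbinary(max) values off row with a 16 byte pointer
-- in the row, similar to the original TEXT/NTEXT layout
EXEC sp_tableoption 'dbo.tblTest', 'large value types out of row', 1;

-- %%physloc%% shows which page each row currently lives on
-- (fn_PhysLocFormatter is an undocumented helper)
SELECT TOP (10) sys.fn_PhysLocFormatter(%%physloc%%) AS page_info, A, B
  FROM dbo.tblTest;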

Behavior #2 – Text Data Not Cleaned Up?

Specifically I want to focus on the original NTEXT column and the associated DROP COLUMN.   In the scenario the NTEXT column was dropped using ALTER TABLE … DROP COLUMN.    At first you may think this cleans up the TEXT data but that is not the case.   The cleanup happens later, when the row is modified.

From SQL Server Books Online:http://msdn.microsoft.com/en-us/library/ms190273.aspx

NoteNote

Dropping a column does not reclaim the disk space of the column. You may have to reclaim the disk space of a dropped column when the row size of a table is near, or has exceeded, its limit. Reclaim space by creating a clustered index on the table or rebuilding an existing clustered index by using ALTER INDEX.

Even after the ALTER TABLE … DROP COLUMN <<NTEXT>> the PHYSICAL layout of the data looks like the following.  The database is storing double the data, but the metadata has been updated so that, to the outside query world, the column no longer exists.

image

Now the customer issues an index rebuild and it is taking a long time for the 1.2 billion rows and SQL Server is not providing detailed progress information, just that the index creation is in-progress.

The early phase of the index creation is cleaning up the rows, in preparation for the index build.  In this scenario the SQL Server is walking over each data row, following and de-allocating the specific row’s text chain.  Remember the text pages could be shared so each chain has to be cleaned up, one at a time.  This cleanup is a serial (not parallel) operation! For the customer this means running 1.2 billion TEXT chains and doing the allocation cleanup on a single thread.

What would have been better is to update the NTEXT column to NULL before issuing the DROP COLUMN statement.   The update could have run in parallel to remove the text allocations.  Then when the index creation was issued the only cleanup involved is the single (NULL BIT) in the row data. 

Note:  With any operation this large it is recommended you do it in row batches so if a failure is encountered it only impacts a limited set of the data.
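
A rough sketch of that approach (names and batch size are placeholders):

-- Clear the NTEXT data in batches before dropping the column so the text chain
-- de-allocation happens up front instead of serially during the index rebuild
DECLARE @rows int = 1;

WHILE (@rows > 0)
BEGIN
    UPDATE TOP (100000) dbo.MyTable
       SET NTextColumn = NULL
     WHERE NTextColumn IS NOT NULL;

    SET @rows = @@ROWCOUNT;
END;
GO

ALTER TABLE dbo.MyTable DROP COLUMN NTextColumn;
GO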

Specifically, if you issue the ALTER TABLE from a SQL Server Management Studio query window on a remote client and that remote client is rebooted, the query is canceled (ROLLED BACK).  After running for 24+ hours this is also a lengthy rollback lesson to learn.

DBCC CLEANTABLE  - http://msdn.microsoft.com/en-us/library/ms174418.aspx 

The CLEANTABLE command is another way to help clean up dropped, variable-length columns and allows you to indicate a batch size as well.

However, this command runs in serial (not parallel) as well, and just like the index creation, a stability lock on the object is acquired and held during the processing.  Again, it may be better to issue an update to NULL and then CLEANTABLE.
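
For example (the database, table, and batch size are placeholders):

-- Reclaim space from dropped variable-length columns, 100,000 rows per batch
DBCC CLEANTABLE (MyDatabase, 'dbo.tblTest', 100000);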

Bob Dorr - Principal SQL Server Escalation Engineer

FileNotFoundException with Microsoft.AnalysisServices.Xmla


I ran across two cases that were hitting the following Exception within SharePoint trying to run the PowerPivot Management Portal.  This was located in the SharePoint ULS Log:

EXCEPTION: System.IO.FileNotFoundException: Could not load file or assembly 'Microsoft.AnalysisServices.Xmla, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' or one of its dependencies. The system cannot find the file specified.  File name: 'Microsoft.AnalysisServices.Xmla, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91'     at Microsoft.AnalysisServices.SharePoint.Integration.Redirector.WCFTransport.Init()     at Microsoft.AnalysisServices.SharePoint.Integration.Redirector.WCFTransport.Connect(String dataSource, String applicationName, String& errorMsg)

The environment was set up in a similar way to the following – 3 SharePoint servers consisting of one WFE and two App Servers.

image

On the second App Server, PowerPivot was not installed; only Excel Calculation Services (ECS) was configured.  The issue was originally intermittent.  To narrow this down, we turned off ECS on the App Server that had PowerPivot on it.  We were then able to hit the error every time.

Doing some searching on the error above provided some hits, but nothing that really seemed useful.  I then refocused on just trying to find out where the Microsoft.AnalysisServices.Xmla module comes from, as it didn’t appear to be installed on the PowerPivot server either – so it isn’t part of the base install.  This led me to the following MSDN documentation within Books Online:

Install the Analysis Services OLE DB Provider on SharePoint Servers
http://msdn.microsoft.com/en-us/library/ee210608.aspx

This included the following blurb under “Why you need to install the OLE DB Provider”:

The second scenario is when you have a server in a SharePoint farm that runs Excel Services, but not PowerPivot for SharePoint. In this case, the application server that runs Excel Services must be manually updated to use both the newer version of the provider, as well as install an instance of the Microsoft.AnalysisServices.Xmla.dll file in the global assembly. These components are necessary for connecting to a PowerPivot for SharePoint instance. If Excel Services is using an older version of the provider, the connection request will fail.

It’s always nice when I can find actual documentation that clearly states what is needed.  This at least met the scenario I was hitting and it indicated I had to have this installed.  Otherwise you get the error I was hitting.  However, that didn’t answer the question of where it comes from.  A little further down, I saw the following under “Install the SQL Server 2012 OLE DB Provider on a standalone Excel Services server”:

On the Feature Selection page, choose Client Tools Connectivity.

Click Management Tools - Complete. This option installs Microsoft.AnalysisServices.Xmla.dll.

For those not necessarily familiar with the SQL Product as a whole and are more on the SharePoint side of things, the Management Tools option is essentially SQL Server Management Studio and comes from the actual SQL Product Setup as a Shared Feature. 

image

So, to get this, you would have to run SQL Setup and you only need to select this option.  You don’t actually need to install a full Instance of SQL.

So, for your environment, if you have a SharePoint App Server that only has ECS on it, and others that have PowerPivot 2012, you will need to install Management Studio on the ECS only boxes so that you can get the Microsoft.AnalysisServices.Xmla.dll assembly, along with following the other instructions in the document referenced above.  Of note, it does not appear that this was required with PowerPivot 2008 R2.

 

**** Update – 12/6/12 ****

I was also made aware that another option could be to run the sppowerpivot.msi that comes with Service Pack 1 for SQL 2012.  This is the PowerPivot add-in for SharePoint 2013.  This install will present you with some feature options to select.  You do not need to select the PowerPivot option.

image

The selection above will also install the SP1 build of Microsoft.AnalysisServices.Xmla.

image

Within the SP1 output, it will be in the 1033_enu_lp\<platform>\setup

image

You can also download the PowerPivot Add-in For SharePoint 2013 through the SQL 2012 SP1 Feature Pack.  The file is 1033\x64\spPowerPivot.msi.

Adam W. Saxton | Microsoft Escalation Services
http://twitter.com/awsaxton
