
Having performance issues with table variables? SQL Server 2012 SP2 can help!


In a previous blog, I talked about how a table variable can impact the performance of your query. The reason is that the statements referencing the table variable are compiled when the table variable has no data in it; therefore, the cardinality estimate for the table variable is always 1. If you always insert a small number of rows into the table variable, it may not matter. But if you insert a large number of rows, the query plan generated (based on the one-row assumption) may not be efficient.
SQL Server 2012 Service Pack 2 includes a supportability improvement that helps in situations where a large number of rows is inserted into a table variable that is then joined with other tables. A new trace flag, 2453, activates the improvement. When SQL Server detects that enough rows have been inserted into the table variable, it recompiles the subsequent statements referencing the table variable. At recompile time, SQL Server reads the table variable's actual row count and can produce a more efficient plan. "Enough rows" is determined by the recompile threshold for temp tables described in KB http://support.microsoft.com/kb/243586.
This behavior is documented in http://support.microsoft.com/kb/2952444.
I want to emphasize that trace flag 2453 must be used in order to activate this feature. If you are on SP2 and experience slow performance on a query using a table variable, you can give this trace flag a try to see if it helps.
Let's use the same demo from the previous blog to demonstrate the behavior.
First, set up the permanent table:


dbcc traceoff(2453,-1)
go
dbcc freeproccache
go
set statistics profile off
go
use tempdb
go
if OBJECT_ID ('t2') is not null
drop table t2
go
create table t2 (c2 int)
go
create index ix_t2 on t2(c2)
go
--insert 100,000 rows into the perm table
set nocount on
begin tran
declare @i int
set @i = 0
while @i < 100000
begin
insert into t2 values (@i)
set @i = @i + 1
end
commit tran
go
--update stats
update statistics t2

go

I'm going to use the same query below to show you the estimate difference.
set nocount on
declare @t1 table (c1 int)
begin tran
declare @i int
set @i = 0
while @i < 100000
begin
insert into @t1 values (@i)
set @i = @i + 1
end
commit tran
set statistics profile on
select * from @t1 inner join t2 on c1=c2
go

set statistics profile off


Without trace flag 2453, the query uses a nested loop join and the table variable's cardinality is incorrectly estimated as just one row.


After I enabled the trace flag and flushed the plan cache with the following commands, the plan changed to a hash match and the table variable was correctly estimated at 100,000 rows.
dbcc freeproccache
go
dbcc traceon(2453,-1)



Jack Li
Senior Escalation Engineer | Microsoft SQL Server Support






VSS backup of AlwaysOn Secondaries


Hi Everyone,

Today I’m going to highlight one of the changes brought by SQL Server 2012 SP2, which is the way we handle VSS Backup requests on AlwaysOn Secondary Databases.

Until now, any request for a FULL database backup (VSS_BT_FULL) through VSS against a DB that is an AlwaysOn secondary failed by design. Our VSS writer, SQLWriter, would return FAILED_AT_PREPARE_SNAPSHOT (0x800423f4 - VSS_E_WRITERERROR_NONRETRYABLE).

A copy-only VSS backup (VSS_BT_COPY) would work.

The rationale is the following: a FULL backup actually updates the target DB (mainly a reset of the differential bitmap), which is not possible when the DB is read-only. Furthermore, because of the failover possibilities introduced by AlwaysOn, the favored option was to use native SQL Server backups, which can rely on a variable backup location (http://msdn.microsoft.com/en-us/library/hh245119.aspx) if needed and are AlwaysOn-aware.

So that could be the end of the story: against an AlwaysOn Secondary DB, either use Copy_only VSS backups or use native backups.

But of course that wouldn’t make for a very interesting blog post…

Enter Hyper-V…

Consider the following scenario:

Large Windows Hyper-V servers hosting many virtual machines, some of them SQL Servers in an AlwaysOn architecture.

In short: a Private Cloud.

In this context, IT usually takes care of the infrastructure at the host level and lets users deal with whatever happens within the VMs. One of the key tasks of IT is to manage backups (e.g. for disaster recovery at the datacenter level, or to provide restores of single VMs).

And the mainstream way to do that is to take VSS backups of the Host Disk Volumes. Microsoft System Center DPM will do exactly that.

But VSS backups are all about consistency: in a 'standalone' SQL Server context, you may already know all the logic SQLWriter implements to make sure that I/O against the databases being backed up is frozen during the snapshot operation. So, back to our Hyper-V context: collecting a point-in-time image of a VHD without bothering with what happens within the VM would defeat that very purpose, right?

So what happens is the following: the VSS backup is propagated to guest VMs through the Hyper-V integration services. That propagation hardcodes the backup type to VSS_BT_FULL, and therefore all guest VMs initiate a VSS backup/snapshot in their own context. The purpose is to make sure that all applications are quiesced within all running VMs at the time we take the snapshot at the host level. This enables us to generate a consistent backup of running VMs.

But let’s now put this in the context where one of the VMs is running an AlwaysOn secondary DB: you guessed it, it’s not going to work:

clip_image002

The important thing to know here is that the error returned by SQLWriter in VM3 will actually bubble up all the way to the initial VSS backup command at Host level, and will make it fail as a whole.

So we ended up in a situation where the IT infrastructure folks would see their host backups failing from time to time for an unknown reason, depending on whether one or more of the VMs present on the host drive being backed up had a secondary AlwaysOn DB! The AlwaysOn AG may span different Hyper-V hosts, so the presence of a secondary DB on a given host is not something static over time.

Because of the complexity of the whole call chain, and because infrastructure IT operators may not have any visibility into (or understanding of) the VM content, you can imagine what kind of troubleshooting challenges this would offer… And even when the situation is understood, well, what do we do? If host-level backups must somehow be manually synchronized with the application state of guest VMs, the private cloud scenario suddenly becomes much more complicated.

This is the reason why SQL Server 2012 SP2 ships a code change for SQLWriter that will implement the following:

clip_image004

As you can see, SQLWriter now detects this specific situation and changes the backup type to VSS_BT_COPYONLY. This will only happen for VSS_BT_FULL backups against AlwaysOn secondary DBs; VSS_BT_FULL backups against a primary DB proceed unchanged.

In this case, the VSS backup will now successfully complete in VM3 and the host-level backup success state will no longer be tied to guest VM’s AlwaysOn activity. Private Cloud scenario unlocked!

Important note: the fact that VSS backup of AlwaysOn secondaries now works does not make it the preferred solution for backing up SQL Server AlwaysOn architectures. The main purpose of the SP2 change is to avoid a situation where a single SQL Server in a VM fails a complete host-level backup operation that may encompass dozens of VMs.

The resulting backup of the VM hosting SQL Server should be considered a disaster recovery backup, where AlwaysOn will be removed at once at restore time, not a way to rebuild a subset of the nodes of an AlwaysOn availability group. And for regular databases within the VM, that backup is as good as any regular VSS one.

Finally, SQL Server 2012 SP2 only contains a partial fix for this issue. Servers running case-sensitive sort orders will require SQL Server 2012 SP2 Cumulative Update 2.

HTH,

Guillaume Fourrat
SQL Server Escalation Engineer
Microsoft France

Power View in Excel won’t render from SharePoint


I originally encountered this issue back in May with a customer. We had another customer this month with the same issue. When you try to load an Excel document with a Power View report in it from SharePoint, you may encounter the default 'unable to load Power View report' image.

image

Before I get into the specifics of the issue we encountered with a few customers, I'd first like to say that you should validate that Silverlight is installed and working properly. You may be able to do this by trying a standalone Power View (RDLX) report outside of Excel. If you have checked that and it is working, you may want to go through the details presented in this blog post.

Unfortunately, the image above doesn't really provide any guidance as to what to do. So, as with any SharePoint issue, I go to the ULS log to see what is there. Looking through the ULS log, one entry stands out - nothing that shows an exception of any kind.

08/21/2014 10:25:26.30    w3wp.exe (0x0C2C)    0x0254    Excel Services Application    Excel Calculation Services    ah3c5    Verbose    MossHost.InteractiveReportServiceUrl: Interactive report rervice URL is missing from the farm properties, property name: InteractiveReportServiceUrl    8ab5b09c-93d3-e002-33bc-206ec188ef95

You may or may not see the following entry:

w3wp.exe (0x16B8) 0x26FC Excel Services Application Excel Calculation Services ahgij Medium Not loading interactive reports since not enabled. 15028e9c-ad22-104d-0114-866c6d407dcd

Looking at a working system, we see the following:

08/21/2014 14:52:27.43    w3wp.exe (0x1330)    0x1508    Excel Services Application    Excel Calculation Services    ah3c7    Verbose    MossHost.InteractiveReportServiceUrl: Interactive report service URL: /_layouts/15/ReportServer/AdHocReportDesigner.aspx?ViewMode=Presentation&DeferredInitialization=true&Fit=true&PreviewBar=false&BackgroundColor=White&AllowSectionNavigation=false&EnableKeepAlive=true&AllowEditViewMode=false&AllowFullScreenViewMode=false    d2c4b09c-e336-e002-33bc-2fa5fdb18796

08/21/2014 14:52:27.43    w3wp.exe (0x1330)    0x1508    Excel Services Application    Excel Calculation Services    ah3da    Verbose    MossHost.InteractiveReportApiUrl: Interactive report API URL: /_layouts/15/ReportServer/InteractiveReportAPI.js    d2c4b09c-e336-e002-33bc-2fa5fdb18796

There are two Properties that will get added to the SharePoint Farm Configuration for Power View.

InteractiveReportServiceUrl – This is the URL pointer that Excel Services uses to redirect to the Power View page and render the Power View report. If this isn't present, Excel Services doesn't know what to do with the Power View report that is within the Excel workbook.

InteractiveReportApiUrl – This is a helper JavaScript file used by Power View.

These properties are part of the SharePoint Farm Configuration.  We can view these via SharePoint by looking at the properties of the SPFarm from PowerShell.

image

You can also see these in SQL by querying the SharePoint config database.

SELECT *
FROM [SharePoint_Config].[dbo].[Objects]
Where Properties like '%InteractiveReportServiceUrl%'

image

You won’t get a result back if the values are missing from the farm.  This is a quick way to tell if they are missing.  The Properties output in my farm looks like the following, when those values are present. 

<object type="Microsoft.SharePoint.Administration.SPFarm, Microsoft.SharePoint, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c"><sFld type="Int32" name="m_PersistedFileChunkSize">4194304</sFld><sFld type="Int32" name="m_XsltTransformTimeOut">1</sFld><sFld type="Int32" name="m_cPasswordChangeGuardTime">45</sFld><sFld type="Int32" name="m_cPasswordChangeMaxTries">5</sFld><fld name="m_PasswordChangeEmailAddress" type="null" /><sFld type="Int32" name="m_cDaysBeforePasswordExpirationToSendEmail">10</sFld><sFld type="Boolean" name="m_bUseMinWidthForHtmlPicker">False</sFld><fld name="m_EncodedFarmId" type="null" /><fld type="System.Collections.Generic.HashSet`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" name="m_serverDebugFlags" /><fld name="m_AuthenticationRealm" type="null" /><sFld type="Boolean" name="m_userLicensingEnabled">False</sFld><fld type="System.Collections.Generic.Dictionary`2[[System.Guid, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.Version, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" name="m_Versions"><sFld type="Guid">3a9fdacb-c088-420d-9670-043c1417f6f2</sFld><sFld type="Version">14.0.2.0</sFld><sFld type="Guid">66d56e1c-50b2-482e-af1a-7dd7ba0b72cc</sFld><sFld type="Version">15.0.0.0</sFld><sFld type="Guid">00000000-0000-0000-0000-000000000000</sFld><sFld type="Version">15.0.4617.1000</sFld><sFld type="Guid">77e7f90e-1989-46c2-ad65-361a53dcb2e0</sFld><sFld type="Version">15.0.1.0</sFld><sFld type="Guid">54d00007-0f81-42b1-8f06-fb9b981a617d</sFld><sFld type="Version">14.0.1.0</sFld><sFld type="Guid">6ac833ea-3f8d-46b6-8b30-92ac4553a742</sFld><sFld type="Version">15.0.1.0</sFld><sFld type="Guid">6371575d-8eae-41dd-903f-b9fbc2da7aad</sFld><sFld type="Version">15.0.1.0</sFld><sFld type="Guid">c8a0b463-1852-4f3b-8fd3-216c4d19585a</sFld><sFld type="Version">15.0.1.0</sFld><sFld type="Guid">42c6e513-ad52-4d28-93d6-d07d1afd7b14</sFld><sFld type="Version">15.0.2.0</sFld></fld><fld name="m_UpgradeContext" type="null" /><fld type="System.Collections.Hashtable, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" name="m_UpgradedPersistedFields" /><fld type="System.Collections.Hashtable, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" name="m_Properties"><sFld type="String">WopiLicensing</sFld><sFld type="String">HostBIEnabled</sFld><sFld type="String">InteractiveReportApiUrl</sFld><sFld type="String">/_layouts/15/ReportServer/InteractiveReportAPI.js</sFld><sFld type="String">DisableIntranetCallsFromApps</sFld><sFld type="Boolean">True</sFld><sFld type="String">DisableIntranetCalls</sFld><sFld type="Boolean">True</sFld><sFld type="String">InteractiveReportServiceUrl</sFld><sFld type="String">/_layouts/15/ReportServer/AdHocReportDesigner.aspx?ViewMode=Presentation&amp;DeferredInitialization=true&amp;Fit=true&amp;PreviewBar=false&amp;BackgroundColor=White&amp;AllowSectionNavigation=false&amp;EnableKeepAlive=true&amp;AllowEditViewMode=false&amp;AllowFullScreenViewMode=false</sFld><sFld type="String">GuestSharingEnabled</sFld><sFld type="Boolean">True</sFld></fld><sFld type="String" name="m_LastUpdatedUser">BATTLESTAR\asaxton</sFld><sFld type="String" name="m_LastUpdatedProcess">psconfigui (3468)</sFld><sFld type="String" 
name="m_LastUpdatedMachine">ADMADAMA</sFld><sFld type="DateTime" name="m_LastUpdatedTime">2014-06-20T14:53:36</sFld></object>

So, what do we do if they are missing?  I was able to correct it by adding them back via PowerShell. Here are the PowerShell Commands for SharePoint 2013.

# Get the SPFarm object and list out the existing properties to verify
# the ones we are looking for are missing.

$farm = Get-SPFarm
$farm.Properties

# Set the Property Name and Value to add - InteractiveReportServiceUrl
$propName = "InteractiveReportServiceUrl"
$propValue = "/_layouts/15/ReportServer/AdHocReportDesigner.aspx?ViewMode=Presentation&DeferredInitialization=true&Fit=true&PreviewBar=false&BackgroundColor=White&AllowSectionNavigation=false&EnableKeepAlive=true&AllowEditViewMode=false&AllowFullScreenViewMode=false"

# Add the InteractiveReportServiceUrl Property to the Server Farm
$farm.Properties.Add($propName, $propValue);

# Set the Property Name and Value to add - InteractiveReportApiUrl
$propName = "InteractiveReportApiUrl"
$propValue = "/_layouts/15/ReportServer/InteractiveReportAPI.js"

# Add the InteractiveReportApiUrl Property to the Server Farm
$farm.Properties.Add($propName, $propValue);

# Propagate the changes back to the Database
# true to silently reinitialize the object with an existing object's data if the object already exists in the configuration store;
# false to throw an exception if the object already exists.

$farm.Update($FALSE)

If the properties are there and are blank, or have the wrong value, you can do the following to update the existing properties.

$farm.Properties[$propName] = $propValue; 

instead of

$farm.Properties.Add($propName, $propValue);

Once that is done, you’ll want to do an IISRESET on your servers for the Config information to get loaded.  Then try your Power View Report again and it should come up.

How does the configuration get into this state?  I don’t know for sure, but my thoughts are that it has to do with the order in which the Service Apps were installed.  If the RS Service was configured before the Excel Calculation Service, we may get into this state.  Regardless, the commands above should get it back into working status.

 

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

A Partitioned Table May Limit the Runtime MAX DOP of Create/Alter Index


I was working with a 1.3 trillion row table in the Microsoft lab when I learned more about the ins and outs of this behavior.  This issue is alluded to in SQL Server Books Online but allow me to expand on the behavior a bit more. (http://msdn.microsoft.com/en-us/library/ms190787.aspx)

The lab machine is a 128GB, 64 CPU system running enterprise editions of Windows and SQL Server.  

  • When I built an index on a 25 million row, non-partitioned table, the MAX DOP selected for the index build was 64.
  • When I built an index on the 1.3 trillion row, partitioned table (27 total partitions), the MAX DOP selected for the index build was 27.

I spent some time looking at and tuning the maximum query grant and sp_configure index memory settings without seeing any change in MAX DOP = 27 behavior.

After reading over SQL Server Books Online and stepping through the SQL Server (CalculateDOP) logic the answer was clear.

There are certain operations against a partitioned table (create and alter index are a few of these) that leverage the partitioning when performing range operations.   The partition ranges are then used to drive the maximum possible, runtime DOP level.

First, SQL Server limits the maximum possible DOP = MIN(64, CPUs).  If you only have 32 CPUs the MAX DOP possible will be 32.  If you have 160 CPUs the MAX DOP possible will be 64.

Then, for some operations such as create index, the partitions are considered. When performing a create or alter index against a range-partitioned table, the runtime DOP is MIN(partitions, MIN(64, CPUs)).

Note: The formulas presented here assume sp_configure max degree of parallelism=0 and no other resource governor or query option was established.   You may use the MAXDOP query or resource governor options to alter the runtime DOP selections.
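As a concrete illustration (the object and column names here are hypothetical), a MAXDOP option on the statement overrides the partition-driven runtime choice:

-- Hedged sketch: request a specific DOP for the index build instead of
-- letting the partition count drive the runtime selection.
CREATE INDEX ix_MyPartitionedTable_col1
ON dbo.MyPartitionedTable (col1)
WITH (MAXDOP = 64);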

In my test case I had 64 CPUs, so the MIN becomes the partition count of 27. This is a very practical choice in many situations, as the partitions usually line up with the hardware, and running at DOP = partitions is efficient and in line with the DBA's database design decisions.

The specific index build I was doing involved very CPU-intensive (spatial) activity, and from testing I knew that if I achieved additional runtime DOP I could build the index faster (knowing I would consume more resources, possibly at the expense of other activity!).

Evenly splitting the partitioned table into at least as many partitions as I have MIN(64, CPUS) CPU resources allowed me to apply more CPU resources to the create index operation. 

In my specific scenario the 1.3 trillion row, spatial index builds in ~4.5 hours @ 27 CPUs and ~2.3 hours @ 64 CPUs. 

WARNING: Increasing the runtime DOP does not always provide improved performance.  The additional overhead may put pressure on memory, I/O and impact performance of other queries as the additional resources are consumed.    You should test carefully but consider your partitioned layout in order to optimize your DOP capabilities.

Specific Partition Index Builds

You should also be aware that the partition scheme and index may limit the MAXDOP when rebuilding a specific index on a specific partition.   For some indexes you are allowed to rebuild a partitioned index for a specific partition.   This may use the partition’s range and limit the index rebuild to MAXDOP=1 where a similar index build on a non-partitioned table may use an increased DOP level.
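For illustration (hypothetical index and table names), a single-partition rebuild looks like the following; per the behavior above, the effective runtime DOP for such a rebuild may be limited to 1:

-- Hedged sketch: rebuild only partition 5 of a partitioned index.
ALTER INDEX ix_MyPartitionedTable_col1
ON dbo.MyPartitionedTable
REBUILD PARTITION = 5;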

Bob Dorr - Principal SQL Server Escalation Engineer

SQL CLR assembly fails verification with “Unable to resolve token”


Recently we worked with a customer who has a SQL CLR assembly. The customer decided to upgrade from SQL Server 2008 R2 to SQL Server 2012, but the assembly failed to register with SQL Server and they received the following error:
Msg 6218, Level 16, State 2, Line 11
CREATE ASSEMBLY for assembly 'test3.5' failed because assembly 'test3.5' failed verification. Check if the referenced assemblies are up-to-date and trusted (for external_access or unsafe) to execute in the database. CLR Verifier error messages if any will follow this message [ : Test.Test::my_func][mdToken=0x6000001][offset 0x00000000] Unable to resolve token.

First of all, SQL Server 2008 R2 and SQL Server 2012 use different versions of the CLR. SQL Server 2008 R2 and below use CLR 2.0 (the 2.0/3.5 framework), while SQL Server 2012 was upgraded to use CLR 4.0 and above.
What's interesting for this customer is that if they compiled the assembly using the 4.0 compiler, they could register the assembly by using CREATE ASSEMBLY.
When we compared the IL generated, there is just one difference dealing with a local variable.
For the assembly compiled for 2.0/3.5, you see ".locals init ([0] void& pinned pData)". But for the assembly compiled for 4.0, you see ".locals init ([0] native int& pinned pData)". See a screenshot below with ildasm:
IL generated by 2.0 compiler


IL generated by 4.0 compiler



The IL in question is generated for code like fixed (void* pData = &buf[1024]). Basically, the intention is to pin the memory for a native call.

Cause

There are two changes in the CLR that cause CREATE ASSEMBLY to fail. First, the CLR 4.0 compiler no longer generates the IL "void& pinned" for code like fixed (void* pData = &buf[1024]); instead, it generates ".locals init ([0] native int& pinned pData)". Additionally, the CLR 4.0 peverify code was updated and no longer recognizes that particular IL generated by the CLR 2.0 compiler. When you run CREATE ASSEMBLY, SQL Server has to run peverify to ensure the assembly passes verification. In this case, SQL Server 2012 uses the 4.0 peverify code to verify the assembly compiled with the 2.0 compiler; therefore, it fails.

Solution

There are two solutions for this.
The first option is to compile your assembly with the CLR 4.0 compiler, targeting the 4.0 framework. This should be the best option because SQL Server 2012 uses CLR 4.0.
If you need your assembly to continue to target the 2.0/3.5 framework, you can use the 4.0 compiler but link the 2.0 version of mscorlib.dll. Here is an example command:
C:\Windows\Microsoft.NET\Framework\v4.0.30319\csc.exe /nostdlib+ /noconfig /r:c:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorlib.dll -unsafe -optimize -debug:pdbonly -target:library -out:test.dll test.cs

Repro

Step 1: Save the following code into test.cs

using System;
using System.Collections;
using System.Runtime.InteropServices;
using System.Text;
using System.Reflection;
using System.Reflection.Emit;
using Microsoft.Win32;

namespace Test
{
    unsafe public class Test
    {
        unsafe public delegate void my_Delegate(ushort comp,
                                                ushort func,
                                                void* ptr,
                                                uint length);

        public static my_Delegate delegate1;

        uint dataLength = 0;

        public void my_func(String objectId,
                            uint component,
                            uint method,
                            ushort level,
                            String caption,
                            uint offset,
                            int length,
                            byte[] buf)
        {
            fixed (void* pData = &buf[1024])
            {
                delegate1((ushort)component,
                          (ushort)method,
                          pData,
                          dataLength);
            }
        }
    }
}

Step 2: Compile the assembly

Compile using the following command:
C:\Windows\Microsoft.NET\Framework\v3.5\csc.exe -unsafe -optimize -debug:pdbonly -target:library -out:test3.5.dll test.cs

Step 3: CREATE ASSEMBLY

If you "create assembly asem from 'C:\repro2\test3.5.dll' with permission_set=unsafe", you will receive the above error.

Step 4: Solution and workaround

The following two commands, however, won't result in errors:
C:\Windows\Microsoft.NET\Framework\v4.0.30319\csc.exe -unsafe -optimize -debug:pdbonly -target:library -out:test4.0.dll test.cs
C:\Windows\Microsoft.NET\Framework\v4.0.30319\csc.exe /nostdlib+ /noconfig /r:c:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorlib.dll -unsafe -optimize -debug:pdbonly -target:library -out:test.dll test.cs

Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support

SQL Server MAX DOP Beyond 64 – Is That Possible?


I recently posted a blog outlining how the partitions of a table can be used in the calculation for the achievable max degree of parallelism (MAX DOP). http://blogs.msdn.com/b/psssql/archive/2014/09/04/a-partitioned-table-may-limit-the-runtime-max-dop-of-create-alter-index.aspx 

Discussing this with various peers, I uncovered a perception that SQL Server is always limited to a maximum of 64 CPUs, even if the machine has more (128, 160, …). This is not the case; the perception is driven by the documentation's wording, and once you realize how to leverage it, maintenance operations can take advantage of more than 64 CPUs.

It is not hard to understand how the perception started or continues to propagate itself.

SQL Server Books Online states: “Setting maximum degree of parallelism to 0 allows SQL Server to use all the available processors up to 64 processors.” And that is where most of us quit reading and assume the MAX DOP for SQL Server is limited to 64.

Instead, if you read a bit further: “If a value greater than the number of available processors is specified, the actual number of available processors is used.”

Simply stated, if you tell SQL Server to use more than 64 CPUs, SQL Server will attempt to do just that.
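For example (object names are illustrative), on a 128-CPU server you can ask for more than 64 and SQL Server will attempt to honor it, subject to the number of available processors:

-- Hedged sketch: request a DOP above 64 for a maintenance operation.
ALTER INDEX ALL ON dbo.MyLargeTable REBUILD WITH (MAXDOP = 128);

-- Or raise the instance-wide default (an advanced sp_configure option).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 128;
RECONFIGURE;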

Bob Dorr - Principal SQL Server Escalation Engineer

A faster CHECKDB – Part III


Bob Ward introduced Part 1 and Part 2 of ‘A faster CHECKDB’ as highlighted in the following links.

Part 1: http://blogs.msdn.com/b/psssql/archive/2011/12/20/a-faster-checkdb-part-i.aspx
Part 2: http://blogs.msdn.com/b/psssql/archive/2012/02/23/a-faster-checkdb-part-ii.aspx

Recently,  Jonathan pointed out a memory grant issue in the following post.

https://www.sqlskills.com/blogs/jonathan/dbcc-checkdb-execution-memory-grants-not-quite-what-you-expect/

I always enjoy my interactions with Jonathan and this is yet another positive experience for us all.  After digging into this I found there is a bug and it was corrected in the SQL Server 2014 release.

The heart of the matter is a cardinality problem for the estimated number of fact rows.  The cardinality estimation drives a large portion of the memory grant size calculation for the DBCC check commands.  As Jonathan outlines in his post the overestimate is often unnecessary and reduces the overall performance of the DBCC check operation.

The checkdb/checktable component responsible for returning the number of fact rows (cardinality) for each object mistakenly returned the size of the object as the number of rows.

The following example shows 10,000 rows, requiring 182,000 bytes on disk.

image

Prior to SQL Server 2014, the SQL Server code would return a cardinality estimate based on 182,000 instead of 10,000. As you can easily see, this is a significant row estimate variance.

If you capture the query_post_execution_showplan (or pre) you can see the checkindex plan used by the DBCC check operation.

clip_image002
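A minimal event session sketch (the session name and file path are illustrative) to capture those plans; note that this event is expensive and should only be enabled on a test system:

CREATE EVENT SESSION [dbcc_plans] ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan
ADD TARGET package0.event_file (SET filename = N'C:\temp\dbcc_plans.xel');
GO
ALTER EVENT SESSION [dbcc_plans] ON SERVER STATE = START;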


Shown below are plan excerpts from SQL Server 2012 and SQL Server 2014, using an EMPTY table. Notice the SQL Server 2012 estimate is near 2 pages in size (8192 * 2), while for an empty table SQL Server only produces 3 total facts related to allocation state.

SQL 2012

<StmtSimple StatementEstRows="129.507" StatementOptmLevel="FULL" …
  <QueryPlan DegreeOfParallelism="0" MemoryGrant="33512" NonParallelPlanReason="MaxDOPSetToOne" CachedPlanSize="24" CompileTime="0" CompileCPU="0" CompileMemory="128">
    <RelOp NodeId="1" PhysicalOp="Sort" LogicalOp="Sort" EstimateRows="16772" …
      <RunTimeInformation>
        <RunTimeCountersPerThread Thread="0" ActualRows="3" ActualEndOfScans="1" ActualExecutions="1" />

SQL 2014

<StmtSimple StatementEstRows="10" StatementOptmLevel="FULL" …
  <QueryPlan DegreeOfParallelism="0" MemoryGrant="1024" NonParallelPlanReason="MaxDOPSetToOne" CachedPlanSize="24" CompileTime="0" CompileCPU="0" CompileMemory="128">
    <RelOp NodeId="1" PhysicalOp="Sort" LogicalOp="Sort" EstimateRows="9" …
      <RunTimeInformation>
        <RunTimeCountersPerThread Thread="0" ActualRows="3" ActualEndOfScans="1" ActualExecutions="1" />


A more dramatic difference is shown from a test I ran against a 1.3 trillion row table, without the fix.  The estimated rows are 900 trillion with a memory grant size of 90GB.


Prior to SQL Server 2014 you can leverage Jonathan’s advice and limit the DBCC check using Resource Governor or move to SQL Server 2014 to execute your DBCC check operations faster.
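A hedged Resource Governor sketch along those lines (the workload group name, login name, and grant percentage are assumptions to adapt to your environment; the classifier function lives in master):

USE master;
GO
-- Workload group that caps the memory grant for sessions classified into it.
CREATE WORKLOAD GROUP dbcc_limited
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 10);
GO
-- Route a dedicated maintenance login (assumed name) into the limited group.
CREATE FUNCTION dbo.fn_dbcc_classifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    IF SUSER_SNAME() = N'dbcc_maintenance'
        RETURN N'dbcc_limited';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_dbcc_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;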

Bob Dorr - Principal SQL Server Escalation Engineer

How come sys.dm_exec_requests.cpu_time never moves?


Today, I want to point out another SQL Server 2012 SP2 fix that may affect your performance troubleshooting. When you are troubleshooting a long-running query, chances are you will use sys.dm_exec_requests to track the progress of the query. But one of the columns - cpu_time - is not accurate prior to SQL Server 2012 SP2.
When a query runs in parallel, the main thread just coordinates things and it's the child threads that do the meaningful work. Prior to SP2, sys.dm_exec_requests.cpu_time doesn't roll up the child threads' CPU while the query is in progress. At the end of the query, though, the CPU time spent is all rolled up.

To illustrate the problem, I came up with a query that generates a parallel plan and tested it on SQL Server 2012 SP1 and SP2 instances.

First, I ran the query on the SP1 instance. To see whether the query ran in parallel, I queried the DMVs sys.dm_exec_requests and sys.dm_os_tasks and could see that spid 51 had 55 tasks.
select t.session_id, count (*) 'number of tasks'
from sys.dm_exec_requests r join sys.dm_os_tasks t on r.session_id =t.session_id
group by t.session_id
having count (*) > 1




But after one minute of running, cpu_time still showed 0.



When I forced a serial plan, I saw cpu_time increase correctly as time went by.



Now let's test the same query on SP2, which fixes the problem. Again, I verified that the query had 54 tasks associated with it, using the same DMV query above.


Query "select cpu_time, * from sys.dm_exec_requests where session_id = 52" shows that cpu_time kept increasing as time went by.


Jack Li |Senior Escalation Engineer | Microsoft SQL Server Support


How It Works: sp_server_diagnostics – spinlock backoffs


There are numerous articles outlining how spinlocks work, so I won't cover the details in this post. Instead, I want to focus on the spinlockBackoffs value recorded in the sp_server_diagnostics output.

Component= System

<system spinlockBackoffs="0" sickSpinlockType="none" sickSpinlockTypeAfterAv="none" …

When querying select * from sys.dm_os_spinlock_stats, a backoffs column is presented. This is NOT the same as the spinlockBackoffs value presented in the sp_server_diagnostics output.

A spinlock backoff is only counted in sp_server_diagnostics when the spinlock has been declared SICK. Sick is the term used to indicate that the code has attempted to acquire spinlock ownership but, after tens of thousands of spins and lightweight backoffs over approximately 5 seconds, ownership could not be acquired. At the ~5 second point the code performs a more aggressive sleep operation because the spinlock appears to be damaged or hung up in some way - or is sick, if you will.

The XEvent spinlock_backoff maps to the dm_os_spinlock_stats, backoff column.

The XEvent spinlock_backoff_warning maps to the sp_server_diagnostics output and is produced when the spinlock is declared sick.

The System component can report the ERROR state once spinlockBackoffs reaches a count of 2 or greater for the same sickSpinlockType. This indicates that for approximately 10 to 15 seconds ownership of the spinlock could not be acquired, signaling a larger issue on the system (orphaned spinlock, CPU problems, etc.).
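For reference, a hedged sketch of pulling the spinlockBackoffs and sickSpinlockType attributes out of the system component returned by sp_server_diagnostics:

-- Capture one pass of sp_server_diagnostics into a temp table and read the system component XML.
create table #diag
(
    create_time    datetime,
    component_type sysname,
    component_name sysname,
    [state]        int,
    state_desc     sysname,
    data           nvarchar(max)
);

insert into #diag
exec sp_server_diagnostics;

select create_time, state_desc,
       cast(data as xml).value('(/system/@spinlockBackoffs)[1]', 'int')           as spinlockBackoffs,
       cast(data as xml).value('(/system/@sickSpinlockType)[1]', 'nvarchar(128)') as sickSpinlockType
from #diag
where component_name = 'system';

drop table #diag;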

Bob Dorr - Principal SQL Server Escalation Engineer

JDBC: This driver is not configured for integrated authentication


I’ve had about 4 cases in the last two months that centered around the following error when trying to use Windows Integrated authentication with JDBC.

java.sql.SQLException: This driver is not configured for integrated authentication

The key environment point was that they were trying to do this on a Linux platform and not a Windows platform. Specifically, they were running WebSphere on Linux. The last one I worked on was running WebSphere 8.5.

There is only one location within the JDBC Driver where this particular error is raised.  It is when we are trying to use Kerberos and the Authentication Scheme is set to NativeAuthentication, which is the default setting for this property. Starting in the JDBC 4.0 driver, you can use the authenticationScheme connection property to indicate how you want to use Kerberos to connect to SQL.  There are two settings here.

NativeAuthentication (default) – This uses the sqljdbc_auth.dll and is specific to the Windows platform.  This was the only option prior to the JDBC 4.0 driver.

JavaKerberos – Makes use of the Java APIs to invoke Kerberos and does not rely on the Windows platform. This is Java specific and not bound to the underlying operating system, so it can be used on both Windows and Linux platforms.

So, if you are receiving the error above, there are three possibilities that could be causing it to show up.  First, you actually specified authenticationScheme=NativeAuthentication in your connection string and you are on a Linux Platform.  Second, you specified integratedSecurity and omitted authenticationScheme, which defaulted to NativeAuthentication, and you are on a Unix/Linux Platform.  Third, you are using a version of the JDBC Driver prior to the 4.0 driver and trying to use Integrated Authentication on a Unix/Linux platform.  In the third case, even if you specify authenticationScheme=JavaKerberos, it won’t help as the older drivers aren’t aware of it, so it is ignored.
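For reference, a hedged example of a JDBC 4.0 (or later) connection URL that requests Java Kerberos (the server and database names are placeholders):

jdbc:sqlserver://myserver.mydomain.com:1433;databaseName=MyDatabase;integratedSecurity=true;authenticationScheme=JavaKerberos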

The following document outlines how to use Kerberos with the JDBC Driver and walks through what is needed to get JavaKerberos working properly.

Using Kerberos Integrated Authentication to Connect to SQL Server
http://msdn.microsoft.com/en-us/library/gg558122%28v=sql.110%29.aspx

Another aspect we discovered is that the WebSphere 8.5 release appears to come with the 3.0 version of the SQL JDBC Driver. This will not honor the JavaKerberos setting, and you will get the error listed above.

Configuration

So, you will need to make sure your driver is updated to the 4.0 driver or later. After that is done, you will need to make sure that the Kerberos configuration file (krb5.ini or krb5.conf) is configured properly on your platform. The documentation referenced above has a sample of what that should look like. You will also need to generate keytab files for the platform to reference. A login configuration file also needs to be set up. If you don't have one, the driver will automatically configure it using the Krb5LoginModule. If you need to use a different login module, you will need to make sure that is configured for your environment. Assuming all of that is in place, the driver should work using JavaKerberos to connect.

The following blog does a good job of walking through the steps to get this set up for Java. It mentions WebLogic, but it really just goes through the Java aspects: how to create the keytab files and what to do with the krb5.ini file.

Configure Kerberos with Weblogic Server (really just a Java reference)
https://blogbypuneeth.wordpress.com/configure-kerberos-with-weblogic-server/

Known Limitation

If you have a multiple-domain environment with SQL Servers in different domains that you are trying to reach, you will run into issues. We found that in order to get it to work properly, you need to set the default domain within the Kerberos configuration file to the domain that the SQL Server resides in. You can only have one default domain, so if you have multiple SQL Servers in different domains, you are going to have to pick one.

SQL JDBC Driver Versioning and Files

I’ve also heard a lot of questions and seen confusion on the file versioning, file name and system requirements.  Here is a table where I tried to highlight what comes with what driver for reference.

JDBC Driver Version    JAR File         JDBC API Support    Supported JVM
2.0                    sqljdbc.jar      3.0                 1.5
2.0                    sqljdbc4.jar     4.0                 1.6 or later
3.0                    sqljdbc.jar      3.0                 1.5
3.0                    sqljdbc4.jar     4.0                 1.6 or later
4.0                    sqljdbc.jar      3.0                 1.5
4.0                    sqljdbc4.jar     4.0                 1.6 or later
4.1                    sqljdbc.jar      3.0                 1.5
4.1                    sqljdbc4.jar     4.0                 1.6 or later
4.1                    sqljdbc41.jar    4.0                 1.7 or later

Also, we have documentation regarding the System Requirements that you can look at that goes a little further into this.

System Requirements for the JDBC Driver
http://msdn.microsoft.com/en-us/library/ms378422(v=sql.110).aspx

Hopefully this will help clear things up for you when using the SQL JDBC Driver on a Unix/Linux Platform.

 

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

A faster CHECKDB – Part IV (SQL CLR UDTs)


I have been working on the various aspects of DBCC performance and SQL CLR based User Defined Data Types.    I encountered a few issues that I have outlined below. 

1.      Memory Grant Bug

There is a bug, prior to SQL Server 2014, causing the memory grant for the DBCC operations (checktable, or checkdb per table) to be improperly estimated. Placing the session running the DBCC command in a specific Resource Governor workload group allows you to limit the memory grant size and increase DBCC performance. Reference the following link for more details: http://blogs.msdn.com/b/psssql/archive/2014/11/10/a-faster-checkdb-part-iii.aspx   I used a grant cap of 20GB on a 128 CPU, 512GB system with success, rather than accepting the 90GB default grant request.

2.      Blob Handle Factory

Fix released (Microsoft Bug Id: 3939015). Before the fix, the DBCC command(s) created an internal structure (a blob handle for each SQL CLR UDT based column as rows were processed) and failed to mark it for reuse when done processing the row. Each spatial row/column would scan the blob handle list, create a new entry and add it to the list. This resulted in wasted memory and CPU as the list continued to get larger and larger and the entries could not be reused. The fix allows proper reuse of the blob handle factory (BHF).

The customer indicated it took 22 days to complete on the current SQL Server 2012 build.   The QFE now completes DBCC checkdb in 15.5 hours. (DL980 G7 with 512GB RAM)

Download:
http://support.microsoft.com/kb/3007556/en-us  (SQL Server 2012 SP2 – CU4)
Article:
http://support.microsoft.com/kb/3029825

3.      Parallelism

DBCC does use parallel, internal queries when possible. In studying this closer, I found that running individual checktable commands from multiple sessions can decrease the overall DBCC maintenance window.

DBCC checkdb loops over each table and executes the fact queries. The SQL engine may elect to execute the fact queries in parallel; however, checkdb only processes a single table at a time. On a larger system you may be able to take advantage of multiple checktable invocations on different schedulers.

The fact queries used by the DBCC command are prevented from using parallelism if a large UDT column is present. Spatial is somewhat analogous to varbinary(max) and falls into this limitation. This means the internal DBCC fact query against your largest table runs with a serial plan.

If you manually shard the table and create a covering view, you may be able to execute concurrent DBCC, index rebuild and other operations faster.

Parallel Check Table Example (pseudocode):

DBCC CHECKALLOC
DBCC CHECKCATALOG
Parallel.ForEach(table in sys.tables)
{
      -- Make sure the number of parallel executions is appropriate for resource consumption on the SQL Server
      DBCC CHECKTABLE(…) with …
}
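A hedged T-SQL sketch that generates one CHECKTABLE command per user table; the generated commands can then be spread across several sessions or agent jobs and executed concurrently:

-- Generate a DBCC CHECKTABLE statement for every user table.
select 'DBCC CHECKTABLE (''' + quotename(s.name) + '.' + quotename(t.name) + ''') WITH ALL_ERRORMSGS, NO_INFOMSGS;'
from sys.tables t
join sys.schemas s on t.schema_id = s.schema_id
order by s.name, t.name;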

Shard Example:

create view vwMyTable
as

    select <<column list>> from MyTable_1
       union all
    select <<column list>> from MyTable_2   …..

4.      Trace Flag 2566  (Ignore Data Purity Checks)

Once data purity checks have completed successfully on an upgraded database, or when the database was created on a newer version, the DATA_PURITY check becomes 'on by default.' DBCC DBINFO(<<DBNAME>>) shows dbi_dbccFlags; the 0x2 bit indicates data purity will be checked by default when checkdb or checktable is executed. You can avoid these checks using -T2566, or dbcc traceon(2566) in the same batch as the checkdb or checktable execution.

The trace flag skips the purity checks as long as the checkdb or checktable command does not specify WITH DATA_PURITY, which overrides the trace flag behavior.

My testing only shows a nominal performance gain.
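A hedged example of skipping the purity checks for a single run (the database name is illustrative):

-- Inspect dbi_dbccFlags (0x2 = data purity checks on by default).
dbcc dbinfo('MyDatabase') with tableresults;
go
-- Enable the trace flag in the same batch as the check so the purity portion is skipped.
dbcc traceon(2566);
dbcc checkdb('MyDatabase') with no_infomsgs;   -- do not specify WITH DATA_PURITY, or the trace flag is overridden
dbcc traceoff(2566);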

Additional DBCC References

Part 1: http://blogs.msdn.com/b/psssql/archive/2011/12/20/a-faster-checkdb-part-i.aspx
Part 2: http://blogs.msdn.com/b/psssql/archive/2012/02/23/a-faster-checkdb-part-ii.aspx
Part 3: http://blogs.msdn.com/b/psssql/archive/2014/11/10/a-faster-checkdb-part-iii.aspx

Spatial

Over the last 18 months Microsoft addressed various spatial performance issues.  The QFE build, mentioned above, contains the following corrections that you should consider enabling to improve overall spatial performance on the servers.  

In a nutshell: apply the latest Service Pack and latest CU for SQL Server and, in addition, enable the startup trace flags -T8048 and -T6531.

Details:

(Warning: mileage will vary based on data pattern.)

  • http://support.microsoft.com/kb/2887888, http://support.microsoft.com/kb/2887899, http://support.microsoft.com/kb/2896720

  • http://blogs.msdn.com/b/psssql/archive/2013/11/19/spatial-indexing-from-4-days-to-4-hours.aspx - FIX: Slow performance in SQL Server when you build an index on a spatial data type of a large table in a SQL Server 2012 or SQL Server 2014 instance. Comments: Took an index build from 72 hours to 4 hours and requires trace flag -T8048.

  • http://support.microsoft.com/kb/2786212 - FIX: Access violation occurs when you run a spatial query over a linked server in SQL Server 2008 R2 or in SQL Server 2012.

  • http://support.microsoft.com/kb/3005300 - FIX: High CPU consumption when you use spatial data type and associated methods in SQL Server 2012 or SQL Server 2014. Comments: May improve performance by 10% or more for spatial methods. Requires trace flag -T6531.

  • http://support.microsoft.com/kb/2977271 - FIX: Performance improvement for SQL Server Spatial data access in SQL Server 2012. Comments: A query that used to take 20+ hours now takes < 2 hours.

  • Blob Handle QFE - KB: 3029825

  • http://social.technet.microsoft.com/wiki/contents/articles/9694.tuning-spatial-point-data-queries-in-sql-server-2012.aspx - Tuning Spatial Point Data Queries in SQL Server 2012

Bob Dorr -  Principal SQL Server Escalation Engineer

 

Do I really need to use DTC Transactions?


It is a fairly common practice to enable distributed transaction (DTC) behavior, but it can be unnecessary and add unwanted overhead.

DTC has the ability to determine single-phase vs. two-phase commit requirements. A DTC transaction involves resource managers (RMs), and SQL Server can be one of them. If a single resource manager is involved in the transaction, there is no need to perform a two-phase commit; DTC shortcuts the activity and safely performs a single-phase commit. This reduces the communication between DTC and the resource managers. However, the overhead of the DTC manager is still involved, making the transaction slightly slower than a native T-SQL transaction.

Single Phase

The following is a single phase DTC commit example.

begin distributed tran
go

update dbTest.dbo.tblTest set object_id = 100
go

commit tran
go

Notice the trace output does not indicate a prepared state.  This is a direct indication of a single phase commit.

image

Two Phase

The following is a 2-phase commit example.

begin distributed tran
go

update MYREMOTESERVER.dbTest.dbo.tblTest set object_id = 100
go

commit tran
go

The transaction involved the local instance (RM=1) and a remote instance (RM=2). With two RMs involved, DTC commits the transaction under the full two-phase commit protocol. Notice the prepared state in the trace, indicating that the full two-phase commit protocol is being used.

image


You may want to review the DTC transactions executing on your system, looking for the prepared state. If the DTC transactions running on your system are not using the two-phase commit protocol, you should consider removing DTC from those transactions in order to improve performance.
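For comparison, when only the local instance is involved, the same work can run as a plain local transaction and avoid the DTC manager entirely:

begin tran
go

update dbTest.dbo.tblTest set object_id = 100
go

commit tran
go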

Bob Dorr - Principal SQL Server Escalation Engineer

#Error When Rendering Report


Last week a case was brought to me where the customer was getting a #Error for a field within their report.  The field value was normally a number, but they wanted to change it to something like “1&1”.  That is when they would see the #Error.

I created my own report that reproduces the issue they were having.  Let’s have a look at what this report looks like normally.  We are going to focus on the Holding Prisoners field.

SNAGHTMLcd4ab0

We can see the problem if we change the field in the database from 15 to “1&1”. At the start, the one difference in my report is that it shows blank instead of #Error.  We will get to the #Error though.  Just pretend that the blank is a #Error.  It isn’t really relevant to the actual issue. 

SNAGHTMLd22b39

Whenever we see a #Error, this comes from two things.  Either something is wrong with the data, or something is wrong with the Report (RDL).  More specifically, it is usually an expression issue within the Report. Looking at the report design, we can see that we do indeed have an expression for that field.

SNAGHTMLd45193

The expression is the following:

=Iif(Trim(CStr(Fields!Holding.Value)).Equals("Cell Block 1138"),Fields!Deck.Value,Fields!Prisoners.Value)

All this is really saying is that if the Holding value is equal to “Cell Block 1138” then show the Deck value.  Otherwise we are going to show the Prisoners value. We know that we are going to get the Prisoners value out of this as we changed that and caused the problem.  Also we can see that the Holding value is “Detention Block AA-23”, so the IIF statement will go to the Prisoners value.  So, let’s just change the expression to just show the Prisoners value to rule out anything with this expression.  We can just change it to the following.

=Fields!Prisoners.Value

We still see the #Error.

SNAGHTMLd8c0d4

At this point, we can see that we are pulling straight from the Prisoners field, so there is no other expression on this textbox that should be getting in the way.

SNAGHTMLdce418

Changing the Database value back to a number allows it to show correctly.  If we look at the value in the database that is causing the problem we see the 1&1 value that we placed there.  To validate that it wasn’t the & causing a problem, we can just change the value to the letter a, and we still see #Error.  So, it is just not liking a string.  There is also no formatting going on from a textbox perspective.  It is just set to Default.

SNAGHTMLdf81e5

If we look at the Table definition, within SQL, we can see that the field’s datatype is a Varchar(50).  So, the Database side is fine and will accept a string.  Otherwise we would have gotten an error when trying to put a string value in that field.

SNAGHTMLe1ca9c

If we look at the actual Dataset in the Report, and look at the fields, we’ll actually see two Prisoner fields. 

SNAGHTMLe4aa44

In the customer’s case, they had a lot of fields and the field with the underscore was way down at the bottom.  The field with the underscore is the actual field from the database.  The one without the underscore is what is called a calculated field.  We can see this if we right click on the field and go to properties.  When you go to add a field, you can choose Query Field, which is just a straight field from the database, or Calculated Field which is based on an expression.

SNAGHTMLe61c82

If we look at the expression for Prisoners, we see the following.  We can break this down a little bit.  There are two IIF statements here.

=Iif(IsNothing(Fields!Prisoners_.Value),"N/A",Iif(Fields!Prisoners_.Value=0,"N/A",Fields!Prisoners_.Value))

So, lets try a few things.  First, if we just show the Prisoners_ field, we get the right value regardless of what it is.  It will show strings just fine.

=Fields!Prisoners_.Value

The result of this was “1&1”.  It works! Second, let’s try the inner IIF statement.

=Iif(Fields!Prisoners_.Value=0,"N/A",Fields!Prisoners_.Value)

The result of this was #Error.  This is our problem child. Then I realized what was happening.  The first part of this is an evaluation of Value=0.  The 0 is a hint to the Expression engine that the value will be numeric.  When trying to compare a number to a string we will get a conversion error. 

As a side note, if you use SQL Server Data Tools (SSDT), instead of Report Builder, to create your report, and go to Preview within SSDT, it will actually give you a hint to this effect in the Errors window of Visual Studio.

Warning    1    [rsRuntimeErrorInExpression] The Value expression for the field ‘Prisoners’ contains an error: Input string was not in a correct format.    d:\src\Personal\RenderingError\RenderingError\DetentionBlock.rdl    0    0   

To give the Expression engine the hint it needs, we can do the following.

=Iif(Fields!Prisoners_.Value="0","N/A",Fields!Prisoners_.Value)

By putting 0 within double quotes, we tell the Expression engine to treat it like a string instead of a number. This then produces the desired result of “1&1”. Combining this with the full two IIF statements succeeds also. Now let’s go back to the textbox and add in the original expression.

=Iif(Trim(CStr(Fields!Holding.Value)).Equals("Cell Block 1138"),Fields!Deck.Value,Fields!Prisoners.Value)

This also produces the desired result!  The answer here is that we need to enclose the 0 within the Calculated Field with double quotes to get the Expression engine to treat it like a string instead of a number.

 

Adam W. Saxton | Microsoft Business Intelligence Server Escalation Services
http://twitter.com/awsaxton

SQL Server and SSDs – RDORR’s Learning Notes - Part 1


I am very hesitant to post anything that I don't have the full details on. However, with SSD deployments moving so rapidly, I thought it might be helpful to share some of my learnings to date.

I make no claims of being an expert in this area.   However, I have been doing research that I found eye opening.  After reading this blog you are likely to have open questions, just as I do.   This blog is intended to help us all ask the right questions when installing and using SSD based solutions.

SSD is not Magically Failure Resistant

I don't have to go into this in detail, but many of you rode out the assumption that 64-bit processors were twice as fast as 32-bit ones, having to explain over and over again that the processor speeds are the same, and in some instances slower, than the 32-bit predecessors.

SSDs have picked up a similar 'assumption' that they are more resilient than spinning media. While there are advantages to SSDs, they are still susceptible to many of the same failure patterns as spinning media.

The failure patterns exhibited by SSD are similar to spinning media.  After reading this document (https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf) it is easy to see the need for a caching controller and making sure data is flushed properly.

·        Bit Corruption - records exhibit random bit errors

·        Flying Writes - well-formed records end up in the wrong place

·        Shorn Writes - operations are partially done at a level below the expected sector size

·        Metadata Corruption - metadata in the FTL is corrupted

·        Dead Device - device does not work at all, or mostly does not work

·        Unserializability - final state of storage does not result from a serializable operation order

Recommendation: Treat the SSD storage as you would spinning media, making sure the appropriate safeguards are in place for power failure (i.e. battery-backed cache, etc.).

Capacitor Power Up

The power outage testing document points out many interesting issues that might occur and that systems need to protect against. I specifically found the need for the capacitor to 'power up' thought provoking. The charging behavior makes those cases where a power outage occurs and, 10 or 15 seconds later, another power flicker follows very interesting indeed.

512e

Most SSDs report 512 byte sector sizes but use 4K pages inside the 1MB erasure blocks.   Using 512 byte aligned sectors for the SQL Server log device can generate more (R)ead (M)odify (W)rite activities which could contribute to slower performance and drive wear.  

Recommendation: Make sure the caching controller is aware of the correct page size of the SSD(s) and is able to align physical writes with the SSD infrastructure properly.

0xFFFFFFFF

The common view of a newly formatted drive is one holding all zeros.  It is interesting to note that an erased block of an SSD is all 1’s making a raw read of an erased block all F’s.   It is unexpected for a user to read an erased block during normal I/O operations.  However, just last week I reviewed a report that seems to align with this behavior.

Pattern Stamping

A technique we have used in the past is to write a known pattern to the entire drive.  Then as we execute database activity against that same drive we can detect incorrect behavior (stale read / lost write / read of incorrect offset / etc.) when the pattern unexpectedly appears.

This technique does not work well on SSD-based drives. The erasure and read-modify-write (RMW) activity of a write destroys the pattern. The SSD GC activity, wear leveling, proportional/set-aside list blocks and other optimizations tend to cause writes to acquire different physical locations, unlike spinning media's sector reuse.

Flying Writes / Incorrect FTL Mapping

Like many of us, I thought of flying writes as a servo and head movement problem. However, in December I worked on a system where the GPT data (sectors) that should be at the start and end of the volume would show up during a read of the database file. The first part of the database page was all zeros, followed by the GPT information as outlined for the GPT in MSDN. This was occurring without a power outage/cycle, and we continue to investigate FTL mapping bug possibilities.

Non-Serialized Writes

As you can imagine, non-serialized writes are a database killer. They break the WAL protocol and make it difficult, at best, to diagnose how the data transitioned to a damaged state.

Firmware

The firmware used in SSD drives tends to be complex when compared to its spinning media counterparts. Many drives use multiple processing cores to handle incoming requests and garbage collection activities. Just last week I was made aware of a firmware fix: the cores shared the same memory area, leading to a race condition that corrupted the SQL Server log file (.ldf).

Recommendation: Make sure you keep the firmware up-to-date on the SSDs in order to avoid known problems.

Read Data Damage / Wear Leveling

The various garbage collection (GC) algorithms tend to remain proprietary. However, there are some common GC approaches that are well known. One such activity helps prevent repeated-read data damage. When reading the same cell repeatedly, it is possible for the electron activity to leak and damage neighboring cells. SSDs protect the data with various levels of ECC and other mechanisms.

One such mechanism relates to wear leveling.   The SSD keeps track of the write and read activity on the SSD.  The SSD GC can determine hot spots or locations wearing faster than other locations.    The GC may determine a block that has been in read only state for a period of time needs to move.   This movement is generally to a block with more wear so the original block can be used for writes.   This helps even the wear on the drive but mechanically places read only data at a location that has more wear and mathematically increases the failure chances, even if slightly.

The reason I point this behavior out is not to recommend anything specific but to make you aware of it.  Imagine you execute DBCC and it reports an error, then you run it a second time and it reports additional errors or a different pattern of errors.  It would be unlikely, but the SSD GC activity could make changes between the DBCC executions.
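To make that comparison concrete, a minimal sketch is simply running the same integrity check twice and comparing the reported errors; the database name here is hypothetical.

-- Run the identical check twice and diff the error output; in rare cases
-- SSD GC movement between executions could change what is reported.
DBCC CHECKDB ('MyDb') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO
DBCC CHECKDB ('MyDb') WITH NO_INFOMSGS, ALL_ERRORMSGS;
GO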

OS Error 665 / Defragmentation

I have started investigating what fragmentation means on an SSD.  In general, there is not much to be done about fragmentation on an SSD, though there are some defragmentation and trimming activities worth noting: http://www.hanselman.com/blog/TheRealAndCompleteStoryDoesWindowsDefragmentYourSSD.aspx

Spinning media needs to keep blocks near one another to reduce the drive's head movement and increase performance.  SSDs don't have a physical head; in fact, many SSDs are designed to operate on different blocks in parallel.

SSD documentation always indicates that serial activities are the best I/O patterns to maximize I/O throughput.

I was recently able to reproduce OS Error 665 (File System Limitation) on the database file (MDF).  I can cause the same problem using spinning media, but it commonly takes far longer to trigger the error.  The scenario was a BCP in of 1TB of data, with the database starting at 10GB and auto-grow allowed only in 1MB chunks.  When the file reached ~480GB I started encountering OS Error 665.

Using utilities such as FSUTIL, I was able to see that the file on the SSD had millions of fragments.  As my previous blogs for OS Error 665 have highlighted, the NTFS storage attribute list is finite.  Control of the attribute list size depends on the version of Windows and the format settings.

I attempted to use Windows Defrag.exe.  It completed quickly but the number of fragments didn’t change significantly.  What I ended up doing was:

  1. Take the database offline
  2. Copy the MDF off the SSD
  3. Copy the MDF back to the SSD
  4. Bring database back online

This reduced the fragments by allowing the file system and SSD firmware to see the copy as serial write activity.  I didn't test a backup and restore sequence, but I suspect a backup and restore with replace would produce a similar, defragmentation-like outcome.
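A minimal T-SQL sketch of that sequence, assuming a hypothetical database named MyDb with a single data file (the file copy itself happens outside of SQL Server):

-- 1. Take the database offline so the MDF can be copied
ALTER DATABASE MyDb SET OFFLINE WITH ROLLBACK IMMEDIATE;
GO
-- 2. Copy the MDF off the SSD and back (for example with robocopy) so the
--    file is rewritten as one large, serial stream
-- 3. Bring the database back online
ALTER DATABASE MyDb SET ONLINE;
GO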

Recommendation(s): Use an appropriate, battery-backed controller designed to optimize write activities.  This can improve performance and reduce drive wear and physical fragmentation levels.

Consider ReFS to avoid the NTFS attribute list limitations.

Make sure the file growth sizes are appropriately sized.
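For example, a sketch of setting a larger, fixed growth increment (the database and logical file names are hypothetical); fewer, larger grow operations create far fewer NTFS fragments than repeated 1MB auto-grows:

-- Grow the data file in 1GB increments instead of small, frequent chunks
ALTER DATABASE MyDb
    MODIFY FILE (NAME = MyDb_Data, FILEGROWTH = 1024MB);
GO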

Compression

I am still trying to understand if there are any real impacts from the SSD compression behaviors.  Some of the SSD documentation mentions that writes may be compressed by the SSD; the compression occurs as part of the write operation inside the drive.  As long as the drive maintains the intent of stable media, compression could extend the drive's life and may positively impact performance.

Summary

  • Maintain proper backup and disaster recovery procedures and processes.
  • Keep your firmware up-to-date.
  • Listen closely to your hardware manufacturer's guidance.

I am learning something new about SSD deployments every day and plan to post updates when appropriate.

Reference Links

http://www.microsoft.com/en-us/sqlserver/solutions-technologies/mission-critical-operations/io-reliability-program.aspx

https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf
http://research.microsoft.com/pubs/63596/USENIX-08-SSD.pdf
http://www.hanselman.com/blog/TheRealAndCompleteStoryDoesWindowsDefragmentYourSSD.aspx
http://www.storagesearch.com/ssdmyths-endurance.html
http://www.anandtech.com/show/2738/8
http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2012/20120821_TC11_Hansen.pdf
http://en.wikipedia.org/wiki/Wear_leveling, http://searchsolidstatestorage.techtarget.com/definition/wear-leveling
http://www.sevenforums.com/tutorials/113967-ssd-alignment.html
http://www.thessdreview.com/Forums/ssd-beginners-guide-and-discussion/3630-samsung-840-250gb-pro-256gb-nand-page-block-size-info.html
http://www.kingston.com/us/ssd/overprovisioning
http://www.networkcomputing.com/storage/demystifying-ssd-wear-leveling/a/d-id/1097528
http://www.darkreading.com/database-security/collecting-the-ssd-garbage/d/d-id/1096882?
https://www.youtube.com/watch?v=s7JLXs5es7I 

Bob Dorr - Principal SQL Server Escalation Engineer

Troubleshooting Memory Issues with Reporting Services


We had a case where Reporting Services was crashing with the following error.

Failed allocate pages: FAIL_PAGE_ALLOCATION 2

The number at the end can be different.  In the customer’s case it was a 1.  In my local repro it was a 2.  This is basically an indication that we are out of memory for the process and the process crashes at that point as we can’t allocate any more memory.  You won’t see an OutOfMemory error, as the allocations come from SQL Operating System (SOS) which hosts the .NET CLR.  So, it is SOS that is trying to do the allocation that fails.  SOS is also used with the SQL Engine, so you will see this error from that side as well when you are out of memory.

Before we get into the meat of it, here is a link to an article that goes through the memory thresholds for Reporting Services and explains how it will handle memory when those thresholds are hit.

Configure Available Memory for Report Server Applications
https://msdn.microsoft.com/en-us/library/ms159206.aspx

When Reporting Services starts, it will calculate what the maximum amount of memory will be for the process.  This will be done in one of two ways.

First, if WorkingSetMaximum is not set in the rsreportserver.config (which it isn’t there by default), then Reporting Services will derive the maximum memory setting based on your total physical memory.  To see this happen, we can look at the Reporting Services Trace Log and look for Derived memory.

rshost!rshost!19c4!01/29/2015-05:03:22:: i INFO: Derived memory configuration based on physical memory as 33486264 KB

servicecontroller!DefaultDomain!b14!01/29/2015-05:03:22:: i INFO: Total Physical memory: 34289934336

You may or may not see something like the following entry if you configure WorkingSetMaximum.

library!DefaultDomain!3278!12/02/2014-16:11:18:: i INFO: Initializing WorkingSetMaximum to '12486264' kilobytes as specified in Configuration file.

We also have the concept of MemorySafetyMargin and MemoryThreshold.  These are used to alert Reporting Services so it can start to back off when we need to allocate more memory and we are already pretty full.  They are also configured within the rsreportserver.config.  The default values are 80% and 90%, respectively, of whatever our maximum value is set to.

<MemorySafetyMargin>80</MemorySafetyMargin>
<MemoryThreshold>90</MemoryThreshold>

We can also validate these values within the Reporting Services Trace Log.

library!DefaultDomain!19c4!01/29/2015-05:03:19:: i INFO: Initializing MemorySafetyMargin to '80' percent as specified in Configuration file.
library!DefaultDomain!19c4!01/29/2015-05:03:19:: i INFO: Initializing MemoryThreshold to '90' percent as specified in Configuration file.

All of this determines when Reporting Services will start triggering memory pressure notifications.  These notifications can be Low, Medium or High.  The link above has a great image that shows you when each one will trigger.

Configuration settings for memory state

NOTE:  You will only see the NotifyMemoryPressure items in the log if you have the log set to verbose, specifically the appdomainmanager category.

You also have the opportunity to define WorkingSetMaximum and WorkingSetMinimum in the rsreportserver.config if you know that you have other items running on this machine and you don't want Reporting Services to starve them, or at a minimum you want Reporting Services and the other services to play nice with each other.  This allows us to cap Reporting Services manually instead of relying on the derived value.

Out of Memory Condition

Let's go back to the FAIL_PAGE_ALLOCATION error that I mentioned at the beginning.  If you receive this, we ran out of memory, couldn't recover fast enough, and then failed an allocation because nothing was available.

Without verbose logging, we can see the following type of behavior in the Reporting Services Log.

processing!ReportServer_0-2!199c!01/29/2015-05:28:37:: w WARN: Processing Scalability -- Memory Shrink Request Received
processing!ReportServer_0-2!199c!01/29/2015-05:28:38:: w WARN: Processing Scalability -- Memory Shrink Request Received
 Failed allocate pages: FAIL_PAGE_ALLOCATION 1

Here we can see that there were Memory Shrink Requests.  This is an indication that we are starting to hit the ceiling and Reporting Services wants to back off to have more breathing room.  The allocation error caused the process to crash; it would then restart on its own.  There is no other information logged: nothing in the Event Logs or the Reporting Services trace log.

Troubleshooting

rsreportserver.config

The first thing I tend to look at is which settings are defined in the rsreportserver.config.  In the customer's case, we see the following.

<MemorySafetyMargin>80</MemorySafetyMargin>
<MemoryThreshold>90</MemoryThreshold>
<WorkingSetMaximum>1000000</WorkingSetMaximum>
<WorkingSetMinimum>400000</WorkingSetMinimum>

This alone is a giant red flag.  WorkingSetMaximum looks really low.  I don’t necessarily care about WorkingSetMinimum.  And the defaults for MemorySafetyMargin and MemoryThreshold are fine.

This is the definition of WorkingSetMaximum from the MSDN article referenced at the top of this blog.

WorkingSetMaximum

Specifies a memory threshold after which no new memory allocations requests are granted to report server applications.

By default, the report server sets WorkingSetMaximum to the amount of available memory on the computer. This value is detected when the service starts.

This setting does not appear in the RSReportServer.config file unless you add it manually. If you want the report server to use less memory, you can modify the RSReportServer.config file and add the element and value. Valid values range from 0 to maximum integer. This value is expressed in kilobytes.

When the value for WorkingSetMaximum is reached, the report server does not accept new requests. Requests that are currently in progress are allowed to complete. New requests are accepted only when memory use falls below the value specified through WorkingSetMaximum.

If existing requests continue to consume additional memory after the WorkingSetMaximum value has been reached, all report server application domains will be recycled. For more information, see Application Domains for Report Server Applications.

Because this value is expressed in kilobytes (KB), the WorkingSetMaximum defined above is only about 1GB of memory.  That definitely won't be enough; we won't get very far with only 1GB of maximum memory.  From the Reporting Services trace log, we can see what the customer's total physical memory was.

servicecontroller!DefaultDomain!9e80!01/26/2015-13:48:48:: i INFO: Total Physical memory: 137372422144  <-- 137GB!!!

I’m guessing that the thought was that WorkingSetMaximum was defined in megabytes (MB) instead of kilobytes (KB).  So, if we really wanted 100GB for the WorkingSetMaximum, we would need to add two more 0’s.
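If the intent really was roughly 100GB, the corrected entries would look something like the following (values are in kilobytes; the exact numbers here are illustrative, not a recommendation):

<MemorySafetyMargin>80</MemorySafetyMargin>
<MemoryThreshold>90</MemoryThreshold>
<WorkingSetMaximum>100000000</WorkingSetMaximum> <!-- roughly 100 GB, expressed in KB -->
<WorkingSetMinimum>40000000</WorkingSetMinimum> <!-- roughly 40 GB, expressed in KB -->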

Verbose Logging

Verbose Logging can also help you understand the situation a little better, especially if the out of memory condition doesn’t happen right away.  From a memory perspective, I only really care about the appdomainmanager category.  We can set that up for verbose by setting the following within the ReportingServicesService.exe.config file.

<system.diagnostics>
  <switches>
    <add name="DefaultTraceSwitch" value="4" />
  </switches>
</system.diagnostics>
<RStrace>
  <add name="FileName" value="ReportServerService_" />
  <add name="FileSizeLimitMb" value="32" />
  <add name="KeepFilesForDays" value="14" />
  <add name="Prefix" value="appdomain, tid, time" />
  <add name="TraceListeners" value="file" />
  <add name="TraceFileMode" value="unique" />
  <add name="Components" value="all:3;appdomainmanager:4" />
</RStrace>

Here is what the output will look like.  I included some INFO messages to show the flow of what happened.

library!DefaultDomain!33e4!01/29/2015-05:44:31:: i INFO: Initializing MemorySafetyMargin to '80' percent as specified in Configuration file.
library!DefaultDomain!33e4!01/29/2015-05:44:31:: i INFO: Initializing MemoryThreshold to '90' percent as specified in Configuration file.
library!DefaultDomain!33e4!01/29/2015-05:44:31:: i INFO: Initializing WorkingSetMaximum to '1000000' kilobytes as specified in Configuration file.
library!DefaultDomain!33e4!01/29/2015-05:44:31:: i INFO: Initializing WorkingSetMinimum to '400000' kilobytes as specified in Configuration file.

servicecontroller!DefaultDomain!3034!01/29/2015-05:44:35:: i INFO: Total Physical memory: 34289934336

library!ReportServer_0-2!17a0!01/29/2015-05:45:46:: i INFO: RenderForNewSession('/MemoryPressure/MemoryHogContoso')
library!ReportServer_0-2!31f8!01/29/2015-05:45:46:: i INFO: RenderForNewSession('/MemoryPressure/MemoryHogContoso') <-- This is the report request received before we started seeing the shrink requests.

appdomainmanager!DefaultDomain!18e4!01/29/2015-05:45:50:: v VERBOSE: Received NotifyMemoryPressure(pressureLevel=MediumPressure, kBytesToFree=33960)
appdomainmanager!WindowsService_0!18e4!01/29/2015-05:45:50:: v VERBOSE: Memory Statistics: 0 items, 0KB Audited, 0KB Freeable, 924384KB Private Bytes
appdomainmanager!WindowsService_0!18e4!01/29/2015-05:45:50:: v VERBOSE: Spent 3ms enumerating MAP items and 2ms dispatching notifications.

appdomainmanager!DefaultDomain!18e4!01/29/2015-05:45:51:: v VERBOSE: Appdomain (ReportServer) attempted to free 23016 KB.

appdomainmanager!DefaultDomain!18e4!01/29/2015-05:45:52:: v VERBOSE: Received NotifyMemoryPressure(pressureLevel=HighPressure, kBytesToFree=121216)

appdomainmanager!WindowsService_0!18e4!01/29/2015-05:45:52:: v VERBOSE: Skipping shrink request for appdomain (WindowsService_0) because no memory consuming requests are registered.
appdomainmanager!ReportServer_0-2!18e4!01/29/2015-05:45:52:: v VERBOSE: Skipping shrink request for appdomain (ReportServer_MSSQLSERVER_0-2-130670054877986461) because not enough time has passed since last shrink request.

Failed allocate pages: FAIL_PAGE_ALLOCATION 1

Assuming the issue was not due to a low WorkingSetMaximum setting, I would look at which report executed before we started seeing the shrink requests and at how it may be consuming a lot of memory.  Depending on your server, it may be the result of several report requests working together, and we would need to see what we can do to stagger them or pull back the amount of data they consume.

If it is due to the number of users hitting the Report Server, you may want to look at going into a scale-out configuration to spread the load. Also, if you are running subscriptions, you could look at offloading those to a separate server from on demand report requests.

Performance Monitor

Performance Monitor (perfmon) can be used to see the consumption as well.  Unfortunately, it won’t really do a lot to help you pinpoint the problem.  It will just help you confirm that you do have a problem.


The counters I used were the following.

Process : Private Bytes (ReportingServicesService)
Process : Virtual Bytes (ReportingServicesService)
Process : Working Set (ReportingServicesService)
Process : Working Set – Private (ReportingServicesService)
ReportServer : Service : Memory Pressure State
ReportServer : Service : Memory Shrink Amount
ReportServer : Service : Memory Shrink Notifications/sec

Unfortunately, the ReportServer : Service counters did not appear to pick anything up.

To wrap up, there isn't a magic bullet when it comes to memory issues. We need to investigate what the Report Server is doing and what is running. It could be because of a report, a 3rd party extension, or custom code segments.  Try to narrow it down to a specific report and go from there.  Also, make sure Reporting Services is playing nice with any other services on the machine; you can use WorkingSetMaximum to do that.

 

Adam W. Saxton | Microsoft Business Intelligence Server Escalation Services
Twitter    YouTube


Frequently used knobs to tune a busy SQL Server


In calendar year 2014, the SQL Server escalation team had the opportunity to work on several interesting and challenging customer issues. One trend we noticed is that many customers were migrating from old versions of SQL Server running on lean hardware to newer versions of SQL Server with powerful hardware configurations. A typical example would look like this: SQL 2005 + Win 2003 on 16 cores + 128 GB RAM migrated to SQL 2012 + Win 2012 on 64 cores + 1 TB RAM. The application workload or patterns remained pretty much the same. These servers normally handle workloads of several thousand batches per second. Under these circumstances, the normal expectation is that the throughput and performance will increase in line with the increase in the capabilities of the hardware and software. That is usually the case. But there are some scenarios where you need to take additional precautions or perform some configuration changes. These changes were needed for specific user scenarios and workload patterns that encountered a specific bottleneck or scalability challenge.

 

As we worked through these issues, we started to capture the common configuration changes or updates that were required on these newer hardware machines. The difference in throughput and performance is very noticeable on these systems when these configuration changes were implemented. The changes include the following:

- SQL Server product updates [Cumulative Updates for SQL Server 2012 and SQL Server 2014]

- Trace flags to enable certain scalability updates

- Configuration options in SQL Server related to scalability and concurrency

- Configuration options in Windows related to scalability and concurrency

 

All these recommendations are now available in the knowledge base article 2964518:

Recommended updates and configuration options for SQL Server 2012 and SQL Server 2014 used with high-performance workloads

As we continue to find new updates or tuning options that are widely used, we will add them to this article. Note that these recommendations are primarily applicable to SQL Server 2012 and SQL Server 2014. A few of these options are available in previous versions, and you can utilize them when applicable.

If you are bringing new servers online or migrating existing workloads to upgraded hardware and software, please consider all these updates and configuration options. They can save a lot of troubleshooting time and provide you with a smooth transition to powerful and faster systems. Our team uses this as a checklist while troubleshooting to make sure that SQL Servers running on newer hardware are using the appropriate and recommended configuration.
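As a quick sanity check against that checklist, a minimal sketch for capturing the build level and the globally enabled trace flags on a server (interpret the output against the recommendations in the KB article):

-- Current build; compare against the cumulative updates listed in KB 2964518
SELECT @@VERSION AS sql_server_build;

-- Trace flags currently enabled globally on this instance
DBCC TRACESTATUS(-1);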

Several members of my team and the SQL Server product group contributed to various efforts related to these recommendations and product updates. We also worked with members of our SQL Server MVP group [thank you Aaron Bertrand and Glenn Berry] to ensure these recommendations are widely applicable and acceptable for performance tuning.

We hope that you will implement these updates and configuration changes in your SQL Server environment and realize good performance and scalability gains.

 

Suresh B. Kandoth

SQL Server Escalation Team

Microsoft SQL Server

Running SQL Server on Machines with More Than 8 CPUs per NUMA Node May Need Trace Flag 8048


Applies To:  SQL 2008, 2008 R2, 2012 and 2014 releases

Note:  The number of CPUs is the logical count, not sockets.  If more than 8 logical CPUs are presented per NUMA node, this post may apply.
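If you are not sure how many logical CPUs each NUMA node presents, a minimal sketch against sys.dm_os_schedulers shows the layout:

-- Count the visible online schedulers (logical CPUs) per NUMA node
SELECT parent_node_id AS numa_node,
       COUNT(*)       AS logical_cpus
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE'
GROUP BY parent_node_id
ORDER BY parent_node_id;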

The SQL Server developer can elect to partition memory allocations at different levels based on what the memory is used for.  The developer may choose a global, CPU, node, or even worker partitioning scheme.  Several of the allocation activities within SQL Server use the CMemPartitioned allocator.  This partitions the memory by CPU or NUMA node to increase concurrency and performance.

You can picture CMemPartitioned like a standard heap (it is not created with HeapCreate, but the concept is the same).  When you create a heap you can specify whether you want synchronized access, a default size and other attributes.  When the SQL Server developer creates a memory object, they indicate that they want things like thread-safe access, the partitioning scheme and other options.

The developer creates the object so that when a new allocation occurs the behavior is upheld.  On the left is a request from a worker against a NODE-based memory object.  This will use a synchronization object (usually a CMEMTHREAD or SOS_SUSPEND_QUEUE type) at the NODE level to allocate memory local to the worker's assigned NUMA node.  On the right is an allocation against a CPU-based memory object.  This will use a synchronization object at the CPU level to allocate memory local to the worker's CPU.

In most cases the CPU based design reduces synchronization collisions the most because of the way SQL OS handles logical scheduling.  Preemptive and background tasks make collisions possible but CPU level reduces the frequency greatly.  However, going to CPU based partitioning means more overhead to maintain individual CPU access paths and associated memory lists.  

The NODE-based scheme reduces the overhead to the number of nodes but can slightly increase the collision possibilities and may impact ultimate performance results for very specific scenarios.  I want to caution you that the scenarios encountered by Microsoft CSS have been limited to very specific scopes and query patterns.

[Diagram: NODE-based allocation (left) versus CPU-based allocation (right) through a partitioned memory object]

 

Newer hardware with multi-core CPUs can present more than 8 CPUs within a single NUMA node.  Microsoft has observed that when you approach and exceed 8 CPUs per node, the NODE-based partitioning may not scale as well for specific query patterns.  However, using trace flag 8048 (a startup parameter only, requiring a restart of the SQL Server process), all NODE-based partitioning is upgraded to CPU-based partitioning.  Remember this requires more memory overhead but can provide performance increases on these systems.

HOW DO I KNOW IF I NEED THE TRACE FLAG?

The issue is commonly identified by looking at the DMVs sys.dm_os_wait_stats and sys.dm_os_spinlock_stats for the types CMEMTHREAD and SOS_SUSPEND_QUEUE.  Microsoft CSS usually sees the spins jump into the trillions and the waits become a hot spot.
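A minimal sketch of those checks (thresholds are workload dependent, so treat the numbers as relative indicators rather than hard limits):

-- Wait statistics: look for CMEMTHREAD becoming a top wait
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'CMEMTHREAD';

-- Spinlock statistics: look for very large spin counts on SOS_SUSPEND_QUEUE
SELECT name, collisions, spins, sleep_time, backoffs
FROM sys.dm_os_spinlock_stats
WHERE name = 'SOS_SUSPEND_QUEUE';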

Caution: Use trace flag 8048 as a startup parameter.  It is possible to enable the trace flag dynamically, but it only affects memory objects that have yet to be created when the trace flag is enabled; memory objects already built are not impacted.

References

http://blogs.msdn.com/b/psssql/archive/2012/12/20/how-it-works-cmemthread-and-debugging-them.aspx

 

 

Bob Dorr - Principal SQL Server Escalation Engineer

How It Works: MAX DOP Level and Parallel Index Builds


I have been working on an issue where rebuilding an index leads to additional fragmentation.   Using XEvents I debugged the page allocations and writes and was able to narrow in on the behavior.

There are lots of factors to take into account when rebuilding the index.  I was able to break the behavior down to the worst possible case using a single-file database, a single heap table, SORT_IN_TEMPDB, and packing of the heap data to the beginning of the database file when the CREATE CLUSTERED INDEX is issued.

When the index is built, a portion of the data (a key range) is assigned to each of the parallel workers.  The diagram below shows a MAX DOP = 2 scenario.

[Diagram: MAX DOP = 2 parallel index build with one CBulkAllocator per worker]

Each parallel worker is assigned its own CBulkAllocator when saving the final index pages.  This means Worker 1 gets an extent and starts to fill pages from TEMPDB for Worker 1's given key range.  Worker 2 is executing in parallel and has its own CBulkAllocator; Worker 2 acquires the next extent and starts to spool its assigned key range.

Looking at the database, a leap-frog pattern of key values across extents appears as the workers copy the final keys into place.

The diagram below shows the leap-frog behavior from a MAX DOP = 4 index creation.  The saw-tooth line represents the offsets in the file as read during an index order scan.  The horizontal axis is the event sequence and the vertical axis is the offset in the database file.  As you can see, the leap-frog behavior places key values all over the file.

Key 1 is at a low offset but Key 2 is at an offset higher than Key 9 as shown in the example above.  Each of the workers spreads 1/4th of the data across the entire file instead of packing the key values together in a specific segment of the file.

[Chart: file offsets read during an index order scan after a MAX DOP = 4 parallel build]

In comparison, a serial index build shows the desired layout across the drive.  Smaller offsets hold the first set of keys and larger offsets always hold higher key values.

[Chart: file offsets read during an index order scan after a serial build]

This mattered to my customer because, after a parallel index build, an index ordered scan takes longer than it does after a serial index build.  The table below shows the difference in read size and IOPS requirements.

select count_big(*) from tblTest (NOLOCK)    Serial Built    Parallel Built

Avg Read Size                                508K            160K
Duration                                     00:01:20        00:01:50
# Reads                                      15,000          52,000

SQL Server reads up to 512K in a chunk for read-ahead behavior.  When doing an index order scan we read the necessary extents to cover the key range.  Since the key range is leap-frogged during the parallel build, the fragmentation limits SQL Server's I/O size to 160K instead of 508K and drives the number of I/O requests much higher.  The same data in a serially built index maximizes the read-ahead capabilities of SQL Server.

The testing above was conducted using:  select count_big(*) from tblTest with (NOLOCK)

Hint: You don't have to rebuild the index serially to determine how much of a performance gain it may provide.  Using WITH (NOLOCK, INDEX=0) forces an allocation order scan, ignoring the key placement and scanning the object in first-IAM-to-last-IAM order.  Leveraging STATISTICS IO, XEvents and the virtual file statistics output, you are able to determine the behaviors.
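For example, a minimal sketch comparing the two scan paths against the tblTest table used above:

SET STATISTICS IO ON;
GO
-- Index order scan: follows the key order and suffers from the leap-frog fragmentation
SELECT COUNT_BIG(*) FROM tblTest WITH (NOLOCK);
GO
-- Allocation order scan: IAM driven, ignores key placement
SELECT COUNT_BIG(*) FROM tblTest WITH (NOLOCK, INDEX = 0);
GO
SET STATISTICS IO OFF;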

Workarounds
The obvious question: a serial index rebuild can take a long time, so what should I do to leverage parallel index builds and still reduce the fragmentation possibilities?

1. Partition the table on separate files matching the DOP you are using to build the index.  This allows better alignment of parallel workers to specific partitions, avoiding the leap frog behavior.

2. For a non-partitioned table aligning the number of files with the DOP may be helpful.   With reasonably even distribution of free space in each file the allocation behavior is such that alike keys will be placed near each other.

3. For single-partition rebuild operations, consider a serial index build to minimize fragmentation (see the sketch below).
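A minimal sketch of option 3 (the index name is hypothetical); MAXDOP = 1 trades a longer build time for a less fragmented, read-ahead friendly layout:

ALTER INDEX IX_tblTest_Key ON tblTest
REBUILD WITH (MAXDOP = 1, SORT_IN_TEMPDB = ON);
GO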

Future
I am working with the development team to evaluate the CBulkAllocator behavior.  Testing is needed, but it could be that the CBulkAllocator attempts to acquire 9 (64K) extents to align with the read-ahead (512K) chunk size.  Something like this could reduce the fragmentation by a factor of 8.

Bob Dorr - Principal SQL Server Escalation Engineer

Does rebuild index update statistics?


I recently gave a talk to a group of SQL users.  Quite a few facts ended up surprising the audience.  I thought I'd share a few, starting with index rebuild.

If someone asks you the question "Does rebuild index update statistics?", you will probably say "of course".  You may be surprised to learn that an index rebuild doesn't update all statistics.

When you use ALTER INDEX ... REBUILD, only statistics associated with that index will be updated.  To illustrate this better, let's draw a table:

Command                             Index stats                                         Non-index stats
ALTER INDEX REORG                   NO                                                  NO
ALTER INDEX <index_name> REBUILD    Yes, but only for stats associated with that index  NO
ALTER INDEX ALL REBUILD             Yes, stats for all indexes will be updated          NO
DBREINDEX (old syntax)              YES                                                 YES

Note that non-index stats means statistics on a column or columns that are created automatically (auto stats) or manually.

As you can see from the above, don't assume all of your statistics get updated just because you have a maintenance plan to rebuild indexes.  Sometimes, non-index statistics are very critical as well.  Manually updating statistics may be necessary because the automatic update threshold is high for large tables (20% in most cases, as described in the KB "Statistical maintenance functionality (autostats) in SQL Server"), though trace flag 2371 can help (as discussed in a previous blog).
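As a starting point, here is a hedged sketch for checking how stale the statistics on a table are and refreshing them manually; the table name is illustrative:

-- Rows modified since each statistic on the table was last updated
SELECT obj.name AS table_name,
       stat.name AS stats_name,
       sp.last_updated,
       sp.modification_counter
FROM sys.stats AS stat
JOIN sys.objects AS obj ON obj.object_id = stat.object_id
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
WHERE obj.name = 'YourLargeTable';

-- Refresh every statistic on the table, including column (non-index) statistics
UPDATE STATISTICS YourLargeTable WITH FULLSCAN;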

 

Demo

Here is a demo showing that ALTER INDEX ... REBUILD doesn't update all stats.

First, use the following script to set up the table:

if object_id ('t') is not null
    drop table t
go
create table t(c1 int, c2 as c1 & 1)
go

create index t1_indx1 on t(c1 )
go
set nocount on
declare @i int
set @i = 0
while @i < 1000
begin
insert into t (c1) values (@i)
set @i = @i + 1
end
go

update statistics t with fullscan
go

--this will create a stats on c2
select count(*) from t where c2 =1

go

Because I ran UPDATE STATISTICS, the following query will show that t1_indx1 and _WA_Sys_00000002_162F4418 have the same value for last_updated.

SELECT
    obj.name, stat.name, stat.stats_id, last_updated
FROM sys.objects AS obj
JOIN sys.stats stat ON stat.object_id = obj.object_id
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
where obj.name = 't'

[Query output: t1_indx1 and _WA_Sys_00000002_162F4418 show the same last_updated value]

 

Now, I ran alter index all rebuild.

-- alter all indexes
alter index all on t rebuild
--reorganize won't update even the stats of the index itself
--alter index all on t reorganize

 

Then I ran the following query.  Note that last_updated for t1_indx1 has a newer timestamp than _WA_Sys_00000002_162F4418, because _WA_Sys_00000002_162F4418 never got updated by the ALTER INDEX command.

--run the following and note that the stats created by auto stats didn't get updated by the rebuild
--only the stats from the index got updated

SELECT
    obj.name, stat.name, stat.stats_id, last_updated
FROM sys.objects AS obj
JOIN sys.stats stat ON stat.object_id = obj.object_id
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
where obj.name = 't'

[Query output: t1_indx1 shows a newer last_updated than _WA_Sys_00000002_162F4418]

 

Past blog related to statistics: Case of using filtered statistics

 

Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support

Moving Reporting Services off of a Cluster


We had a customer that had deployed Reporting Services to their Cluster. They now wanted to move the RS Instance off of the Cluster and onto its own machine and leave the Catalog Database on the Clustered SQL Server.

We have a blog that talks about Reporting Services and clusters. You can find it at the following link.

Reporting Services, Scale Out and Clusters…
http://blogs.msdn.com/b/psssql/archive/2010/05/27/reporting-services-scale-out-and-clusters.aspx

This focuses more on why you shouldn’t do it and doesn’t address how to get out of the situation if you are in it. So, I wanted to just outline what we did for this customer and it may help others who get into the same situation.

Our goal is to not have RS running on either physical node of the Cluster and instead have RS running on a separate machine outside of the cluster. We want RS to be running on its own server.

NOTE: This is for Native Mode Reporting Services.

Let’s go through the steps to get this migration accomplished.

Backup the Encryption Key

The first thing we need to do is back up the encryption key for the current instance that is running. We can do this by going to the Reporting Services Configuration Manager and going to the Encryption Keys section.


The Backup button should be enabled if you haven’t already backed up the key.

Make sure you have the Virtual Network Name (VNN) of your SQL Cluster

If you don't know the VNN of your SQL Cluster, you can go to the Failover Cluster Manager to get it. Make sure you are looking at the SQL Cluster and not the Windows Cluster. We will need this name when we point the new machine to the database server holding the catalog database.

Assuming that the current RS Instance on the cluster is using that cluster for the catalog database, you can also get it from the Reporting Services Configuration Manager in the Database Section.


Stop Reporting Services

Make sure that the Reporting Services service is stopped on both cluster nodes. You will also want to disable the service so it doesn't start back up; you can do that within the SQL Server Configuration Manager.

Go to the properties of the Reporting Services Service. On the Service Tab, change the Start Mode to Disabled.


Install Reporting Services

Go ahead and install Reporting Services on the server you want it to run on. Depending on what you are going to do on that server, you should only need to choose the Native Mode RS Feature and nothing else.

Configure the new Reporting Services Instance

After the instance is installed on the new machine, start the Reporting Services Configuration Manager.

The setup involves the normal configuration steps you would do for Reporting Services, with the following exceptions.

Database

Make sure we are pointing to the Virtual Network Name of the SQL Cluster for the Database Server. Also make sure we select the Catalog Database that the other server was using. We want to use the same one to make sure we don’t lose any data. The default name will be ReportServer.

Scale-Out Deployment

After the database is configured, you can go to the Scale-Out Deployment section. If you see the cluster nodes listed here, you will want to remove them, as we only want this new server to be used.


Encryption Keys

We will now want to restore the Encryption Key that we already backed up. Go to the Encryption Keys tab and click on Restore.


That's it! Reporting Services should now be up and running on the new server, and you should be able to browse to Report Manager, see your reports, and render them.

References

Host a Report Server Database in a SQL Server Failover Cluster
https://msdn.microsoft.com/en-us/library/bb630402.aspx

Configure a Native Mode Report Server Scale-Out Deployment
https://msdn.microsoft.com/en-us/library/ms159114.aspx

 

Robyn Ciliax
Microsoft Business Intelligence Support
