Channel: CSS SQL Server Engineers

How It Works: FileStream (RsFx) Garbage Collection–Part (2)


In a previous post I outlined the basics of FILESTREAM garbage collection: http://blogs.msdn.com/b/psssql/archive/2011/06/23/how-it-works-filestream-rsfx-garbage-collection.aspx

This post continues the discussion, outlining specific details as to how the garbage collection progresses.

A Single GC Thread

The instance of SQL Server contains a single background worker performing the garbage collection activities.   The FSGC worker wakes up every ~5 seconds and loops over each database, managing the tombstone table entries in small batches.

foreach database
      Do File Stream Garbage Collection (batch size)

Fixed Batch Size

The batch size for the background file stream garbage collection (FSGC) worker is currently 20 rows (a hard-coded value).   This allows FSGC to remain an unobtrusive background process.

Note:  The type of tombstone row and its status determine the action FSGC takes.   For example, a delete of a row enters a single delete entry into the tombstone table.  A truncate table may enter a single row in the tombstone table, and FSGC understands the single-file-delete vs. truncate status and may take broader action.

sp_filestream_force_garbage_collection (http://technet.microsoft.com/en-us/library/gg492195.aspx)

If you are like me, the first thing you did was the math on 20 rows per batch every 5 seconds and determined there is a finite number of rows FSGC can process in a 24-hour period.  Enter sp_filestream_force_garbage_collection, a new feature in SQL Server 2012.

The procedure allows you to execute FSGC as a foreground process without the batching limits.   This can be helpful if you have a large number of changes (inserts/updates/deletes) to FILESTREAM data, where the tombstone rows and associated disk files have grown to a large number you wish to aggressively clean up.
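A minimal invocation looks like the following (the database name here is hypothetical); see the linked documentation for the full parameter list:

-- Run FILESTREAM garbage collection in the foreground for one database
EXEC sp_filestream_force_garbage_collection @dbname = N'MyFileStreamDB';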

Delete and Status

When you update a FILESTREAM column or delete a row, an entry for the delete is placed in the tombstone table.

For example:

update tblDocuments set FSData = 'New Image' where PK = 1

image

Checkpoint and Log Backup: The 'Orig Image' can't be deleted until it is properly synchronized with the backup stream activities.   The first log backup after the update secures both the Orig and New Image.  This preserves point-in-time restore capability to a point just before the update.

If you select from the tombstone table before FSGC and the proper checkpoint/log backup have been executed, the status column for the entry will contain a (####7) value such as 17 or 217.
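If you want to peek at the tombstone table yourself, a sketch like the following can locate it; the internal table can only be selected from over a dedicated administrator (DAC) connection, and the exact name includes an object id suffix that varies per table:

-- Locate the internal FILESTREAM tombstone table(s) in the current database
SELECT name, internal_type_desc
FROM sys.internal_tables
WHERE name LIKE 'filestream_tombstone%';

-- Then, over a DAC connection only, something like:
-- SELECT * FROM sys.filestream_tombstone_2073058421;   -- suffix varies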

Prepare for delete (7): The status column is a combination of bits used by FSGC to determine the state of the row.   A value of (7) in the lower position indicates prepare for delete to the FSGC.

Do physical file deletion (8):  FSGC must transition the entry from a status of 17 or 217 to 18 or 218, requiring that a second checkpoint and log backup take place before the physical file can be removed from disk storage.

Confused yet?  I was, so let me walk you through the activity (checkpoints, log backups and status changes).

  1. Update – Enters a new entry with status #7 into the tombstone table
  2. FSGC runs – it can't do anything because no log backup and checkpoint have advanced the checkpoint and backup LSNs (you can see these with dbcc dbtable).  Entries stay at #7 status.
  3. Checkpoint/Log backup executes
  4. FSGC runs – finds the first batch (20 rows) by lowest LSN values and attempts the appropriate actions.  Finds the entry at status #7 with the proper checkpoint/log backup and updates the status to #8, so the next FSGC run after another checkpoint/log backup can do the physical removal of the file.  (*See below for update details.)
  5. FSGC runs – it can't do anything because no log backup and checkpoint have advanced the checkpoint and backup LSNs.  Entries stay at #8 status.
  6. Checkpoint/Log backup executes
  7. FSGC runs – finds the first batch (20 rows) by lowest LSN values and attempts the appropriate actions.  Finds the entry at status #8 with the proper checkpoint/log backup; removes the physical file and deletes the tombstone entry.

* The update of #7 to #8 status is not an in-place update.   The update is a delete (of the old tombstone row) and insert (of the updated row) pairing.    This matters to you because of the following.

image

FSGC looks at the top 20 rows during each batch.   When it transitions the 1st row (shown above) from #7 to #8 status, the row is added to the end of the tombstone table because it now has the newest LSN.

In this example, if you delete 10,000 rows from the base table you get 10,000 entries in the tombstone table.   The physical file deletion won't occur until FSGC has first transitioned all 10,000 entries from #7 to #8 status.  Since FSGC runs every ~5 seconds in batches of 20, the math is really 10,000 * 2 entries to process (each entry goes through the #7 and then the #8 status change), divided by 20 per batch: (20,000 / 20) = 1,000 FSGC batches @ 5 seconds per batch = 5,000 seconds, or ~83 minutes (assuming proper checkpoints and log backups), to physically remove the original files.
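A quick back-of-the-envelope check of that estimate:

-- 10,000 deletes * 2 status transitions = 20,000 tombstone rows to process,
-- at 20 rows per FSGC batch and one batch every ~5 seconds.
SELECT (10000 * 2 / 20) * 5 / 60.0 AS approximate_minutes;   -- ~83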

In Summary

Understanding the transition states, batch sizes, and requirements for log backups and checkpoints should help you better maintain your databases containing FILESTREAM data.

  • Make sure you have regular log backups
  • If the database is inactive you may want to run manual checkpoint(s)
  • Consider sp_filestream_force_garbage_collection on SQL Server 2012 and newer versions

Bob Dorr - Principal SQL Server Escalation Engineer
     


SQL Server 2014: TEMPDB Hidden Performance Gem


I ran across a change for TEMPDB bulk operations (SELECT INTO, table valued parameters (TVPs), CREATE INDEX with SORT IN TEMPDB, …) that you will benefit from.

For example, I have a Create Index … WITH SORT IN TEMPDB that takes ~1 minute in SQL Server 2012.   On the same machine using a SQL Server 2014 instance, the index builds in 19 seconds.
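For reference, a CREATE INDEX of this shape exercises the path being discussed (the table, column, and index names are hypothetical):

-- SORT_IN_TEMPDB pushes the intermediate sort results into TEMPDB,
-- which is exactly where the SQL Server 2014 eager write change helps.
CREATE INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
WITH (SORT_IN_TEMPDB = ON);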

SQL Server has had a concept of eager writes for many versions.  The idea is to prevent flooding the buffer pool with pages that are newly created from bulk activities and need to be written to disk.  Eager writes help reduce the pressure on lazy writer and checkpoint as well as widening the I/O activity window, allowing for better performance and parallel usage of the hardware.

The design is such that bulk operations may track the last ## of pages dirtied in a circular list.   When the list becomes full, old entries are removed to make room for new entries.   During the removal process the older pages are put in motion to disk, if still dirty (API: WriteFileGather).    The intent is to gather up to 128 KB of contiguous dirty pages (32 pages) and write them out.

The change in SQL Server 2014 is to relax the need to flush these pages as quickly to the TEMPDB data files.  When doing a select into … #tmp … or create index WITH SORT IN TEMPDB, SQL Server now recognizes this may be a short-lived operation.   The pages associated with such an operation may be created, loaded, queried and released in a very small window of time.

For example:  You could have a stored procedure that runs in 8ms.  In that stored procedure you select into … #tmp … then use the #tmp and drop it as the stored procedure completes.
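A minimal sketch of such a short-lived procedure (object names are hypothetical):

CREATE PROCEDURE dbo.usp_TopCustomers
AS
BEGIN
    -- Newly created #tmp pages may now live and die in the buffer pool
    -- without ever being flushed to the TEMPDB data files.
    SELECT TOP (100) CustomerID, SUM(TotalDue) AS TotalSales
    INTO #tmp
    FROM dbo.Orders
    GROUP BY CustomerID
    ORDER BY TotalSales DESC;

    SELECT CustomerID, TotalSales FROM #tmp;   -- use the temp table

    DROP TABLE #tmp;                           -- dropped as the procedure completes
END;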

Prior to the SQL Server 2014 change, the select into may have written all the accumulated pages to disk.  The SQL Server 2014 eager write behavior no longer forces these pages to disk as quickly as previous versions.   This allows the pages to be stored in RAM (buffer pool), queried and the table dropped (removed from the buffer pool and returned to the free list) without ever going to disk, as long as memory is available.   By avoiding the physical I/O when possible, the performance of the TEMPDB bulk operation is significantly increased, and the impact on the I/O path resources is reduced as well.

The pages used in these operations are marked so lazy writer will favor writing them to TEMPDB or returning the memory to the free list before impacting pages from user databases, allowing SQL Server to handle some of your TEMPDB operations with increased performance.

In progress, no promises: We are actively investigating a port of this change to SQL Server 2012 PCU2 so your SQL Server 2012 installations can take advantage of the performance increase as well.

Bob Dorr - Principal SQL Server Escalation Engineer

RS, SharePoint and Forefront UAG Series – Part 4 (Power View - Export to PowerPoint)


Part 1 – Intro
Part 2 – Operational Reports (Classic RDL Reports)
Part 3 – Power Pivot Gallery (Silverlight)
Part 4 – Export a Power View Report to PowerPoint (you are here)

The last part of this engagement was when they tried to export a Power View report to PowerPoint; we hit the save button and received an error.  Under the hood, we were getting an HTTP 500 error.

Looking in the UAG Trace Logs, we found something like the following:

https://sptest.uaglab.com/_vti_bin/reportserver/?rs:ProgressiveSessionId=be2741565c6d4fd2ac46643113730a81huf1fo3tnerk0f555ji2hynb&rs:Command=LogClientTraceEvents

Info:Detected HRS attack!!!PostDataLen=100933, ContentType=application/progressive-report, Dump: XMLSchema"><ClientDateTime>201-04-23T10:59:07+05:30</ClientDateTime><Category>datastructuremanager</Category><ProgressiveSessi. (ExtECB=0000000005D68040), (PFC=0000000002A72808)

By default, UAG is configured to protect the web servers from smuggling attacks.  This is viewed when you click on Edit for the Application and go to Web Server Security.  The setting is listed under Maximum size of POST request.

image

By default, UAG allows a maximum of 49152 bytes for a POST request, per this setting, for the listed content types.  What we noticed in the failing request in the UAG traces is that the Post Data Length on this request was 100933 (higher than the default limit on UAG). The content type, in this case, was application/progressive-report.  So, to overcome the issue, we added this content type to the UAG Server smuggling protection settings and increased the maximum size of POST request to a value that would accommodate the request.  For testing in our environment, we just changed it to 491520, as that would cover the Post Data Length of 100933.  You would need to do more analysis to see what fits your needs without overexposing your deployment from an attack perspective, while still allowing your site to function.

This allowed the Export to succeed.

We then ran into an issue where, when loading up the PowerPoint document, we would see the static image and not the Interact button. It is not able to communicate the proper responses when going through UAG/ADFS.  We also do not get prompted for credentials.  If we look at the properties of the item on the sheet, we can see that this is the Silverlight ActiveX control and it is set to go to the XAP.

image

Web Application Proxy (WAP)

Exporting to PowerPoint itself did not have any issues, or require additional configuration, when using WAP.  Unfortunately, when trying to run the PowerPoint document, we still did not have the Interact button going through WAP.  This is because going through WAP it was as if it was pure Forms Auth.  Even looking at a Fiddler trace, the response back from the server was the form to get the login credentials.  The Silverlight control doesn't prompt for the credentials, and there is no real way to do that when the server is expecting a web form.

Takeaway

The takeaway on this one is that we can get it to successfully export the report to a PowerPoint document; however, within the document itself, you will only have the static image of the report and not the interactive aspects of the report.

 

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

RS, SharePoint and Forefront UAG Series – Part 3 (Power Pivot Gallery - Silverlight)


Part 1 – Intro
Part 2 – Operational Reports (Classic RDL Reports)
Part 3 – Power Pivot Gallery (Silverlight) (you are here)
Part 4 – Export a Power View Report to PowerPoint

 

The last post really went into the meat of the issue that we had.  After that, we found that when trying to go to the Power Pivot Gallery, it showed a blank white screen instead of the actual item list. Basically the Silverlight control wasn’t loading. The issue here was that the default filters within UAG were not allowing the XAP file (Silverlight) to pass through.  The incoming request was effectively being rejected.

This was corrected by creating a URL Set within the Forefront Unified Access Gateway Management.  When you go to your connection, you can click on Configure under Trunk Configuration.

image

Once in the Advanced Trunk Configuration, for your connection, you can go to the URL Set Tab. There you can click on “Add Primary” and add the following:

  • Give it a name
  • Change Action to Accept
  • Provide the expression for the URL that you want to accept.  For ours, we added the following:

    (/[^"#&*+:<>?\\{|}~]*)*/forms/~/_layouts/15/powerpivot/microsoft.analysisservices.spaddin.reportgallery.xap
  • We had set the Allowed Methods to the following:

    PROPPATCH,MOVE,PROPFIND,PUT,COPY,HEAD,POST,MKCOL,DELETE,GET

image

After this was done, the Power Pivot Gallery came up with no issues.  You may be asking: what about the Power View Silverlight control?  It wasn't affected by UAG and worked as expected.  A specific rule for the Power View XAP was not needed for our deployment.  However, if you encounter an issue with Power View, you can follow a similar approach.

Web Application Proxy (WAP)

This was mentioned in the Intro and in the 2nd post.  With the WAP deployment, no additional configuration changes were needed, and the Power Pivot Gallery just worked.  It is again a much cleaner option to use.

 

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

RS, SharePoint and Forefront UAG Series – Part 2 (Operational Reports)


Part 1 – Intro
Part 2 – Operational Reports (Classic RDL Reports) (you are here)
Part 3 – Power Pivot Gallery (Silverlight)
Part 4 – Export a Power View Report to PowerPoint

 

This piece took the longest amount of time to narrow down.  The issue occurred when they were trying to render a report integrated within SharePoint 2013, accessed by an external client going through Forefront UAG.  The result was that the report would get into a loop.  It almost looked like a flicker.

image

image

image

 

From a Fiddler trace, the pattern we saw was the following, just repeating itself:

#    Result    Protocol    Host    URL    Body    Caching    Content-Type    Process    Comments    Custom   
368    200    HTTPS    sptest.uaglab.com    /_layouts/15/ReportServer/RSViewerPage.aspx?rv:RelativeReportUrl=/Reports/Company%20Sales%20SQL2008R2.rdl&Source=https%3A%2F%2Fsptest%2Euaglab%2Ecom%2FReports%2FForms%2FAllItems%2Easpx    96,899    private    text/html; charset=utf-8    iexplore:3980           
369    200    HTTPS    sptest.uaglab.com    /InternalSite/logoffParams.asp?site_name=sptest&secure=1    1,415    private,no-cache    text/javascript    iexplore:3980           
370    304    HTTPS    sptest.uaglab.com    /InternalSite/scripts/applicationScripts/whlsp15.js    0    no-cache        iexplore:3980           
371    200    HTTPS    sptest.uaglab.com    /InternalSite/sharepoint.asp?site_name=sptest&secure=1    3,961    private,no-cache    text/javascript    iexplore:3980           
372    200    HTTPS    sptest.uaglab.com    /InternalSite/?WhlST    30    no-cache        iexplore:3980           
373    200    HTTPS    sptest.uaglab.com    /InternalSite/?WhlSL    30    no-cache        iexplore:3980           
374    200    HTTPS    sptest.uaglab.com    /Reserved.ReportViewerWebPart.axd?OpType=SessionKeepAlive&ControlID=b19d27e5e8254cb69789caaa773937a7    122    private    text/plain; charset=utf-8    iexplore:3980 <-- POST via AJAX call - x-requested-with: XMLHttpRequest
375    200    HTTPS    sptest.uaglab.com    /_layouts/15/ReportServer/RSViewerPage.aspx?rv:RelativeReportUrl=/Reports/Company%20Sales%20SQL2008R2.rdl&Source=https%3A%2F%2Fsptest%2Euaglab%2Ecom%2FReports%2FForms%2FAllItems%2Easpx    514    no-cache; Expires: -1    text/plain; charset=utf-8    iexplore:3980           
376    302    HTTPS    sptest.uaglab.com    /_login/default.aspx?ReturnUrl=%2f_layouts%2f15%2fReportServer%2fRSViewerPage.aspx%3frv%3aRelativeReportUrl%3d%2fReports%2fCompany%2520Sales%2520SQL2008R2.rdl%26Source%3dhttps%253A%252F%252Fsptest%252Euaglab%252Ecom%252FReports%252FForms%252FAllItems%252Easpx&rv:RelativeReportUrl=/Reports/Company%20Sales%20SQL2008R2.rdl&Source=https%3A%2F%2Fsptest%2Euaglab%2Ecom%2FReports%2FForms%2FAllItems%2Easpx    583    private, no-store    text/html; charset=utf-8    iexplore:3980           
377    302    HTTPS    sptest.uaglab.com    /_windows/default.aspx?ReturnUrl=%2f_layouts%2f15%2fReportServer%2fRSViewerPage.aspx%3frv:RelativeReportUrl%3d%2fReports%2fCompany%2520Sales%2520SQL2008R2.rdl%26Source%3dhttps%253A%252F%252Fsptest%252Euaglab%252Ecom%252FReports%252FForms%252FAllItems%252Easpx&rv:RelativeReportUrl=/Reports/Company%20Sales%20SQL2008R2.rdl&Source=https:%2F%2Fsptest.uaglab.com%2FReports%2FForms%2FAllItems.aspx    617    private    text/html; charset=utf-8    iexplore:3980           
378    200    HTTPS    sptest.uaglab.com    /_layouts/15/ReportServer/RSViewerPage.aspx?rv:RelativeReportUrl=/Reports/Company%20Sales%20SQL2008R2.rdl&Source=https%3A%2F%2Fsptest%2Euaglab%2Ecom%2FReports%2FForms%2FAllItems%2Easpx    96,899    private    text/html; charset=utf-8    iexplore:3980
           

What was apparently happening is that every POST request needs to be authenticated in the UAG setting, but a new POST does not contain the necessary credentials.  As a result, SharePoint issues a 401 in response to each POST.  These are handled by UAG, which does the challenge/response handshake and then sends the final response back to the client.  However, for some POST requests (like the ones sent from Reporting Services), the 401 gets modified before being sent to UAG.  According to this thread, the forms authentication module intercepts any 401 and replaces it with a redirect.

With a SharePoint Claims configuration, you will have both Forms Authentication and Windows Authentication enabled.

image

The forms authentication module intercepts any 401s and replaces them with redirects. Since we have Forms and Windows authentication enabled, which according to IIS Manager is not supported, we get this behavior for what appeared to be only the AJAX requests coming from the Report Viewer Control.

There were two workarounds we came up with to avoid this looping behavior and to get reports to work.

Workaround 1: Response.SuppressFormsAuthenticationRedirect

Note: Following this workaround will put SharePoint into an unsupported configuration.  Please use at your own risk as this has not been tested with other functionality within SharePoint. If you encounter an issue and call support, you may be asked to remove this snippet to continue.  Also, installing updates to SharePoint may remove this snippet.

While Reporting Services 2012 uses the .NET Framework 2.0/3.5, SharePoint 2013 uses the 4.5 framework.  A property was introduced in the 4.5 framework, on the Response object, to suppress those Forms Auth redirects (302): the Response.SuppressFormsAuthenticationRedirect property. This article talks about some of the challenges with lightweight services and using jQuery. We added the following snippet to the global.asax.  After doing so, the reports loaded fine.

<script runat="server">

protected void Application_BeginRequest()
{
    if (FormsAuthentication.IsEnabled
        && Context.Request.RequestType == "POST"
        && Context.Request.Headers["x-requested-with"] == "XMLHttpRequest")
    {
        Context.Response.SuppressFormsAuthenticationRedirect = true;
    }
}

</script>

The default path to the global.asax in our SharePoint deployment was:  C:\inetpub\wwwroot\wss\VirtualDirectories\sptest.uaglab.com5196\. Reports were able to render properly at this point.

image

From fiddler, it looked as we expected it to, without the authentication loop.

#    Result    Protocol    Host    URL    Body    Caching    Content-Type    Process    Comments    Custom   
13    200    HTTPS    sptest.uaglab.com    /_layouts/15/ReportServer/styles/1033/sqlrvdefault.css    3,362    max-age=31536000    text/css    iexplore:4916           
14    304    HTTPS    sptest.uaglab.com    /InternalSite/scripts/applicationScripts/whlsp15.js    0    no-cache        iexplore:4916           
15    200    HTTPS    sptest.uaglab.com    /InternalSite/sharepoint.asp?site_name=sptest&secure=1    3,961    private,no-cache    text/javascript    iexplore:4916           
16    200    HTTPS    sptest.uaglab.com    /InternalSite/?WhlST    30    no-cache        iexplore:4916           
17    200    HTTPS    sptest.uaglab.com    /InternalSite/?WhlSL    30    no-cache        iexplore:4916           
18    200    HTTPS    sptest.uaglab.com    /Reserved.ReportViewerWebPart.axd?OpType=Resource&Version=11.0.3401.0&Name=ViewerScript    161,670    public; Expires: Fri, 23 May 2014 13:00:56 GMT    application/javascript    iexplore:4916           
51    200    HTTPS    sptest.uaglab.com    /Reserved.ReportViewerWebPart.axd?OpType=SessionKeepAlive&ControlID=bee8fb2bf93e4e3bb3fd52acfcc3b7e7    122    private    text/plain; charset=utf-8    iexplore:4916           
52    200    HTTPS    sptest.uaglab.com    /_layouts/15/ReportServer/RSViewerPage.aspx?rv:RelativeReportUrl=/Reports/Company%20Sales%20SQL2008R2.rdl&Source=https%3A%2F%2Fsptest%2Euaglab%2Ecom%2FReports%2FForms%2FAllItems%2Easpx    82,166    private    text/plain; charset=utf-8    iexplore:4916

Workaround 2: Web Application Proxy

This was mentioned in the Intro post, but I'll mention it here as well.  When setting up the Web Application Proxy on Windows 2012 R2, we did not encounter any issues with regard to this problem.  Reports rendered fine out of the box.  No configuration changes were necessary.  The win here is that this configuration is fully supported from the proxy perspective as well as for SharePoint and Reporting Services.  This is definitely the cleaner way to go, with less hassle.  This also allowed Power View reports to just work, which I'll talk about in the next post.  I'll post the information on Web Application Proxy here again for reference.

Web Application Proxy (WAP) Information:

Working with Web Application Proxy
http://technet.microsoft.com/en-us/library/dn584107.aspx
 
Installing and Configuring Web Application Proxy for Publishing Internal Applications
http://technet.microsoft.com/en-us/library/dn383650.aspx
 
Plan to Publish Applications through Web Application Proxy
http://technet.microsoft.com/en-us/library/dn383660.aspx
 
Step 3: Plan to Publish Applications using AD FS Pre-authentication
http://technet.microsoft.com/en-us/library/dn383641.aspx#BKMK_3_2 

These TechNet articles include links to a complete walk-through guide to deploy a lab or POC environment with AD FS 2012 R2 and Web Application Proxy.

Getting Started with AD FS 2012 R2
http://technet.microsoft.com/en-us/library/dn452410.aspx 
 
Overview: Connect to Applications and Services from Anywhere with Web Application Proxy
http://technet.microsoft.com/en-us/library/dn280942.aspx

 

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

RS, SharePoint and Forefront UAG Series – Intro


I have been working a fairly complicated case over the last year that involved running Reporting Services reports through a Forefront Unified Access Gateway (UAG) used to publish a SharePoint site.  I worked with an engineer (Alejandro Lopez) in the SharePoint Support Group, two folks (Prateek Gaur and Billy Price) on the Security Team, and an individual (Ben Satzger) on the Reporting Services Product Team.  This will be a series of four posts.  This first post is a high-level description of the problem, along with a highlight of the environment we used to reproduce the issue locally.  The subsequent posts will go into the three main issues that we worked through.

Part 1 – Intro (you are here)
Part 2 – Operational Reports (Classic RDL Reports)
Part 3 – Power Pivot Gallery (Silverlight)
Part 4 – Export a Power View Report to PowerPoint

The Problems

Operational Reports (Classic RDL reports)

The original issue was that when the customer went to run a report in this environment, from an external client machine, the reports would get into a loop. 

Power Pivot Gallery/Power View Reports (Silverlight)

We then had issues getting the Power Pivot Gallery Library to load, as well as issues whenever we tried to run a Power View report.  I group these together because the issue was really specific to Silverlight and not those individual elements.

Export a Power View Report to PowerPoint

This was an issue with the Silverlight ActiveX control within PowerPoint.  This is used when exporting a Power View Report to PowerPoint.

 

Environment

This was a complex setup that involved 8 or so VMs.  The environment was rebuilt once and in that rebuild it was scaled down a little bit.  Here is the diagram for the original environment we had put together to reproduce the customer’s issues.

NOTE:  I am not an expert in WAP (see below) or UAG.  The Security engineers configured the environment for me to get it up and running. 

image

 

This involved a private domain environment, a SharePoint 2013 Farm (consisting of 2 SharePoint boxes), a SQL Server, a Server for the PowerPivot instance of SSAS, the UAG server, an ADFS Server and a Client machine that was in a different subnet and not joined to a domain.

Web Application Proxy (WAP)

We also looked at this from a Web Application Proxy (WAP) perspective as an alternative to the UAG setup.  This requires a Windows 2012 R2 server.  We used the SQL and Power Pivot servers for this, giving them a little double duty.  Of note, this deployment was much cleaner than the UAG deployment and caused far fewer issues.  Of the problems noted above, the only issue that surfaced with the WAP deployment was the last one (Export to PowerPoint).  If you are looking at doing a UAG deployment and using Reporting Services, I would highly recommend looking into whether a WAP deployment is doable for you.  I will call out why WAP was a better fit in the following blog posts.  Here is some information about WAP.

Web Application Proxy (WAP) Information:

Working with Web Application Proxy
http://technet.microsoft.com/en-us/library/dn584107.aspx
 
Installing and Configuring Web Application Proxy for Publishing Internal Applications
http://technet.microsoft.com/en-us/library/dn383650.aspx
 
Plan to Publish Applications through Web Application Proxy
http://technet.microsoft.com/en-us/library/dn383660.aspx
 
Step 3: Plan to Publish Applications using AD FS Preauthentication
http://technet.microsoft.com/en-us/library/dn383641.aspx#BKMK_3_2 

These TechNet articles include links to a complete walk-through guide to deploy a lab or POC environment with AD FS 2012 R2 and Web Application Proxy.

Getting Started with AD FS 2012 R2
http://technet.microsoft.com/en-us/library/dn452410.aspx 
 
Overview: Connect to Applications and Services from Anywhere with Web Application Proxy
http://technet.microsoft.com/en-us/library/dn280942.aspx

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

Version 9.04.0013 of the RML Utilities for x86 and x64 has been released to the download center


 


 

 

X64: http://download.microsoft.com/download/0/a/4/0a41538e-2d57-40ff-ae85-ec4459f7cdaa/RMLSetup_AMD64.msi

X86: http://download.microsoft.com/download/4/6/a/46a3217e-f523-4cc6-96e9-df73dd0fdd04/RMLSetup_X86.msi

 

This build encompasses previous features, fixes and enhancements designed from recent case work. 

 

  • SQL Server 2014 compliant
  • SQL Server 2012 and 2014 XEL input to ReadTrace compliant (a sample .XEL capture script ships with the help documentation) – no need for .TRC anymore. (PSSDiag updates align with the XEL capabilities as well)
  • Microsoft Azure SQL Database (formerly WASD) connectivity compliant
  • OStress true replay of MARS connections
  • Addition of client-side OStress Expressions allowing Sleep, Repeat, Identity and GUID generation to craft additional scenarios
  • Tuned for larger memory and CPU systems
  • Updated compression libraries for formats such as RAR5
  • Enhanced TVP capabilities
  • Custom CLR Expression capabilities
  • Additional filtering options
  • Workarounds for some .XEL capture bugs, such as an invalid packet size captured in Existing Connection events
  • … and more …

SharePoint Adventures : Using Claims with Reporting Services


Back in February of 2011, I created a blog post that walked through using Kerberos with Reporting Services. Since then, we have moved Reporting Services to a shared service within SharePoint.  This changes the game, and we are now in the Claims world.  I've been asked a bunch of times about Claims configuration and about clearing up some general confusion.  I have also presented on this topic at PASS, and thought it was time to get the blog post out there.  This blog will show SharePoint 2013, but the steps are the same in SharePoint 2010.  To start, I'll reference a few other blogs for background that we can refer back to.

Reference Blogs:

My Kerberos Checklist…
What SPN do I use and how does it get there?
SharePoint Adventures : How to identify if you are using Claims Authentication

Isn’t Kerberos Dead?

image

I’ve heard some comments along the lines of – “Well now that I’m using claims, I don’t need to worry about Kerberos.”  This isn’t true.  Claims changes the perspective a bit, but if our goal is to get to a back end data source using Windows Authentication, we need Kerberos.  Within the Claims/SharePoint Bubble, we don’t have a Windows Token.  We have a Claims Token.  When we want to leave the bubble, we need to go get a Windows Token.  This is done by way of the Claims to Windows Token Service (C2WTS).  From there it is all Kerberos.  So, everything you know about Kerberos is still relevant.  We just need to add a few things to your utility belt.

Shared Service

Starting with Reporting Services 2012, we are now a Shared Service within SharePoint.  We are no longer an external service as we were with RS 2008 R2 and earlier versions. This means we are inside of the SharePoint bubble.  In Using Kerberos with Reporting Services, I talk a lot about needing to have the front-end (HTTP) SPNs in place.  However, now that we are inside of the SharePoint bubble, we don't need the HTTP SPNs any longer.  Nothing from the client (browser) to the Report Server requires Kerberos any longer.  You can still set up Kerberos for the SharePoint Web Front End (WFE), but when we go to hit the Report Server, it will be Claims.  Any communication with the Report Server is done via a WCF web service and will be Claims auth, regardless of how the WFE is configured.  So, in this setup, we really only care about the RS Service and going back into the backend.  It's all about delegation now.

Common Errors

Before getting into the configuration, I wanted to highlight some of the errors you may see that are related to this topic.  These are at least ones I’ve seen.

Cannot convert claims identity to a windows token.  This may be due to user not logging in using windows credentials.

image

Login failed for user ‘NT AUTHORITY\ANONYMOUS’

image

Could not load file or assembly ‘System.EnterpriseServices, Version=2.0.0.0, culture=neutral, <-- see this blog post

image

Claims to Windows Token Service (C2WTS)

This is where the magic happens.  As mentioned above, we are using a Claims Token when we are within the RS Shared Service.  We are in the SharePoint bubble.  So, what happens when we want to leave the bubble?  We need a helper.  This helper is the Claims to Windows Token Service (C2WTS). Its whole purpose in life is to extract the User Principal Name (UPN) claim from a non-Windows security token, in our case a SAML token, and generate an impersonation-level Windows Token.  Think Kerberos Delegation. This is actually a Windows Service that sits on the same machine as the service that is trying to call into it to get the Windows Token.

This service is enabled via Central Admin –> Application Management –> Service Applications –> Manage services on server.

image

Be sure to start it here as opposed to the Windows Service directly.  The SharePoint Timer jobs will just stop the service if you start it manually.

C2WTS Configuration

There are a few things you need to take care of to make sure C2WTS is configured correctly.  We will have a look at everything except for the delegation piece.  We will save that for last.

Service Account

You will need to decide what Service Account you want to use. By default, C2WTS is set to use the Local System account.  I've seen people use this, and it will work fine.  However, I usually don't recommend using Local System for any service account.  This is just from a security standpoint, and the idea of least privilege: Local System has a lot of power on the machine.  So, I typically recommend using a Domain Account.  On my deployment, I use a Claims service account that I created.  If you use an account you created, you will need to add it as a managed account within Central Admin.  This is done via Security –> General Security –> Configure managed accounts.

image

After that is done, you need to change the C2WTS service to use that managed account.  This is done via Security –> General Security –> Configure service accounts.  Then select C2WTS from the drop down.

image

When you do this second step, it should also add the service account to the WSS_WPG local group on the SharePoint boxes.

image

Local Admin Group

You will need to add this service account to the Local Admin Group on the machine that it will be used on.  If you have two SharePoint boxes and one is a WFE and the other is the App Server that will be using it, you only need to do this on the App Server.  C2WTS will not work unless it is in the local admin group.  I haven’t narrowed down what exact permissions it requires to avoid the local admin group.  If someone has figured this out, let me know. 

Local Security Policy

The service account you are using needs to be listed in the Act as part of the operating system policy right. Again, this only needs to be done on the SharePoint box that will be using the service. 

image

c2wtshost.exe.config

Remember the WSS_WPG group?  This is why we want the service account in that group.  The location of this config file is C:\Program Files\Windows Identity Foundation\v3.5.  This config file defines who can make use of C2WTS.  If your account isn't listed here, or covered by a group that is listed, it won't work.

<allowedCallers>
  <clear />
  <add value="WSS_WPG" />
</allowedCallers>

RS Shared Service Configuration

The only real configuration point here is the service account.  Again, I would recommend a Domain Account for this.  In my deployment, my account is rsservice.  We will need to make sure that account is added as a managed account within SharePoint (see above under the Claims account).  Once that is done, it will be added to the local WSS_WPG group.  The addition to the WSS_WPG group allows the RS Service to call into C2WTS, because that group is set in the config file.

We then need to associate that account to the RS Service, if you didn’t already do that during initial configuration of the RS Service.

image

Delegation

The last part of our journey is configuring delegation.  Remember, we mentioned that we don't care about the front-end piece of this, so we don't need to be concerned with HTTP SPNs at all.  We just want to configure delegation for the C2WTS and RS Service accounts.  Both need to be configured in order for this to work, and they need to match with regard to which service you want to hit.  I would start with the RS Service account, and then make sure that the C2WTS account matches what the RS Service account has.

NOTE:  The C2WTS service may have other services configured that RS doesn’t need.  This could be due to other services making use of C2WTS such as Excel Services or PerformancePoint.

To configure this, we need to go into Active Directory Users and Computers.  There are other ways to configure delegation, but this is probably the easiest.  Uh oh!  Where's the delegation tab?  The delegation tab will only show up if there is an SPN configured on that account.  But we said we didn't need the HTTP SPN that we would have had with RS 2008 R2.  As a result, nothing was configured on the RS Service account and we don't see the delegation tab.  What's the fix?  Add a fake SPN.

image

Here you can see I added an SPN called my/spn.  This won’t hurt anything and won’t otherwise be used.

image

For this to work, we need to choose the settings for Constrained Delegation.  More specifically, we need to enable protocol transitioning (Use any authentication protocol).  This is because we are transitioning from one authentication scheme (Claims) to another (Windows Token).  This also has the adverse effect of limiting you to a single domain for your services and computer accounts.  This has changed starting in Windows 2012 R2, but I haven't tested that yet to see how it works. I've read that you can do cross-domain traffic with constrained delegation in Windows 2012 R2.

After that, I add the service that I want to delegate to.  Basically, what data sources are you hitting with your reports.  In this case, I added my SQL Server.  This assumes you have your SQL SPN in place.  You can reference the other blog posts at the top of this blog if you need assistance getting your SQL SPN configured.

We then need to make sure that the Claims service account matches this configuration. Don't forget the fake SPN on the Claims service account.

image

And that’s it!  After that, we should see that our data source works.

image

 

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton


How to grab multiple parent/child elements from XML Data Source


We had a question come up about why an XML query wasn’t pulling a 2nd Parent/Child element from the resulting data.  This was grabbing data off of a Web Service.  The example I was given was a public Web Service call for getting Weather information.

http://wsf.cdyne.com/WeatherWS/Weather.asmx?op=GetCityForecastByZIP

The data that is returned looks like the following:

<ForecastReturn xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://ws.cdyne.com/WeatherWS/">
  <Success>true</Success>
  <ResponseText>City Found</ResponseText>
  <State>TX</State>
  <City>Keller</City>
  <WeatherStationCity>Euless</WeatherStationCity>
  <ForecastResult>
    <Forecast>
      <Date>2014-04-22T00:00:00</Date>
      <WeatherID>17</WeatherID>
      <Desciption>Drizzle</Desciption>
      <Temperatures>
        <MorningLow>64</MorningLow>
        <DaytimeHigh>89</DaytimeHigh>
      </Temperatures>
      <ProbabilityOfPrecipiation>
        <Nighttime>20</Nighttime>
        <Daytime>10</Daytime>
      </ProbabilityOfPrecipiation>
    </Forecast>

The problem we were seeing is that when we run the initial query, we don't see the values under ProbabilityOfPrecipiation (yes, I know there is a typo there).

image

We only see the items under Temperatures (MorningLow & DaytimeHigh).  The query that was being used is the following:

<Query>
  <Method Name="GetCityForecastByZIP" Namespace="http://ws.cdyne.com/WeatherWS/">
    <Parameters>
      <Parameter Name="ZIP" type="string">
        <DefaultValue>76244</DefaultValue>
      </Parameter>
    </Parameters>
  </Method>
  <SoapAction>http://ws.cdyne.com/WeatherWS/GetCityForecastByZIP</SoapAction>
  <ElementPath IgnoreNamespaces="true">*</ElementPath>
</Query>

You'll notice that ElementPath is using *.  Using * is the same as leaving ElementPath blank, and will cause the query to use the default element path, which is the first path to a leaf-node collection.  In this case that is Temperatures, something like ForecastReturn/ForecastResult/Forecast/Temperatures.  As a result, we don't see the values in ProbabilityOfPrecipiation.  The following MSDN document outlines this as well.  It was written for RS 2005, but is still applicable.

Reporting Services: Using XML and Web Service Data Sources
http://technet.microsoft.com/en-US/library/aa964129(v=SQL.90).aspx

With regards to Auto-Detection of XML Structure:
Multiple parent-child hierarchies are not supported. In this example, Customer has both Orders and Returns. The provider may return only one set. Because the Orders hierarchy is specified first, auto-derivation will resolve it as the skeleton.

With a single query, you can’t grab multiple Parent-Child elements.  You can grab it with a second query and then use the Lookup function to pull related data.  I created a second query with the following syntax:

<Query>
  <Method Name="GetCityForecastByZIP" Namespace="http://ws.cdyne.com/WeatherWS/">
    <Parameters>
      <Parameter Name="ZIP" type="string">
        <DefaultValue>76244</DefaultValue>
      </Parameter>
    </Parameters>
  </Method>
  <SoapAction>http://ws.cdyne.com/WeatherWS/GetCityForecastByZIP</SoapAction>
  <ElementPath IgnoreNamespaces="true">GetCityForecastByZIPResponse{}/GetCityForecastByZIPResult{}/ForecastResult{}/Forecast/ProbabilityOfPrecipiation</ElementPath>
</Query>

You may be asking where ForecastReturn is, as that is the root element.  For the web service call, you need to use the wrapper elements for the actual method call itself, which is GetCityForecastByZIP.  As a result, we have to use GetCityForecastByZIPResponse and then GetCityForecastByZIPResult.  You'll also notice the {}.  This indicates that I don't want the values from those elements.  This is what we see for the field list.

image

Now we have two queries, and Date is the unique item here.  So, we can do a lookup on the Date across the two DataSets.  I'll base the main table on the first DataSet, then add two columns to the table with the following expressions:

Daytime Probability
=Lookup(Fields!Date.Value, Fields!Date.Value, Fields!Daytime.Value, "Precipitation")

Nighttime Probability
=Lookup(Fields!Date.Value, Fields!Date.Value, Fields!Nighttime.Value, "Precipitation")

The result is the output that we originally wanted.

image

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

How It Works: Behavior of a 1 Trillion Row Index Build (Gather Streams from SORT)

$
0
0

I ran into this behavior working on a 1 trillion row spatial index build, but the behavior can apply to any Gather Streams operator that retains the sort order as rows pass through it.   I was just surprised a bit by the behavior until I dug deeper to understand it.

The index was taking just short of 2 hours to build on my 64-way, 128 GB RAM test system.  The behavior I observed was the drop in CPU usage and parallelism.  For the first ~40 minutes all 64 CPUs run at 100%, but for the last 1 hour and 10 minutes of the index build, only the controlling worker consumes 100% CPU on a single processor.

image

Digging into this I found the top of the plan looked like the following, with Node 1 being the final, sorted insert fed from the gather streams activity.

image

During the index build, the lower part of the plan was executing the parallel nested loop behavior on all 64 CPUs and building the large sort groupings, in parallel, on each of the child workers (the first 40 minutes).    Once all the rows are sorted, per worker, the gather streams activity has to merge the 64 individual sorts into the overall sort order as it performs the final inserts.

This is deemed an order preserving gather (merge) operation.  The consumer pulls a row from each worker and keeps an in-memory tree.  In the case of the 64 parallel workers the tree would have 64 entries.  If MAX DOP is 16 the tree would contain 16 entries.    The tree is maintained in sorted order so the pattern of execution will look like the following on a 4 processor system.

  1. Get Row From Worker/Partition #1 – Insert into tree
  2. Get Row From Worker/Partition #2 – Insert into tree
  3. Get Row From Worker/Partition #3 – Insert into tree
  4. Get Row From Worker/Partition #4 – Insert into tree
  5. While (entries in tree)
     {
        Output lowest, sorted value from tree
        Get Row from the Worker/Partition you just removed from the tree as the lowest value
     }

image

This design keeps the tree pruned equal to or just above the maximum number of sub-process workers.  The consumer performs the merging of the individual, parallel sort operations and inserts the results into the index as requested.   The serialization of the final sort order is what I am seeing during the final phase of my index build.

Since my table and index all fit into memory on this large system, the activity is taking place in memory and leveraging the CPU fully.  Looking at the details in sys.dm_exec_requests and sys.dm_os_waiting_tasks, I can see the gather streams activity, associated with Node 1 in the plan, is driving the last ~01:10:00 on a single CPU.  In fact, setting processor affinity you will observe the controlling worker's CPU light up for the final 01:10:00 of the index build.
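A rough sketch of the kind of DMV query I mean (standard DMVs; the command filter is just one way to narrow the output to the index build session):

SELECT r.session_id, r.status, r.command,
       wt.wait_type, wt.wait_duration_ms, wt.resource_description
FROM sys.dm_exec_requests AS r
LEFT JOIN sys.dm_os_waiting_tasks AS wt
       ON wt.session_id = r.session_id
WHERE r.command LIKE N'%INDEX%';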

Observing the behavior, sys.dm_os_wait_stats shows a sharp increase in CXPacket waits and wait times.   This is expected: the final thread is pulling data from 64 workers that already ran at 100% CPU, so it is unlikely a single CPU can process the data as fast, and the CXPacket exchange activity will encounter waits.

The description for the wait indicates a wait at node 1 asking for a new row from the producers.

exchangeEvent id=Pipe307f0be530 WaitType=e_waitPipeNewRow nodeId=1

There are lots of documents and posts about the best settings to optimize parallel plans.  Some state MAX DOP = 8, others MAX DOP = 32, and both are correct depending on the type of plan (actual query patterns), as the performance may vary.   Since this was a create index, I decided to do some experiments with the DOP level.

  • 64 CPUs = 01:50:00
  • 32 CPUs = 02:17:00
  • 16 CPUs = 03:16:00

What I observed is that for this specific create index (spatial) the lower the MAX DOP the larger the change in the original 40 minute part of the plan.   This is what I expected to see.   The first part of the plan is already CPU bound so adding more CPU resources lets that part of the plan execute faster.

The CXPacket waits on the final portion of the plan don’t change significantly with different DOP levels.  The time remains generally steady at 01:10:00.

This was a unique pattern because the controlling worker was not showing additional waits (usually I/O will show up, but because everything fit into memory the CPU was the dominant resource).  The controlling worker only showed normal scheduler yield activities.

What I found was that it takes 01:10:00 on my 1.88 GHz CPU to sort (merge) 1 trillion rows onto the index pages.  If I want to reduce the final portion of the plan, I would need to move to a faster CPU.   SQL Server did use parallel resources as much as possible to build the index.

Bob Dorr - Principal SQL Server Escalation Engineer

Capping CPU using Resource Governor – The Concurrency Mathematics


Here is what you need to know: A = πr²

Okay, not really, as that is the formula for the area of a circle, but it does set the stage for this discussion.   I have been working with the CAP_CPU_PERCENT (RESOURCE POOL) setting as it relates to concurrency.   This turned into a mathematical exercise I was not planning on.

You have all had that one user who keeps running their 'special report', which they just have to have, in the middle of the day.  No matter how many times you have asked them to stop doing this until after hours, they continue to ignore you.   So one day you decide to put them in their own resource pool and workload group and cap their CPU to a small percentage with MAX DOP = 1.   This way they quit impacting the overall production server.  Shortly after you do this, you discover you may have made matters worse.  How could that have happened?

Simply put, capping the CPU means you forced a wait, and when you force a wait where shared state is involved you impact all entities attempting to use the shared resource.   I created a pathological example, but it shows the behavior very well.

User 1:  Default Pool 100% CPU and Full DOP

User 2:  Limited to 1% CPU CAP and MAX DOP = 1
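For reference, a minimal sketch of how a pool and workload group like User 2's could be configured (pool, group, and login names are hypothetical):

CREATE RESOURCE POOL CappedPool WITH (CAP_CPU_PERCENT = 1);
CREATE WORKLOAD GROUP CappedGroup WITH (MAX_DOP = 1) USING CappedPool;
GO

-- Classifier (created in master) routes the report user into the capped group.
CREATE FUNCTION dbo.fnRGClassifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    DECLARE @grp sysname = N'default';
    IF SUSER_SNAME() = N'ReportUser'       -- hypothetical login name
        SET @grp = N'CappedGroup';
    RETURN @grp;
END;
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fnRGClassifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;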

I have them both do a select with UPDLOCK on the same table.   I created an execution pattern of User 1, User 2, User 1, User 2, … acquiring and releasing the lock to each other.

begin tran

select count(*) from dbRG..tblRG with (UPDLOCK)    -- Only 1 row in the table

commit tran

I then used OStress to run the batch in a tight loop in various configurations.  The diagram below shows the basics of the CPU activity.

image

1. User 1 has ownership of the lock and owns the scheduler, using the CPU, while User 2 is blocked waiting on the lock.  This is a standard blocking scenario where User 2 is not taking any CPU; it is just placed on the waiter list of the lock.

2. User 1 completes the transaction and releases the lock.   During the release, User 2 is granted ownership and placed on the runnable list of the scheduler.  User 1 then yields to the scheduler because it has used its quantum, placing User 1 at the tail of the runnable queue.

3. User 2 yields, and User 1 attempts to access the lock.  If User 2 has cleared the lock, the lock can be granted.  If User 2 still owns the lock, User 1 will be blocked and added to the lock's wait list.

The 'Delay' is the interesting part of this scenario.   When User 1 grants User 2 ownership of the lock, User 2 becomes the next runnable worker on the scheduler.   However, the resource pool for User 2 is set to cap the CPU percentage.  This means SQL Server will delay User 2's execution to keep it at the configured cap.   Even if User 1 is allowed to execute during the delay, User 1 simply becomes blocked on the shared resource and does not make meaningful forward progress.

What just happened is that by capping the CPU, the shared resource (in this example, the lock on the same row) ends up limiting the overall resource acquire and release frequency.

Now let’s get to some of the math behind this I promised you.  It won’t be all that difficult, you’ll see, I will use some nice, round numbers.

Assume each transaction takes 1ms, or 1000 transactions per second when utilizing the CPU at 100%.  If you cap the CPU at 10%, the math is 1000 * 0.10 = 100 transactions/sec.    Meaning the user in the 10% CPU-capped pool should only be able to execute the transaction 100 times for every 1000 times the uncapped user can execute it.

When I combine the two users, the 10% cap introduces the delay and causes the conflict, lowering the combined transaction rate for both users to near the 100 mark.

Here are some actual numbers from my laptop running the pathological, tight loop on the same row.   

Transactions/Sec    User Stress Connections
192                 1 – 1% CPU CAPPED User
435                 1 – 100% CPU User
240                 1 – 1% CPU CAPPED User
                    1 – 100% CPU User
920                 2 – 100% CPU Users
1125                1 – 1% CPU CAPPED User
                    2 – 100% CPU Users

Most of the time you won't even notice the impact I am describing in this post.   As you can see, the 1125 transactions/sec level is achieved by 2 – 100% users and 1 – 1% user.   Back to the math: the 1% user is only 1/3 of the workers on the scheduler.  Each of the 2 – 100% users gets a full quantum, so the more workers there are, the more this scenario becomes a standard blocking issue, as if you had a slow client, for example.   You just have to figure out how to reduce the shared-state interaction behavior(s) and things will run smoothly, as you expect them to.

I was looking at this in more detail and I noticed I was not accumulating large numbers for LCK_M_U waits, wait time or signal time.   What I found is that the execution quantum is generally such that the lock can be acquired and released before the next user is allowed to execute.  In my attempt to create the pathological case I tuned away some of the blocking aspects that would make it worse.    If I add more rows to my table I can get into the ultimate pathological I go, you go, I go, you go, … scenario.

Instead, it was enlightening that the delay necessary to control the 1% user introduces overall delay at the CPU resource.   The CPU resource became my shared entity, and when the 1% user exceeded its CPU target the scheduler may need to force the delay; in doing so, other users on the same scheduler can become impacted.   The more users I added to the testing, the less forced delay was required.

While I believe this is an edge-case scenario and it is unlikely that you will encounter it, I wanted to share it so you could put it in your toolbox.  We are always looking for the connection that is holding a resource (usually a lock) and not responding on the client fast enough.   Instead, you could be introducing some of the delay yourself by attempting to isolate a user like this.

Bob Dorr - Principal SQL Server Escalation Engineer

REPL_SCHEMA_ACCESS wait type


Recently we worked with a customer on a replication latency issue with transactional replication. The customer has over 30 published databases on a single server, all of them very active. Periodically, they will see up to 30 minutes of latency from publisher to distributor. When they do, they see waits on REPL_SCHEMA_ACCESS. Below is a sample screenshot of sys.dm_exec_requests during a problem period.

 

What does this wait type mean?

Our current online documentation states: "Occurs during synchronization of replication schema version information. This state exists when DDL statements are executed on the replicated object, and when the log reader builds or consumes versioned schema based on DDL occurrence." But this wait type is also used to synchronize memory access and prevent multiple log readers from corrupting internal structures on the publisher. Each time a log reader agent runs sp_replcmds, it needs to access a memory buffer. If that results in growing the buffer, the action needs to be synchronized among log reader agents with REPL_SCHEMA_ACCESS.

Contention can be seen on this wait type if you have many published databases on a single publisher with transactional replication and the published databases are very active.

Troubleshooting and reducing contention on REPL_SCHEMA_ACCESS waits?

This issue depends on the number of log reader agents accessing the same publisher and on the transaction rate. If you have a single log reader agent accessing the publisher, you shouldn't see this type of contention.

In general, you can watch the Transactions/sec performance counter for all published databases to measure how active your system is. The higher your transaction rate, the more likely you are to hit the issue, assuming you have multiple log reader agents accessing the same publisher.

We charted the waits on REPL_SCHEMA_ACCESS and transaction/sec for multiple published databases. We saw a very clear correlation.

SQL Nexus report

 

Transaction/sec for one example published database

 

 

Here are a few things you can do to reduce contention:

  1. Do not use large transactions. A large transaction that results in many commands can make the situation worse because of the higher memory buffer requirement. If you do have large transactions, experiment with the MaxCmdsIn value.
  2. Try to spread out transactions among the different published databases to different times. For example, if you have batch load jobs for different databases, don't schedule them at the same time.
  3. Reduce the number of log readers. This customer had over 35 published databases on the same publisher, all of them active. The wait is at the server level, so if you split your published databases into two different instances (even on the same hardware), contention can be reduced.
  4. Experiment with decreasing –PollingInterval for your log reader agent. The default is 5 seconds. If you reduce the PollingInterval, the log reader agent can catch up more frequently once the wait gets cleared.
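If you want to watch this wait accumulate yourself, here is a simple sketch using the standard wait-stats DMV; run it periodically and compare the deltas against your Transactions/sec counters:

SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'REPL_SCHEMA_ACCESS';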

     

 

Jack Li

Senior Escalation Engineer | Microsoft SQL Server Support

Using SQL Server in Microsoft Azure Virtual Machine? Then you need to read this…


Over the past few months we noticed some of our customers struggling with optimizing performance when running SQL Server in a Microsoft Azure Virtual Machine, specifically around the topic of I/O Performance.

We researched this problem further, did a bunch of testing, and discussed the topic at length among several of us in CSS, the SQL Server Product team, the Azure Customer Advisory Team (CAT), and the Azure Storage team.

Based on that research, we have revised some of the guidelines and best practices on how to best configure SQL Server in this environment. You can find this collective advice which includes a quick “checklist” at this location on the web:

http://msdn.microsoft.com/en-us/library/azure/dn133149.aspx

If you are running SQL Server already in Microsoft Azure Virtual Machine or making plans to do so, I highly encourage you to read over these guidelines and best practices.

There is other great advice in our documentation that covers more than just Performance Considerations. You can find all of these at this location:

http://msdn.microsoft.com/en-us/library/azure/jj823132.aspx

If you deploy any of these recommendations and find they are not useful, cause you problems, or are not effective, I want to hear from you. Please contact me at bobward@microsoft.com with your experiences.

Bob Ward
Microsoft


These resources may help resolve your issue….


Some of you may have seen a list of links that pops up when opening a case through our Microsoft support site. These resources are internally referred to as solution assets; they are meant to be top bets for the problem category and are aimed at helping you solve the problem on your own.

image

We keep these updated on a regular basis, especially for the top problem categories. The content is curated with the help of our experts and based on support incident data. Currently these resources are only shown during the case creation process, except for the top 5 categories, which are also shared and maintained at the following blogs:

· Top Support Solutions for Microsoft SQL Server 2012

· Top Support Solutions for SQL Server 2008

Note: During the case creation process only the top 3 are shown by default; the rest can be seen by clicking Show More. The blog links above do show the expanded list of these resources for the top 5 categories.

For example here are the top solutions for various AlwaysOn problems:

KB articles:

1. Troubleshooting AlwaysOn availability databases in a "recovery pending" or "suspect" state in SQL Server 2012

2. Troubleshooting automatic failover problems in SQL Server 2012 AlwaysOn environments

3. Troubleshooting AlwaysOn availability group listener creation in SQL Server 2012

4. Time-out error and you cannot connect to a SQL Server 2012 AlwaysOn availability group listener in a multi-subnet environment

5. "General Network error," "Communication link failure," or "A transport-level error" message when an application connects to SQL Server

6. Cannot create a high-availability group in Microsoft SQL Server 2012

7. Voting nodes are not available when you try to use AlwaysOn availability groups in SQL Server 2012

8. How to restore a replica to the primary role after the quorum is lost in SQL Server 2012

9. You experience slow synchronization between primary and secondary replicas in SQL Server 2012

Blog posts

1. Create Listener Fails with Message 'The WSFC cluster could not bring the Network Name resource online'

2. SQL Server 2012 - True Black Box Recorder - CSS SQL Server ...

3. Connecting to Availability Group Listener in Hybrid IT

Books Online

· AlwaysOn Availability Groups Troubleshooting and Monitoring Guide

Team blogs

1. Always On support team’s blog

2. SQL AlwaysOn Product Team’s Blog

Forums:

· SQL Server High availability and Disaster Recovery forum

As always please share any feedback you may have around these links. You can also refer to the following links for additional information and for finding more top solutions for other SQL topics and other Microsoft products:

· Top Solutions from Microsoft Support

· Microsoft Top Solutions app for Windows 8

Ramu Konidena
SQL Server Support

Kerberos Configuration Manager updated for Analysis Services and SQL 2014


Kerberos Configuration Manager was released back in May of 2013.  It initially released with only SQL Server support. This was followed up with support for Reporting Services in November 2013. You can download the latest release from the following link:

Microsoft® Kerberos Configuration Manager for SQL Server®
http://www.microsoft.com/en-us/download/details.aspx?id=39046

This month we have released Version 3 of the Kerberos Configuration Manager which provides support for Analysis Services. This will work with Analysis Services 2005 and later (including 2014).

image

This release also includes support for SQL 2014 services.

image

 

Logging

If you happen to encounter something that I didn’t highlight above, you may be able to find additional information.  Each time you run the tool, we will create a log file.  The default location for this is the following:  C:\Users\<user>\AppData\Roaming\Microsoft\KerberosConfigMgr.

The details of the log file will be flushed when you close the program.  So, if it is blank, just close the tool and the log should populate.  You may also find some details in the Event Log under the Source “Kerberos Configuration Manager”.  If we encounter an error, it should be logged in the Application Event Log as well as the tool’s log file.

Limitations

Delegation

For Analysis Services, there is no Delegation check at this time.  The scenarios for that are limited and may be looked at in a future release.

SQL Browser DISCO SPN

This also does not do a validation on the SQL Browser SPN.

Multiple Domains

Right now, the tool will only work in a single-domain scenario.  So, if you have the service installed in Domain A but want to use a Service Account from Domain B, we won't be able to discover and correct the issue appropriately.  As long as the machine the instance is on and the Service Account are in the same domain, you should be good to go.  This is true for Reporting Services and the SQL Server discovery as well.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton


Getting Cross Domain Kerberos and Delegation working with SSIS Package


I started working on this issue by way of a Twitter conversation between myself, Jorge Segarra (@sqlchicken) and Matt Masson (@mattmasson). I then found out that Jorge was working with a customer that had opened a Support Case.  Also, special thanks to Joey Seifert for letting me bounce some Active Directory questions off of him.

The issue we were troubleshooting was that when running an SSIS 2012/2014 Package from SQL Server A (Parent Domain) that connected to SQL Server B (Child Domain), it would fail with the dreaded “Login failed for user ‘NT AUTHORITY\ANONYMOUS’” error.

image
Illustration Credit: Svetlin Velinov

Whenever I see this error, I always start with configuration.  That is typically the cause.  Back in 2010 I wrote out my Kerberos Checklist for validating configuration.  Jorge was aware of this and had gone through it. He also mentioned that they had run the Kerberos Configuration Manager and it didn't find any issues, although in this scenario it wouldn't have, as it doesn't yet support cross-domain topologies.

I was able to reproduce the issue they were seeing in a local environment on my end.  Here is what it looked like.

image

So, I have two domains (battlestar.local & cylons.battlestar.local).  The SQL Server in the Parent Domain (battlestar.local) is using a service account from the child domain (cylons.battlestar.local).  From a delegation standpoint, we are using full delegation; I’ll touch on Constrained Delegation later on. To make sure that everyone understands what I mean by full delegation, the CYLONS\sqlservice AD object has the following setting:

image

How SSIS 2012 and later work

When you run a package that is hosted in the SSIS Catalog, it will cause a child process to get spawned from the SQL Service itself. This process is the ISServerExec.exe.

image

The other thing to note is that this process runs under the context of the session that launched the package, not the SQL Server process context (service account).  Here you can see that ISServerExec is running as BATTLESTAR\asaxton whereas the SQL Service is running as CYLONS\sqlservice.

image

This is the case regardless of how you execute the package: through the SSMS GUI, via a stored procedure, or even by way of DTExec.  If you want it to run under the context of the SQL Service account, you can “fake it” by doing a runas-like operation on a process (Command Prompt, SSMS or a SQL Agent job security account).

I initially thought that this was the cause of the problem; however, I later found that it is not.  While I haven’t dug fully into it, I believe this is due to the way we are launching the child process. My guess is it has to do with handle inheritance in the child process.

The Single Hop

Before I even get into the SSIS package, I want to verify that the first leg of the journey is working.  To do this, I connected with Management Studio from a remote machine to my SQL Server in the Parent Domain.  I then used DMVs to validate whether I had connected via NTLM or Kerberos.  Here is the query I used. Depending on where you run this from, you may want to include a predicate to exclude the session you are running it from, if you are trying to look at a different session.

select c.session_id, s.login_name, c.auth_scheme, c.net_transport, st.text
from sys.dm_exec_connections c
JOIN sys.dm_exec_sessions s ON c.session_id = s.session_id
JOIN sys.dm_exec_requests r ON c.session_id = r.session_id
cross apply sys.dm_exec_sql_text(r.sql_handle) as st

image

This showed NTLM which is not what we want. I had enabled Kerberos Event Logging, and I saw the following:

Log Name:      System
Source:        Microsoft-Windows-Security-Kerberos
Date:          6/26/2014 11:22:42 AM
Event ID:      3
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      DrBaltar.battlestar.local
Description:
A Kerberos error message was received:
on logon session
Client Time:
Server Time: 16:22:42.0000 6/26/2014 Z
Error Code: 0x7 
KDC_ERR_S_PRINCIPAL_UNKNOWN
Extended Error:
Client Realm:
Client Name:
Server Realm: BATTLESTAR.LOCAL
Server Name: MSSQLSvc/captthrace.home:49375
Target Name: MSSQLSvc/captthrace.home:49375@BATTLESTAR.LOCAL

captthrace.home?  What is that?  This is because I’m working from home; at the office I saw a different domain. If we do a ping on captthrace, we will see the same result. In this case, my machines are multi-homed.  The lookup was picking up the external NIC and not the internal one, which should have had an IP address of 10.0.0.10.

image

We want this to resolve to captthrace.battlestar.local [10.0.0.10].  On my end I can do that multiple ways: I could disable the external NIC or add an entry in the HOSTS file.  In my case I decided to update the DNS search suffix list for the internal adapter, making sure that battlestar.local and cylons.battlestar.local were listed.  After doing that we get a proper result.

image

image

That looks better.  Retrying the connection to my Parent SQL Server shows that we are connected with Kerberos now.

image

When I then tried a single hop to the Destination SQL Server, I saw Kerberos there as well.

The Double Hop

Now I wanted to run the package from the client machine off of the Parent SQL Server.  When I did that, I got the error.

image

Looking at the Parent SQL Server, I saw the same issue as we had on the client box.  So, I adjusted the DNS suffixes on that machine as well. After that, the package connected successfully using Kerberos to the Destination server in the Child Domain.

image

image

Cross Domain SPN Lookups with Active Directory

One item I ran into the first time I was going through this was that I kept getting NTLM even though name resolution was fine on the Parent SQL Server.  It was using an account from the Child Domain, though, which had the SPN for the server.  When domains are within the same forest, the KDC should consult the GC (Global Catalog) and provide a referral if the account is in a different domain.  If the account is not in the same forest, you would need to define host mapping for the account unless you are using a forest trust, in which case you could define a Kerberos Forest Search Order.

What happened was that the Parent DC was not able to communicate with the Child DC.  I discovered this when I tried to force domain replication; it errored out saying it couldn’t find the CYLONS domain.  This can also lead to a failure because the SPN may not be visible from the GC perspective if replication isn’t working.  So, if you made a change in the Child Domain, the Parent Domain wouldn’t pick it up.

image

What about Constrained Delegation?

With the amount of work I do with SharePoint integration, Constrained Delegation comes up a lot when we talk about Claims to Windows Tokens.  This may force your environment to use Constrained Delegation.  Before Windows 2012, this meant that all service accounts and machines hosting the services had to be in the same domain; you were really restricted to one domain.  Starting with Windows 2012, you can cross domain boundaries, but the configuration for Constrained Delegation is different from what it used to be and is modified via PowerShell commands.  If you want to read more about that, have a look at the following:

Kerberos Constrained Delegation Overview for Windows 2012
http://technet.microsoft.com/en-us/library/jj553400.aspx

How Windows Server 2012 Eases the Pain of Kerberos Constrained Delegation, Part 1
http://windowsitpro.com/security/how-windows-server-2012-eases-pain-kerberos-constrained-delegation-part-1

I did get this to work in my environment and will look to get some posts specific to how to get this to work in the future.

Takeaway – What’s in a name?

If you have verified your Kerberos configuration, be sure to validate name resolution within your environment.  It may not be resolving to the proper names.  When we go to build the SPN to use, we base it on the DNS name that was resolved from the NETBIOS name. If DNS resolution isn’t working properly, it can lead to all sorts of problems.

I’ve learned the hard way, over time, that DNS and Active Directory really blend together.  If DNS has issues, then AD will more than likely have some issues as well. Hopefully this helps to show that it could be more than the normal Kerberos configuration items causing an issue.  Be sure to check DNS forwarders and network configuration (including network binding order and DNS suffixes if needed).

A ping or NSLookup command should return the proper name.  If you have doubts, do an ipconfig /flushdns and try again.  Verify the DCs can talk to and replicate with each other.  As you can see from above, this works for full delegation; Constrained Delegation would work with some modifications.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

Slow query using non-deterministic user defined function


Recently we worked with a customer who reported a query that used to run in a few seconds on SQL Server 2000 but never finishes on SQL Server 2008 R2 following an upgrade.

We went around and tried quite a few things but couldn't get SQL Server 2008 R2 to generate a similar plan. Upon a closer look at the 2008 R2 query plan, we noticed something unusual: the plan has a warning "NO JOIN PREDICATE". What this means is that a cartesian product is introduced.

To illustrate the problem, let's use an example setup:

 

drop function dbo.myfunc
go
drop view v1, v2
go
drop table t1, t2
go
create table t1(c1 int not null, c2 varchar(100))
go
create table t2(c1 int not null, c2 varchar(100))
go
set nocount on
go
declare @i int
begin tran
select @i = 0
while (@i < 1000)
begin
insert into t1(c1, c2) values (@i, 'a')
insert into t2(c1, c2) values (@i, 'b')
select @i = @i + 1
end
commit tran
go
drop function dbo.myFunc
go
create function dbo.myfunc(@c1 int)
returns int
--with schemabinding
as
begin
return (@c1 * 100 )
end
go
create view v1 as select c1, c2, dbo.myfunc(c1) as c3 from t1
go
create view v2 as select c1, c2, dbo.myfunc(c1) as c3 from t2
go

 

 

Now, let's run the following query

dbcc freeproccache
go
set statistics profile on
go

-- But by pulling the UDF above the join in this query we actually introduce a cartesian product (NO JOIN PREDICATE)
-- UDF is called 1 million times instead of 1000 times each for the two views!
select count(*) from v1 as t1 join v2 as t2 on t1.c3 = t2.c3
go

set statistics profile off
go

 

The above query is very slow, as illustrated in the query plan below. In line 6 of the query plan there is a warning "no join predicate". The join resulted in 1,000,000 rows (1,000 x 1,000 rows from the two tables).

In line 5, myfunc is called 2,000,000 times (1,000,000 times to compute t1.c3 from t1.c1 and 1,000,000 times to compute t2.c3 from t2.c1).

This is because, starting with SQL Server 2005, the optimizer has rule changes that disallow non-deterministic scalar functions from being 'pushed down' in some situations (like this one).

 

 

Solution

 

Many times you can simply make a function deterministic by adding the schemabinding option. In the above example, if you re-write the function with schemabinding, the query will be much faster.

From the query plan, you will no longer see the "NO JOIN PREDICATE" warning. The scalar UDF is pushed down right after the table scan and applied only 1,000 times on each table.

drop function dbo.myFunc
go
create function dbo.myfunc(@c1 int)
returns int
with schemabinding
as
begin
    return (@c1 * 100)
end
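
If you want to confirm that SQL Server now treats the rewritten function as deterministic, OBJECTPROPERTY can tell you. This is just a quick verification step and not part of the original repro:

-- Returns 1 once schemabinding makes the function deterministic, 0 otherwise
select OBJECTPROPERTY(OBJECT_ID(N'dbo.myfunc'), 'IsDeterministic') as is_deterministic
go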

 

 

 

Obviously, the function above can be made deterministic. But if you use the following, the function will not be deterministic even with schemabinding, because of getdate(). In such cases, you will continue to see "NO JOIN PREDICATE" Cartesian product joins.

drop function dbo.myFunc
go
create function dbo.myfunc(@c1 int)
returns int
with schemabinding
as
begin
    return (@c1 * 100 * datepart(mm, getdate()))
end

 

 

 

Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support

 

 

Read this if you have transactional replication configured and plan to upgrade from SQL 2008/2008 R2 to SQL 2012/2014


SQL Server online documentation makes it very clear that you need to 'drain' your replicated transactions before doing any upgrade if you have replicated databases. Below are the requirements for transactional replication:

  1. Make sure that the Log Reader Agent is running for the database. By default, the agent runs continuously.
  2. Stop user activity on published tables.
  3. Allow time for the Log Reader Agent to copy transactions to the distribution database, and then stop the agent.
  4. Execute sp_replcmds to verify that all transactions have been processed. The result set from this procedure should be empty (see the sketch after this list).
  5. Execute sp_replflush to close the connection from sp_replcmds.
  6. Perform the server upgrade to SQL Server 2012.
  7. Restart SQL Server Agent and the Log Reader Agent if they do not start automatically after the upgrade.
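
A minimal T-SQL sketch of steps 4 and 5, run in the published database after the Log Reader Agent has been stopped (the database name is only a placeholder):

use YourPublishedDatabase  -- placeholder: your published database
go
-- Step 4: the result set should be empty if all replicated transactions have been processed.
exec sp_replcmds
go
-- Step 5: release the log reader connection held by sp_replcmds.
exec sp_replflush
go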

 

A recent customer issue further confirms the need to follow these steps before upgrade. The customer had transactional replication configured and needed to upgrade from SQL Server 2008 to SQL Server 2012. The publisher database was mirrored, and they used a 'rolling upgrade' approach: they allowed the publisher to continue accepting incoming transactions on the primary server while upgrading the mirror server, then failed over to the upgraded server and made it the primary, and finally upgraded the original primary server. During the upgrade process, some transactions occurred on SQL Server 2008 that the log reader didn't get a chance to read and copy to the distribution database. After the upgrade, they experienced the following error and engaged us.

Error 542 An invalid datetime value was encountered. Value exceeds the year 9999. (Source: MSSQLServer, Error number: 542)

Via some internal testing, we also reproduced additional error below if there are replicated transactions left prior to upgrade.

Error 515 Cannot insert the value NULL into column 'column1', table abc'; column does not allow nulls. UPDATE fails

Upon further investigation, we discovered that SQL 2012 has some minor differences in how certain log records are handled. For the SQL 2008 log records that weren't 'drained' before the upgrade, the log reader agent made some incorrect assumptions and ended up reading the log records incorrectly. For the errors 542 and 515 above, the underlying issue is the same (reading some incorrect data). For 542, the log reader agent was able to catch the fact that the value read was invalid for a date, so it stopped processing the log record and the error is raised by the log reader agent. For 515, the log reader didn't know that NULL is invalid for a non-NULL column, so it processed the log record and put it in the distribution database; the problem was only caught when the update was actually executed against the subscriber, and the error was raised by the distribution agent.

This further supports the documented requirement that you must ensure you 'drain' all replicated transactions before upgrade. However, it's undesirable for the log reader to read incorrect values in such a situation either, so our product team decided to fix this. Currently we have a fix for SQL 2012; a fix for SQL 2014 is being built.

Solutions

Let's summarize the solutions here.

To avoid the problem entirely:

  1. If you follow the online documentation prior to upgrade to ensure no replicated transactions are left on 2008 or 2008 R2, you won't experience the above issue.
  2. If your situation doesn't allow you to stop accepting incoming transactions during the upgrade (using mirroring for a rolling upgrade, as an example), you should follow these steps:
    1. First call our support to obtain the fix/patch for this issue.
    2. Prior to upgrade, disable the log reader agent.
    3. Upgrade and then immediately apply the fix/patch you obtained in the step above.
    4. Enable and start the log reader agent.

If you have already upgraded and experience the errors (542, 515), the following are your options:

  1. You can re-initialize your replication and everything will be fine.
  2. If you only experience the 542 error, you can obtain the fix/patch from us and the new log reader will process the log record correctly.
  3. If you see the 515 error, your only option is to re-initialize. This is because the log reader has already incorrectly processed the record and cannot go back and reprocess it.

 

I want to point this out: the fact that we are providing a fix for this particular situation doesn't mean you should skip the requirement of 'draining' replicated transactions per the online documentation. You should still follow the documented upgrade requirements.

 

Jack Li | Senior Escalation Engineer | Microsoft SQL Server Support

 

How It Works: XEvent Output and Visualization


Each and every day I use XEvent more and more as I uncover the powerful feature set.   I am finding it helpful to understand some of the input and output capabilities in order to leverage the power of XEvent.

Server File Output

When setting up a session to write to a file, use per-CPU partitioning (BY CPU) and increase the memory size to accommodate .5 to 4MB per CPU.    The BY CPU configuration reduces contention on the output buffers because each CPU has private buffer(s).

Without going into all the details of how many active buffers can be achieved, the asynchronous I/O capabilities and such, the following shows the core performance gain when using per CPU partitioned output.

No partitioning: CPUs compete to insert events into the same buffer.
CPU partitioning: each CPU has private buffer(s).
image image
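
For reference, here is a minimal sketch of a session definition that applies these recommendations. The event, file path and sizes are placeholders; the BY CPU partitioning surfaces in the DDL as MEMORY_PARTITION_MODE = PER_CPU.

CREATE EVENT SESSION [ProdCapture] ON SERVER
ADD EVENT sqlserver.sql_statement_completed
    (ACTION (package0.event_sequence, sqlserver.session_id))
ADD TARGET package0.event_file
    (SET filename = N'C:\XEvent\ProdCapture.xel', max_file_size = 256, max_rollover_files = 10)
WITH
(
    MAX_MEMORY = 16 MB,                -- size this for ~.5 to 4MB per CPU on your system
    MEMORY_PARTITION_MODE = PER_CPU,   -- private buffer(s) per CPU
    MAX_DISPATCH_LATENCY = 30 SECONDS  -- see the dispatch latency discussion below
)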

Visualization (Using the event data)

The per-CPU partitioning is used to reduce the impact of tracing on the production server.   However, this divide-and-conquer approach to capture means you have to pay attention to the details when looking at the data.

A buffer is flushed to the file when it becomes full or the data retention period is met.

This all boils down to the fact that you must sort the events by timestamp (or better, by the event_sequence action) before you attempt to evaluate them.

Take the following as an example.  You have 2 connections to the SQL Server but the connection assigned to CPU 2 does more work than the connection assigned to CPU 1.  

  • The connection on CPU 1 executes the first command and produces event_sequence = 1 into the local, storage buffer.
  • The connection to CPU 2 executes the second command and produces event_sequence = 2, 3, 4, 5, … into the local storage buffer.
  • CPU 2’s buffer is filled before CPU 1’s buffer and flushed to the XEvent file.

image

When you open the XEL file in management studio (SSMS) the order of events displayed is physical file order.

    1. Sequence = 2
    2. Sequence = 3
    3. Sequence = 4
    4. Sequence = 1

Unless you have a very keen eye you might not even notice the ordering of events.    You must sort the data by event_sequence or timestamp in order to reflect the true nature of the capture.   You will encounter the same need for an ORDER BY clause if you import the trace into a SQL Server database table.
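
The same rule applies if you read the .xel files with T-SQL instead of SSMS. A minimal sketch, assuming the session captured the event_sequence action and wrote to C:\XEvent (the path and file name are placeholders):

;with ev as
(
    select cast(event_data as xml) as x
    from sys.fn_xe_file_target_read_file(N'C:\XEvent\ProdCapture*.xel', NULL, NULL, NULL)
)
select
    x.value('(event/@name)[1]', 'nvarchar(128)') as event_name,
    x.value('(event/@timestamp)[1]', 'datetime2') as event_time,
    x.value('(event/action[@name="event_sequence"]/value)[1]', 'bigint') as event_sequence
from ev
order by event_sequence  -- reflect the true order of capture, not physical file order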

Dispatch Latency

Be very careful when setting dispatch latency.   The latency controls how long an unfilled buffer can hold events before sending them to the output stream.   A value of 0 really means INFINITE, not flush immediately.

Take the same example from above with a long dispatch latency (minutes, INFINITE, …).    Now assume that CPU 2 keeps generating events but the connection on CPU 1 remains unused.   The buffer for CPU 1 is not flushed until the trace is stopped or the dispatch latency is exceeded.  Here is what I recently encountered when someone used the INFINITE dispatch latency.

File                 File time    Lowest sequence in the file
1 (Trace started)    3:00am       100000
2                    3:20am       200000
3                    3:40am       300000
4                    4:00am       400000
5 (Trace stopped)    4:11am       50000

The trace was captured with INFINITE dispatch latency.  One CPU had very little event activity and didn't flush its partial buffer until the trace was stopped.    The other CPUs had activity and caused events to be flushed to the rollover files.

Looking for a problem at 3:10am, I opened the XEL file covering that time window (file 1 in this example).    However, it was not until I included the 5th file in my research and sorted by event_sequence that I saw the additional queries that occurred within the same time window and were relevant to my investigation.

If I had only opened file 1, grouped by query hash, and ordered by CPU, I would have missed the completed events in file 5.

CSS uses 5 to 30 seconds as a best practice for the dispatch latency setting, keeping the events near each other across the CPU partitions.

SSMS runs in the Visual Studio 32 bit Shell

It is true that the SSMS XEvent visualizations run in the Visual Studio 32-bit shell, which limits the memory to a maximum of 4GB when running on an x64 WOW system.   Does this mean you can only open or merge trace files up to 4GB in size?

The answer is NO; you can generally exceed the trace file sizes by a 4:1 ratio.    The XEvent visualizer is designed to use the PublishedEvent's EventLocator and owner-drawn cell activities.   In some limited testing I observed around ~250MB of memory used for the events in a 1GB file.   As you scroll around, filter, group, or search, we use the EventLocator(s) to access the data from disk and properly draw the cell or make a filtering, grouping, or search decision.   You should be able to manipulate ~16GB of trace events using SSMS (your mileage may vary).

Bob Dorr - Principal SQL Server Escalation Engineer

The Additive Design of SSAS Role Security


SSAS security roles are additive: a user gets permission to access data allowed in any role to which the user belongs, even if another role membership for the same user does not allow access to the same data. This can cause confusion in some circumstances, and in the past has been incorrectly reckoned a defect in the product when discovered by administrators. The product group considered the question in crafting the existing design, though, and ultimately opted for additive security. The scenarios in which this problem occurs are not as common as the scenarios wherein additive security must be used to produce the expected result, and several workarounds exist to mitigate the issue.

The most common overlapping role security scenario

Consider the scenario where one role is applied to users for one geography, such as North America, but another role is also applied to the same user for a different geography which overlaps the North American geography. The user may be explicitly granted access to [All North America] in the role covering North American users, while the second role may contain only permission to some subset, say [USA], which does not cover other members like [Canada] that are included in the [All North America] permissions from the first role. In this case, with additive security the user will have access to [Canada] through the [All North America] permissions granted in the first role. The second role may only grant access to data for [USA], but [Canada] will not be denied even so. This is generally what is intended with this type of scenario, and is the reason the additive design was adopted.

image

The less common and problematic overlapping role security scenario

The less common scenario, in which the problem referenced above can surface, is that of two roles used to secure attributes in two different dimensions, where both roles might apply to some users simultaneously. In the sample provided with this post, there are only two dimensions: one for Sales Reps, who enter sales into a database, and the other for Takers, who take inventory from shelves to fulfill orders. The Sales Rep role secures the [Sales Rep] dimension, such that each representative will only see data for his own sales:

clip_image003

Likewise, the Taker role secures the [Taker] dimension, such that each Taker will only see values reflecting inventory taken by him:

clip_image004

It so happens that some Takers may also be Sales Reps, and so in those cases, a user will be a member of both the Taker and Sales Rep roles simultaneously. Since the Taker role only secures the [Taker] dimension, this means the entire [Sales Rep] dimension has open permissions within that role:

clip_image006

And conversely, the Sales Rep role leaves the entire [Taker] dimension unsecured, though it secures the [Sales Rep] dimension. Because of the additive design of role security in Analysis Services, administrators may be dismayed to find that any user in both roles will have complete access to both dimensions in this scenario, including data neither taken from shelves nor sold by the user in question.

image

The user will effectively have no security applied from either role then. Fortunately, there are two ways to work around this issue.

Simplest workaround

The simplest workaround is to combine the two roles into a single role applicable to all members of each former role, enforcing the overlapping security all in one place. This will work for most cases, but there may be some cases where, for some reason or other, the administrator will need or prefer to keep the roles separate. In the original real-world case necessitating the alternate workaround, there was just such a reason.

Alternate workaround

In those cases where the simplest workaround of combining the overlapping security roles will not suffice, an alternate workaround exists. It is not quite as straightforward as combining the roles, but it is still very simple to implement. The workaround does require a custom assembly, and so can only be used on traditional multidimensional databases; Tabular databases do not support the use of custom assemblies.

To avoid the additive consequence when applying security, each role must detect whether the current user is present in the other affected role(s), and selectively deny access then to the entire affected dimension(s) from the other role when the current user is found in both. This is done by adding attribute security on the same attribute secured within the other role(s), to express a conditional Allowed Member Set:

clip_image010

When performing this check, if the user is not found in the other role (the other role being Sales Rep in the example above), then no security is applied to the other secured attribute. The Allowed Member Set returns all of the members in the attribute hierarchy with [Sales Rep].[Sales Rep Key].Members. The attribute remains unsecured then as it normally would be.

But if the current user is also found in the Sales Rep role when security for the first role is evaluated, then instead of leaving the attribute unsecured, all access to those dimension(s) is denied within the first role. The returned Allowed Member Set is empty, denoted with the empty set expression {} above.
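
To make the pattern concrete (the expression itself is only shown in the screenshot above), the conditional Allowed Member Set on the [Sales Rep].[Sales Rep Key] attribute inside the Taker role would look roughly like the following MDX; the Sales Rep role mirrors it for [Taker].[Taker Key]. Treat this as a sketch of the pattern rather than the exact expression from the sample:

IIF(
    RoleDetector.IsUserInRole("Sales Rep"),
    {},                                    // user is in both roles: deny the entire attribute within this role
    [Sales Rep].[Sales Rep Key].Members    // user is only a Taker: leave the attribute effectively unsecured
)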

This change, when applied to both roles correspondingly, allows role security from each role to selectively grant access to its corresponding secured attribute(s), without inadvertently allowing access to everything in the other role(s)’ secured attributes, due to their being unsecured within it. In this way, each dimension can be secured according to the security expression assigned in its proper role, and the roles may be applied to multiple users without suffering a loss of security applied in any of the affected roles.

Custom assembly for alternate workaround

While the alternate workaround proves simple enough to achieve, it does require us to create a custom assembly. No function exists within the MDX language to directly retrieve the current user’s role memberships in the way the built-in UserName() function allows us to do for the current session’s login name.

Included in this post is a very simple custom assembly called RoleDetector, containing two functions:

using Microsoft.AnalysisServices.AdomdServer;
using System.Linq;

namespace RoleDetector
{
    public sealed class RoleDetector
    {
        private RoleDetector()
        {
        }

        // Returns true if the current session's user is a member of the named role.
        public static bool IsUserInRole(string RoleName)
        {
            AdomdCommand cmd = new AdomdCommand(
                "SELECT ROLES FROM SYSTEMRESTRICTSCHEMA ($System.dbschema_catalogs, [CATALOG_NAME] = '"
                + Context.CurrentDatabaseName
                + "')");

            if ((cmd.ExecuteScalar() as string).ToLower().Split(',').Contains(RoleName.ToLower()))
                return true;
            else
                return false;
        }

        // Returns the comma-separated list of roles applied to the current user context.
        public static string Roles()
        {
            AdomdCommand cmd = new AdomdCommand(
                "SELECT ROLES FROM SYSTEMRESTRICTSCHEMA ($System.dbschema_catalogs, [CATALOG_NAME] = '"
                + Context.CurrentDatabaseName
                + "')");

            return cmd.ExecuteScalar() as string;
        }
    }
}

IsUserInRole() accepts a single string parameter containing the name of the role to test, and returns true if the user is found to be in that role, false otherwise. Roles() simply returns a string containing a list of the roles applied to the current user context.

The very simple code uses the DBSCHEMA_CATALOGS Data Management View (DMV), executing a query against this to retrieve the active roles for the current user’s session, restricted for only the current database name using SYSTEMRESTRICTSCHEMA as described in the documentation on using the DMVs.

Once the assembly is built, it must be registered for the database, before an expression may reference it. We can do this in the Assemblies folder in Management Studio for the SSAS database, right clicking to choose New Assembly, and then browsing to the location of the .dll generated when we built the assembly in Visual Studio. The default most restrictive settings for the assembly are sufficient for it to provide the necessary role membership check, so nothing needs to be changed on the Register Database Assembly dialog then:

clip_image012

Once the assembly is registered, we can call RoleDetector.IsUserInRole(“Taker”) and RoleDetector.IsUserInRole(“Sales Rep”) from within our security expressions defined in each role (as shown above) to take action accordingly, to deny access to the alternate role’s secured attributes, whenever the user exists in both.

Confirming the result

The sample database and assembly attached at the end of this post demonstrate the issue. After restoring the .abf Analysis Services database backup (requiring SQL Server Analysis Services 2012, SP2) and registering the assembly, connect to the database as a non-administrative user to browse its measures, dragging measures and both dimensions from the browser window onto the query window. No data is visible, since the workaround is implemented in the sample database, and the current user is not listed in the corresponding secured dimension attributes (unless the current username happens to be johndoe, janedoe or alicealison, since those users do exist among the sample database’s secured attributes’ members, and so corresponding data will be visible in each dimension respectively for those users):

clip_image014

To verify the troublesome behavior if the workaround is not implemented properly, remove the Allowed Member Set expression on the [Taker Key] attribute for the [Taker] cube dimension on the Dimension Data tab for the Sales Rep role, and correspondingly, remove the Allowed Member Set expression on the [Sales Rep Key] attribute for the [Sales Rep] dimension on the Dimension Data tab for the Taker role, and execute the query again:

clip_image015

Sample database and assembly binary and code

The sample includes the .abf database backup for the sample database, as well as the assembly binary, RoleDetector.dll, and its underlying source code project for Visual Studio 2012, including references that must be updated, pointing to the Analysis Services ADOMD Server and AMO libraries, at the following respective default locations on a system with Analysis Services installed:

C:\Program Files\Microsoft Analysis Services\AS OLEDB\110\msmgdsrv.dll
C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.AnalysisServices.DLL

AdditiveRoleSecurityWorkaroundSample.zip

References

Permissions and Access Rights in Analysis Services:
http://technet.microsoft.com/en-us/library/ms174786(v=sql.105).aspx

Creating a custom assembly for Analysis Services:
http://technet.microsoft.com/en-us/library/ms175340(v=sql.110).aspx

Data Management Views (DMVs) in Analysis Services:
http://msdn.microsoft.com/en-us/library/hh230820.aspx

Jon Burchel | Microsoft SQL Server Escalation Services
