CSS SQL Server Engineers

AlwaysON - HADRON Learning Series: How does AlwaysON Process a Synchronous Commit Request


The behavior of a synchronous commit in AlwaysON is to commit the transaction on the local replica as well as on the secondary replica before the transaction is considered complete. If you have used database mirroring, you may be familiar with the term ‘High Availability’ to describe a similar behavior.

A common misperception is that the commit request originates on the primary, SQL Server commits to stable media on the primary, and then a message is sent to the secondary to commit the transaction before waiting for the response.   However, in order to scale, AlwaysON does NOT send a specific message to the secondary(s) for each commit request; instead it uses multiple messages and active workers to optimize performance.  This blog walks you through a synchronous commit, showing you how the process works in AlwaysON.

Step

Action

Connect

The secondary establishes a valid connection to the primary using the configured mirroring endpoints.

Request Data

The secondary initiates a request to the primary, asking for the log blocks to be shipped.   The secondary and primary negotiate the proper LSN starting point and any other necessary information.

Start Log Scanner

A log scanner worker is started on the primary.  The log scanner ships log blocks to the secondary.  The information can be retrieved from the log cache (as blocks are flushed on the primary) or from the log file (for LSNs required by the secondary that are no longer resident in the log cache).

Redo

The secondary starts a redo thread to process log blocks (run redo) that have been received from the log scanner on the primary and hardened on the secondary.

Progress Response(s)

The secondary sends a progress message to the primary at regular intervals—approximately every 3 messages from the primary or every second, whichever occurs first.
The response contains information about the secondary's progress, including the LSN level that has been hardened as well as the LSN level that has been redone.

NOTE:  This is a significant part of understanding the commit behavior.   There is a distinct difference between hardening the log block and redoing the log block.   This will become more evident as I talk about how we wait for the commit response from the secondary.


At this point the primary and secondary are talking to each other: log blocks are being shipped to the secondary and hardened by the receive worker, and the log blocks are being redone by the redo worker.    As I mentioned before, AlwaysON uses a series of messages and workers to accomplish activity in parallel.

image

Now let us look at what happens when you issue a COMMIT TRAN.

Step

Action

Commit Tran

The T-SQL COMMIT TRAN triggers the Flush To LSN behavior.   SQL Server requires that the log records be flushed (hardened) to stable media to consider the transaction committed.

Local Flush To LSN

On the primary, a request is made to flush to the LSN level of the commit record.   This tells the log writer thread on the primary database replica to bundle up all log records up to the commit and flush them to stable media.

Here is where the messaging of AlwaysON shines.   When a log block is flushed, it signals other readers of the log cache (in our case the log scanner) that a log block is ready and can be sent to the secondary.

The log scanner picks up the log block(s) and sends them to the AlwaysON log block cracker.   The cracker logic looks for operations that need special handling, such as file stream activity, file growth, etc.   The cracking logic can send its own messages to the secondary, and once the log block has been cracked the log block is sent to the secondary.

Log Block Message

The log block message is processed on the secondary. 

NOTE:  This is where folks get a bit confused, but hopefully I can explain.   Processing the log block message means the log block is saved to stable media (written to the log file).  Redo is a separate worker!  The log block is then stored in the secondary log cache, and the redo worker will process it as it works through redo.

Progress Response

The progress response contains information such as LSN harden level and the redo LSN level.   These can be different as redo is still in progress.   These messages are returned approx. every 3 messages or 1 second (think of the database mirroring ping / is alive concept.)

Commit Complete

KEY CONCEPT

To complete the commit, the log block must be hardened on the primary and the secondary.   It does not need to be redone on the secondary, just hardened, to meet the HA capabilities.   The redo lag time only impacts the time to return to operations, not the commit.

We can see that AlwaysON uses a series of messages and workers to optimize log block handling and provide high availability.  By avoiding the transmission of a specific commit message to the secondary for every commit that occurs on the primary, the system avoids a flood of communications between the primary and the secondary(s) that would not scale well.
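
If you want to observe the difference between hardened and redone progress yourself, a query along the following lines can be run on the primary.  This is a minimal sketch assuming the SQL Server 2012 AlwaysOn DMVs (sys.dm_hadr_database_replica_states and the availability group catalog views); column availability can vary by build.

select ag.name as availability_group,
       ar.replica_server_name,
       db_name(drs.database_id) as database_name,
       drs.synchronization_state_desc,
       drs.last_hardened_lsn,     -- the commit only waits for this
       drs.last_redone_lsn,       -- redo can lag behind without delaying commits
       drs.log_send_queue_size,
       drs.redo_queue_size
from sys.dm_hadr_database_replica_states as drs
join sys.availability_replicas as ar on ar.replica_id = drs.replica_id
join sys.availability_groups as ag on ag.group_id = drs.group_id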

Bob Dorr - Principal SQL Server Escalation Engineer


System Center Advisor: What is the email all about?…


The last time I left you with my adventures with System Center Advisor, I was at the SeaTac airport checking out my alert dashboard before I jumped onto my plane back to Texas. And now that I’m back in the sunny skies of the Lone Star State (sorry Seattle, I love travelling there, but for a Texas boy there is nothing like the clear blue sunny skies in the spring), I decided to follow up on an email I had received at my Live ID from System Center Advisor. The email looks like this:

image

I decided to focus on the fact that I have New Alerts, so I clicked on the “View all alerts” link at the right of the page. It prompted me to log in to System Center Advisor. My alert dashboard showed me new alerts found on March 29th that perhaps I should look at:

image

I remember creating a new database called mavstothefinals (OK, as a Dallas Mavs fan I can only dream this will happen) last week while in Seattle, and I immediately created a backup for it (BTW, we have a rule that checks to see whether you have ever backed up your database at least once, so I didn’t hit that one).  But I got these warnings that seem to indicate I need to pay attention to a few things about this database. Let’s look at each of these in more detail:

image

The first rule indicates I have not run a consistency check on this database. You will notice that in the right corner it shows the date and time I created the database, which tells me how many days have passed without DBCC CHECKDB completing with no errors (this is what we mean by “clean consistency check”). I know there is debate in the community on how often you should run CHECKDB, but we took the conservative route with System Center Advisor. Our rule simply checks that you have run DBCC CHECKDB with no errors at least one time since you created the database. Once you run a CHECKDB with no errors this alert should be closed (we will prove this tomorrow).

The next rule is pointing out that my backup I created is on the same volume as my database files:

image

This might seem like a very simple rule, but you would be surprised how often I’ve seen this from customers contacting CSS. If you are not sure why this is important, think about what this rule is saying. You have put the backup of your production database on the same disk drive where the database files are stored. Should this drive fail, you will have nothing: no data and no backup. Or consider a scenario where the drive is accessible but has problems and both the backup and the database are corrupted because of it. We implement this rule by running a T-SQL query that joins the backupset and backupmediafamily tables in msdb with sys.databases and sys.master_files. The comparison looks for your latest full database backup and checks whether the target drive letter matches a drive letter for any files of the database. The Information detected section in the right corner shows you the drives we detected for your latest full backup and the drives we detected for database files, should there be a match.
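
The post does not publish the exact query Advisor runs, but a rough sketch of the same idea (an approximation, not the actual rule) could look like this:

-- approximate check: does the latest full backup of each database share a
-- drive letter with any of that database's files?
;with LastFull as
(
    select bs.database_name,
           bmf.physical_device_name,
           row_number() over (partition by bs.database_name
                              order by bs.backup_finish_date desc) as rn
    from msdb.dbo.backupset as bs
    join msdb.dbo.backupmediafamily as bmf on bmf.media_set_id = bs.media_set_id
    where bs.type = 'D'    -- full database backups
)
select lf.database_name,
       lf.physical_device_name as backup_device,
       mf.physical_name as database_file
from LastFull as lf
join sys.databases as d on d.name = lf.database_name
join sys.master_files as mf on mf.database_id = d.database_id
where lf.rn = 1
  and left(lf.physical_device_name, 1) = left(mf.physical_name, 1)   -- same drive letter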

So far these rules make logical sense. It makes sense to run a consistency check when I first put a database in production, and it definitely makes sense to put my backups on a different storage location than my production database files. The third alert is all about keeping the size of the transaction log in check and represents a very common problem we have seen from customers. Have you ever seen the transaction log for your database grow seemingly infinitely? There can be several reasons why, but a simple explanation is the one behind this rule:

image

This rule indicates my new database is using the FULL recovery model (the default for SQL Server) but I have no transaction log backup. This is not a problem if you have never created a full database backup. But once you create a full database backup, the log will never be truncated until you back up the transaction log. This rule is implemented by checking what type of backup your last backup was. If the last type is not a transaction log backup and the database is using the full recovery model, we fire the rule. As with our other two rules, this rule is very practical and will only fire if you are not taking transaction log backups when using the FULL recovery model. If you see this rule and think it does not apply to you, then it is very likely you should be using the SIMPLE recovery model. If you see your transaction log growing forever, this rule could easily explain why. There is one catch to this rule. If you back up your transaction log once and continue to make changes but never back it up again (and never take any other type of backup), the log will grow but this rule will not fire. We don’t check for this condition (today) because we were not sure what threshold to use when comparing the size of the log vs. the age of the log backup. Rather, our rule reflects a common issue where customers create a new database, create a full backup, but never back up the transaction log.
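
Again, this is not the published rule text; a minimal sketch of the kind of check described might be:

-- databases in FULL recovery whose most recent backup is not a log backup
select d.name,
       d.recovery_model_desc,
       lastb.type as last_backup_type    -- 'D' = full, 'I' = differential, 'L' = log
from sys.databases as d
outer apply (select top (1) bs.type
             from msdb.dbo.backupset as bs
             where bs.database_name = d.name
             order by bs.backup_finish_date desc) as lastb
where d.recovery_model_desc = 'FULL'
  and lastb.type is not null       -- at least one backup exists
  and lastb.type <> 'L'            -- ...but the latest one is not a log backup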

I think all of these rules make sense, so I’m going to follow through with the steps below (a short T-SQL sketch follows the list):

  • Running CHECKDB against this database
  • Creating a full backup of this database on a different drive
  • Then creating a transaction log backup to avoid infinite log growth
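
A minimal sketch of those three steps, assuming the database from this post and hypothetical backup paths on a separate volume:

dbcc checkdb(mavstothefinals)
go
backup database mavstothefinals to disk = N'E:\Backups\mavstothefinals_full.bak'   -- hypothetical path on a different volume
go
backup log mavstothefinals to disk = N'E:\Backups\mavstothefinals_log.trn'         -- hypothetical path
go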

In my next post, we will make sure these issues are resolved and then review the remaining alerts for my fresh installation of SQL Server 2008 R2.

Bob Ward, Microsoft

Why am I getting prompted for Credentials?


I just wrapped up a case for an issue I see every once in a while.  The scenario is the following:

  1. Browse to site

    SNAGHTML268bf26
  2. Get Prompted for credentials and enter username and password

    SNAGHTML2684ad1
  3. Web site will come up normally

I’ve seen where some people go into Kerberos troubleshooting mode, but that in itself is not a Kerberos issue.  The pattern for a normal Kerberos issue would be the following:

  1. Browse to Site
  2. Get Prompted for credentials and enter username and password
  3. Repeat step 2 two more times
  4. Get a 401.1 error from the web server

So, for this particular issue, the answer lies in the URL.  More specifically in the Host of the URL.

 

SNAGHTML26982b2

If Internet Explorer detects periods within the Host Name, it will automatically force you into the Internet Zone.

SNAGHTML26b527f

If we were to just browse to the Netbios name as opposed to the Fully Qualified Domain Name (FQDN), we would see a different zone.  Usually Local Intranet.  The problem with the Internet Zone is that it will not automatically log you into the web site:

SNAGHTML26d7df2

The fact that a NetBIOS name will put you into the Intranet Zone allows the automatic login to work.  I thought we actually had this documented in the Reporting Services Books Online documentation, but when I looked for it, I was not able to find it.

Litmus Test

So, if you are getting prompted for credentials, ask yourself whether you are getting to the web site or not.  If you do get through to the web site, chances are you may be hitting this issue, although that could happen for different reasons.  The main thing is that it is probably not a Kerberos issue, which is how this issue was presented to me.

Now What?

So, we have identified that you have periods in your host name for the URL.  How do we get rid of the prompts?  You have a couple of options, each with its pros and cons.

  1. Add the URL to the Intranet Zone to prevent it from being forced into the Internet Zone.
    • The downside of this is that you would either have to do this per machine, or push it out via policy to your environment.
  2. Use the Netbios name instead of the FQDN.
    • This may not be doable for different reasons, which may need to be discussed with the Network/Domain team for your company.  In this customer’s case, ping wouldn’t even resolve the Netbios name.  That would definitely need to be fixed if we had any hope of the URL working in Internet Explorer.  I don’t know if their network allowed for WINS emulation through Active Directory.
    • You could always use your HOSTS or LMHOSTS file to locally allow a Netbios name to resolve.  However, this would be machine specific and not really doable from a scale perspective.
    • In some cases, you may be using your IP address as the host name.  You would either need to do option 1, create a name instead of the IP in your HOSTS file, or work with your DNS team to get a host entry added if nothing is available currently.
    • I was asked if allowing Anonymous users would be an option.  Unfortunately, starting with Reporting Services 2008 and later, Anonymous isn’t supported.

There are probably other options as well, but these are probably the most obvious from my perspective.  Hopefully this will help some people who hit this issue.

 

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

RS Report Performance Relief in CU7


Cumulative Update 7 for SQL 2008 R2 was just released.  Within that CU, there are two fixes for Reporting Services that people should be aware of that relate to report performance.

Large HTML

This issue is related to reports that have a large number of HTML elements.  This could be due to reports that are not paginated or that have a very large page size. Another example would be reports using drill-down functionality (toggle or expand/collapse functionality) where expanding out the items in the report results in a large number of items being displayed.

The actual issue that this fix corrects deals with the “fix up” that is done to the HTML before it is displayed. The “fix up” effectively loops through all the HTML elements in the report. 

FIX: Performance decreases after you move a large report to SQL Server 2008 R2 Reporting Services
http://support.microsoft.com/kb/2506799

Multi-Select Parameters

This issue is related to having a large number of values within a multi-select parameter list.  Large would be something over a few hundred (>300).  You can validate whether you are hitting this by limiting the number of items coming back in the query for the parameter list to one hundred or less.  If the issue clears up, you are probably hitting this issue.

FIX: Performance decreases after you move a report that contains a large multi-select drop-down parameter list to SQL Server 2008 R2 Reporting Services
http://support.microsoft.com/kb/2522708

The Toolbar

In both cases, the code that actually triggers this behavior is resident within the toolbar of the Report Viewer Control.  You can try hiding the toolbar to see whether that works around the issue and helps determine whether you are hitting either one of these issues.
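
For reports run through Report Server URL access, one quick way to test with the toolbar hidden is the rc:Toolbar device information parameter (the server and report names below are placeholders):

http://myserver/ReportServer?/SomeFolder/SomeReport&rc:Toolbar=false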

How to tell if you are hitting this issue?

One indicator that you may be hitting this issue is to compare the actual time it takes for the report to display vs. what the time looks like within the ExecutionLog3 view within the RS Catalog database.

select ItemPath, TimeStart, TimeEnd, TimeDataRetrieval, TimeProcessing, TimeRendering, Status
from ExecutionLog3

SNAGHTML630890d

Based on these numbers, it looks like the report was really quick.  Although at 11:54, when I’m writing this, it still hasn’t displayed in the browser.  All I see is the spinny.

image

Based on some testing that we did on the CSS side with some repros, we found a dramatic decrease in time to render.  In one case, a report took 5 minutes to show in the browser and after CU7 it took about 30 seconds, although results will vary depending on your report and whether you were hitting this issue to begin with.

 

Thanks to Matt Hofacker on the RS CSS team for putting a lot of the data for this together!

 

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

System Center Advisor: Finishing off my first list of advice….


When I last told you my story of installing System Center Advisor, I had seen some new alerts for a database I had created on this server called mavstothefinals (hey Lakers and Spurs lose in the first round and Mavs are up 2-0. It could happen…). I had received an email about new alerts for this database:

http://blogs.msdn.com/b/psssql/archive/2011/04/06/system-center-advisor-what-is-the-email-all-about.aspx

I resolved each of these issues and the next day these alerts were closed as I would expect (remember I told you that alerts are closed automatically when Advisor detects the problem is resolved).

What is left for me to resolve my initial list of advice from CSS?

image

I sorted these alerts based on “Path”, which is the object that the alert applies to (SQL instance, database, or the computer server). You can see the first 6 alerts are really the same 2 alerts that have fired for my system databases: model, master, and msdb. The first alert I’ve already talked about in the last post, indicating I have not run a clean consistency check at least one time. I personally believe that for a new production installation this makes perfect sense. Why not see whether all of your production databases, including system databases, have had a clean consistency check before releasing this server into production? System databases are often overlooked for this, but if they become damaged you may have problems operating or starting SQL Server. Remember that while PAGE_VERIFY CHECKSUM is a great feature and will discover I/O problems with pages that were written to disk, what about problems with pages that occur while in memory? If a page becomes damaged in memory and is written to disk, checksum will not catch this problem (there is a feature we have to do some checksum verification for pages in cache, but it is not as robust as when reading from disk).

So let’s say we all agree that running a consistency check on these system databases makes sense at least one time to ensure they are clean. As I said earlier we are conservative about our rules to avoid making them too noisy for the entire world so we don’t advocate how often you should run DBCC CHECKDB on a regular basis.  But run it at least once to ensure they are clean. So I’ve run the following commands on this SQL instance:

dbcc checkdb(master)
go
dbcc checkdb(model)
go
dbcc checkdb(msdb)
go

I received no errors, so I expect these rules to be closed by tomorrow.

What about the other alert on backup?

image

Even though this fires for msdb, master, and model, I picked model to show you this rule. You may be reading this post and asking yourself why we would have a rule to check whether you have a backup. What is behind this rule?

image

If you look at this description it describes what we check. We simply see whether you have EVER backed up the database (a FULL backup) since you created the database. For my user database from the last post, I had created an initial backup (but remember I didn’t have a log backup and I backed it up to the same volume as the database). There are two main points I want you to take away about this rule:

1) You would be surprised how many people call CSS and don’t have a backup at all to restore from (no one ever calls me and says “Hey Bob, I have a backup and just wanted to let you know I’m restoring it”). I must admit to you that when I first proposed this rule to the SCAdvisor team for the beta they said “Are you kidding? Who wouldn’t back up their database?” I pointed out to them that we have years and years of customer experience to prove it. They went along with my suggestion, and in the early TAP program many customers hit this rule, especially for system databases. One interesting story was that a TAP customer hit the rule and didn’t understand it because they were sure they were backing up their databases. It turns out their backup automation solution was failing and the software to notify them the backup automation was failing was also failing.

2) System databases including model should be backed up. Master may seem obvious to you but what about model? I’ve had a few folks ask me about backing up model and why we flag it. I turn around and ask back the question “What happens if model becomes damaged?” One possible problem is that SQL Server cannot be started if model cannot be opened because it is damaged. Now what do you do? Your production server is totally down because the engine cannot open model and you have no backup. The answer is you must rebuild all system databases to get model back into a state of health. (Note: We do store the initial model.mdf file in …\MSSQL\Binn\Templates so you may be able to copy from there but the documented supported method to recover model is restore from a backup).

So trusting this advice, I decided to create a backup of master, model, and msdb. Remember the rule I hit with my user database saying I shouldn’t back up to the same volume as the database files, so I will make sure to back these up to a different volume.
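
A minimal sketch of those backups, with hypothetical target paths on a separate volume:

backup database master to disk = N'E:\Backups\master_full.bak'
backup database model  to disk = N'E:\Backups\model_full.bak'
backup database msdb   to disk = N'E:\Backups\msdb_full.bak'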

This leaves me with 2 alerts:

image

The first alert has the following details:

image

I certainly plan to use tempdb on this server and would like to avoid hitting this problem, so the update sounds like one I would like to apply. We in CSS picked this update because many customers have encountered the errors described (605 and 824) and thought tempdb was damaged when this is actually not the case. As it turns out, this is more of a query optimizer problem. I looked at the KB article that comes with this rule and it implies to me that SQL Server 2008 R2 RTM already has this fix. Then why did the alert fire? This is one of the nice features of Advisor: our ability to ensure a fix is “completely applied”. Look at the Information detected section in the right corner of these details:

image

It says a SQL Server update is not needed but says a trace flag update is needed. This is because for this fix to be applied you must enable a trace flag per the KB article (BTW, this is one of the reasons for this rule. Many, many people have missed that the trace flag is needed). The information detected tells me whether trace flag 4135 or 4199 is enabled. Why 2 trace flags? This is where things get a bit complex. The original fix for this problem required trace flag 4135. However, we have also introduced a trace flag starting with certain builds, trace flag 4199, that enables several different optimizer fixes including this one.  Here is what the KB article for this fix says:

Note You can enable trace flag 4135 or trace flag 4199 to activate this fix. Trace flag 4135 was introduced in Cumulative Update package 3 for SQL Server 2008. Trace flag 4135 is also available in SQL Server 2008 Service Pack 1, in SQL Server 2008 Service Pack 2, and in SQL Server 2008 R2. Trace flag 4199 was introduced in Cumulative Update package 7 for SQL Server 2008, in Cumulative Update package 7 for SQL Server 2008 Service Pack 1, and in Cumulative Update package 1 for SQL Server 2008 R2. For more information about trace flag 4199, click the following article number to view the article in the Microsoft Knowledge Base:
974006 (http://support.microsoft.com/kb/974006/ ) Trace flag 4199 is added to control multiple query optimizer changes previously made under multiple trace flags

Since I have SQL Server 2008 R2 RTM installed, only trace flag 4135 is possible for me to use, so that is what I’ll set to resolve this alert.
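
A quick sketch of the two common ways to turn the trace flag on (DBCC TRACEON with -1 applies globally but only until the next restart; the startup parameter survives restarts):

-- enable globally for the running instance
dbcc traceon (4135, -1)

-- or make it persistent by adding the startup parameter -T4135 via
-- SQL Server Configuration Manager and restarting the service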

Let’s look at the last alert on my list which involves tempdb:

image

One of the most common issues we have seen over the years regarding the performance of applications using tempdb is allocation page contention (symptoms are high waits on PAGELATCH for PFS, GAM, and SGAM pages). And one of the most common reasons for this contention is the lack of multiple database files for tempdb. Spreading out the number of files for tempdb helps relieve bottlenecks for allocation pages in these files. Therefore, we built a rule in System Center Advisor to perform a very conservative check. If you have a SQL Server that is using more than one logical processor (we calculate this by looking for the number of schedulers in sys.dm_os_schedulers that are VISIBLE ONLINE) and you only have one tempdb file, we raise this alert. We don’t make any comments on how many files you should create (because this is a widely debated topic). We just know that if you have one file and multiple schedulers you are likely to hit problems when using tempdb. We used this logic because that is a common customer call to CSS. A customer will call us with performance problems. We see high latch contention in tempdb involving allocation pages. And we find out there is only one tempdb file. The typical suggestion is to create one file for each logical processor, but there is debate about what that limit should be, so we don’t really go down that path. We just know that if you only have 1 tempdb database file you are likely going to have problems. Why flag this for every customer? Because I have yet to see an application using SQL Server that does not use tempdb.
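
A minimal sketch of adding tempdb data files; the file names, sizes, and path below are placeholders to adjust for your own system, and the right number of files is the debated topic mentioned above:

alter database tempdb
    add file (name = N'tempdev2', filename = N'T:\TempDB\tempdev2.ndf', size = 1024MB, filegrowth = 256MB)
alter database tempdb
    add file (name = N'tempdev3', filename = N'T:\TempDB\tempdev3.ndf', size = 1024MB, filegrowth = 256MB)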

I hope the last few blog posts give you a feel for how System Center Advisor can help you stay ahead of problems managing your SQL Server. But to be honest, the rules I have shown you so far are really some of the more basic checks. Of course, some of the most basic checks can flag some of the most common mistakes. We have many other rules that can help you.

In my next series of posts on System Center Advisor, I will drill into all the other rules we have shipped so far for SQL Server (there are 50 total) and then talk more about Configuration Change History. To see a complete list of alerts, do the following:

From your Alert dashboard, select the Manage Alerts button:

image

From there pick the Available Alerts tab

image

All the alerts are listed here with a link to the KB articles for each rule.

Bob Ward, Microsoft

Conversion issues moving from VS 2005 to VS 2008


I was asked to look at a case yesterday where the customer was hitting the following error after converting a VS 2005 Project to VS 2008.

Error 3 Custom tool error: Failed to generate code. Failed to generate code. Exception of type 'System.Data.Design.InternalException' was thrown. Exception of type 'System.Data.Design.InternalException' was thrown.

This was being produced by an XSD file which was a DataSet based on SQL CE.  Luckily I could reproduce the issue locally with the customer’s project, which made digging into this much easier.

Unfortunately, the error displayed above didn’t really give me a lot to go on, but it was an exception, so I could get a memory dump.  The thing I love about managed (.NET) code is that customers can pretty much do the same thing and don’t necessarily need internal symbols.  The Debugging Tools ship with an extension called SOS which you can load to look at things like the managed call stack and different objects.  To get the dump, you can go about it different ways.  One way would be to use a tool called DebugDiag (available from the Microsoft Download Center) and set up a rule for getting first chance exceptions on CLR exceptions. I actually just did a live debug within WinDBG as it was quicker for me to go through that.  I had to sift through a bunch of exceptions until I arrived at the InternalException.  Here was the call stack for the exception (which wasn’t presented in Visual Studio):

 

0:000> !clrstack
OS Thread Id: 0x32cc (0)
ESP       EIP    
002a949c 7614b727 [HelperMethodFrame: 002a949c]
002a9540 6a42ce4a System.Data.Design.ProviderManager.GetFactory(System.String)
002a9568 6a10c881 System.Data.Design.DataComponentMethodGenerator..ctor(System.Data.Design.TypedDataSourceCodeGenerator, System.Data.Design.DesignTable, Boolean)
002a9580 6a10c586 System.Data.Design.DataComponentGenerator.GenerateDataComponent(System.Data.Design.DesignTable, Boolean, Boolean)
002a95ec 6a116e0e System.Data.Design.TypedDataSourceCodeGenerator.GenerateDataSource(System.Data.Design.DesignDataSource, System.CodeDom.CodeCompileUnit, System.CodeDom.CodeNamespace, System.String, GenerateOption)
002a9650 6a140372 System.Data.Design.TypedDataSetGenerator.GenerateInternal(System.Data.Design.DesignDataSource, System.CodeDom.CodeCompileUnit, System.CodeDom.CodeNamespace, System.CodeDom.Compiler.CodeDomProvider, GenerateOption, System.String)
002a96a0 6a14061c System.Data.Design.TypedDataSetGenerator.Generate(System.String, System.CodeDom.CodeCompileUnit, System.CodeDom.CodeNamespace, System.CodeDom.Compiler.CodeDomProvider, GenerateOption, System.String)
002a96d8 6a1406bb System.Data.Design.TypedDataSetGenerator.Generate(System.String, System.CodeDom.CodeCompileUnit, System.CodeDom.CodeNamespace, System.CodeDom.Compiler.CodeDomProvider, System.Collections.Hashtable, GenerateOption, System.String)
002a9714 6a14071f System.Data.Design.TypedDataSetGenerator.Generate(System.String, System.CodeDom.CodeCompileUnit, System.CodeDom.CodeNamespace, System.CodeDom.Compiler.CodeDomProvider, System.Collections.Hashtable, GenerateOption)

That alone tells me a lot.  It looks like we are trying to load a provider and some error is occurring.  Our InternalException.  InternalException still doesn’t tell me what the failure was, but the fact that we are trying to load a Provider narrows the window.  Either we failed to find the provider, or it failed to load assuming we aren’t doing some processing within the GetFactory call itself.

Using .NET Reflector, you can have a look at System.Data.Design.ProviderManager.GetFactory to see what is inside it.  What we are looking for are where we actually throw InternalException.  I saw two places where that happens:

throw new InternalException( string.Format(System.Globalization.CultureInfo.CurrentCulture, "Cannot find provider factory for provider named {0}", invariantName) );

throw new InternalException( string.Format(System.Globalization.CultureInfo.CurrentCulture, "More that one data row for provider named {0}", invariantName) );

Based on this, it was either missing, or there were multiple entries for the provider.  But, what provider?  One thing I know based on the case itself is that it is SQL CE related, so probably a CE Provider. At that point, I typically look at the managed stack to see if I can figure out what that invariantName value is.

0:000> !dso
OS Thread Id: 0x32cc (0)
ESP/REG  Object   Name
002a9430 1776c3bc System.Data.Design.InternalException
002a9440 07f6a388 System.Resources.ResourceReader
002a947c 1776c3bc System.Data.Design.InternalException
002a948c 081902d8 System.String    Microsoft.SqlServerCe.Client
002a9490 1776c3bc System.Data.Design.InternalException
002a9494 1776c3a8 System.Object[]    (System.Object[])
002a94bc 07a71c3c System.String    en-US
002a94c0 1776c3bc System.Data.Design.InternalException
002a94c8 1776c3bc System.Data.Design.InternalException
002a94cc 07a71ccc System.String    en-US
002a94d0 1776c3a8 System.Object[]    (System.Object[])
002a94d8 07a9aabc System.Globalization.CompareInfo
002a94dc 07f6a1e0 System.Resources.ResourceManager
002a94f4 1776c698 System.String    ERR_INTERNAL
002a94f8 1776c3a8 System.Object[]    (System.Object[])
002a94fc 1776c698 System.String    ERR_INTERNAL
002a951c 1776c3bc System.Data.Design.InternalException
002a9524 1776c3bc System.Data.Design.InternalException <- This is where the Exception was thrown
002a9540 07a71b18 System.Globalization.CultureInfo
002a9548 07a71b18 System.Globalization.CultureInfo
002a954c 1776b5fc System.Object[]    (System.Object[])
002a9550 081c6038 System.Data.Design.DesignTable

We can see the Exception, and one of the strings listed is “Microsoft.SqlServerCe.Client”, but it is listed after the Exception was thrown. I could go with that, but I do see an Object Array right before the Exception.  Let’s have a look at that to see what is in it.  It may be related.

0:000> !DumpArray 1776b5fc <- could use !da also
Name: System.Object[]
MethodTable: 60ec42b8
EEClass: 60cada64
Size: 20(0x14) bytes
Array: Rank 1, Number of elements 1, Type CLASS
Element Methodtable: 60ef0704
[0] 081902d8

0:000> !do 081902d8
Name: System.String
MethodTable: 60ef0ae8
EEClass: 60cad65c
Size: 74(0x4a) bytes
(C:\Windows\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: Microsoft.SqlServerCe.Client

That looks pretty good to me.  Let’s go with “Microsoft.SqlServerCe.Client”.  I started off by doing just a quick Bing search on it, and I got the following forum post:

http://social.msdn.microsoft.com/Forums/en-US/vside2008/thread/4b3a423f-1bcb-4a3c-b582-fd3a1f3b3e66/

which led me to the following forum post:

http://social.msdn.microsoft.com/forums/en-US/sqlce/thread/09ac8a71-a96d-4af2-8a19-43258b6ca7be/

Based on that, I looked at my Machine.Config. In the case of my machine, I do not have a provider listed for “Microsoft.SqlServerCe.Client” within my Machine.Config at C:\windows\Microsoft.NET\Framework\v2.0.50727\CONFIG. I have the following:

<system.data>

<DbProviderFactories>

<add name="Odbc Data Provider" invariant="System.Data.Odbc" description=".Net Framework Data Provider for Odbc" type="System.Data.Odbc.OdbcFactory, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />

<add name="OleDb Data Provider" invariant="System.Data.OleDb" description=".Net Framework Data Provider for OleDb" type="System.Data.OleDb.OleDbFactory, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />

<add name="OracleClient Data Provider" invariant="System.Data.OracleClient" description=".Net Framework Data Provider for Oracle" type="System.Data.OracleClient.OracleClientFactory, System.Data.OracleClient, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />

<add name="SqlClient Data Provider" invariant="System.Data.SqlClient" description=".Net Framework Data Provider for SqlServer" type="System.Data.SqlClient.SqlClientFactory, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />

<add name="Microsoft SQL Server Compact Data Provider" invariant="System.Data.SqlServerCe.3.5" description=".NET Framework Data Provider for Microsoft SQL Server Compact" type="System.Data.SqlServerCe.SqlCeProviderFactory, System.Data.SqlServerCe, Version=3.5.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91" />

<add name="SQL Server Compact Edition Data Provider" invariant="System.Data.SqlServerCe" description=".NET Framework Data Provider for Microsoft SQL Server Compact Edition" type="System.Data.SqlServerCe.SqlCeProviderFactory, System.Data.SqlServerCe, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91" />

</DbProviderFactories>

</system.data>

The entry that it is looking for is the following, based on the second forum post.

<add name="SQL Server CE Data Provider" invariant="Microsoft.SqlServerCe.Client" description=".NET Framework Data Provider for Microsoft SQL Server 2005 Mobile Edition" type="Microsoft.SqlServerCe.Client.SqlCeClientFactory, Microsoft.SqlServerCe.Client, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91" />

However, I don’t have that assembly on my machine. I believe this comes from the VS 2005 binaries and I don’t have VS 2005 installed on my machine. The reason the InternalException is being triggered is due to the following line within the XSD:

<Connection ConnectionStringObject="Data Source =&quot;.\mydatabase.sdf&quot;" IsAppSettingsProperty="False" Modifier="Assembly" Name="MyConnectionString" ParameterPrefix="@" Provider="Microsoft.SqlServerCe.Client">

</Connection>

Before opening the 2005 Project in VS 2008, I modified that line within the XSD from “Microsoft.SqlServerCe.Client” to “System.Data.SqlServerCe”. After that, I opened the project in VS 2008 and did not receive the error. It appears to have used a valid provider at that point which allowed the conversion to proceed.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

MS11-028 may cause issues with SQL, Exchange and PowerShell


A .NET Security Update (MS11-028) may end up causing issues with applications that make use of the .NET Framework, mostly resulting in applications being unable to launch, or loss of functionality within Native Applications that make use of .NET. In some cases, it could lead to an application crash. For this issue to occur, the updates have to be laid down in a specific order. 

For this issue to occur, KB 979744 would have had to be installed prior to 2449742 or 2446709 (part of security bulletin MS11-028).  NOTE: 979744 is a prerequisite for Exchange 2010 SP1.

You may experience the following:

  • Reporting Services service fails to start.
  • SQL Server Profiler tool cannot be launched on the server machine.
  • SQL Server Management Studio cannot be launched on the server machine.
  • Database Mail and CLR functions may be broken.
  • Event Viewer cannot be launched on the server machine.
  • PowerShell cannot be started.

This issue only affects the following Operating Systems:

Vista SP2
Windows 2008 SP2
Win 7 RTM
Windows 2008 R2

Win7 SP1 and Windows 2008 R2 SP1 are NOT affected!

We have since reissued KB 979744 to avoid the problem.  The Exchange team has posted a blog about this as well here.

KB 2540222 describes how you can detect if you have the original version of KB 979744 installed and offers steps to correct the issue.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

CSS SQL Azure Diagnostics tool released


I am happy to announce that CSS SQL Azure Diagnostics (CSAD) has been released. Since you cannot use PSSDiag/SQLDiag against SQL Azure, I decided to develop this tool to shorten the data collection process when troubleshooting SQL Azure issues.  You can point CSAD at your SQL Azure instance, provide the appropriate credentials, and you will then be presented with some good summary data about your instance.  Since I leverage the standard ReportViewer control, you can also export the reports to a number of different formats.  This makes it easy to share the reports with either your colleagues or CSS.  In addition, CSAD is a Click-Once application, so it has a very light installation and it always checks for the latest version.  (For some more details on the installation, see the very end of this post.)

You can download it from http://csssqlazure.blob.core.windows.net/csssqlazuredeploy/publish.htm or click on the link above.

Let’s walk through using CSAD:

1)  Enter your server and user information

image

2)  Click “GO”

image

That’s it!

Now for the more interesting part of this post and walk through the results you get back…

The first thing you will see is a general information section:

image

Although there are just a couple of things in this section right now, it is a key area.  Here is where you can see your database size, plus CSAD runs some tests to see if you are running into any known service issues that have not yet been addressed.  As CSAD continues to develop, it will add more information here like SKU, version, etc.

Next you will see the first of the core tables – Top 10 CPU consumers:

image

This shows your queries that are consuming the most CPU, plus some pertinent information about these queries.  You can use this table to figure out which queries likely need some tuning.
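
CSAD’s exact queries are not listed in this post, but to give you an idea of what a “top CPU consumers” query against the public DMVs can look like, here is an illustrative sketch (not the tool’s actual query):

-- top 10 statements by total CPU from the public DMVs
select top (10)
       qs.total_worker_time / 1000 as total_cpu_ms,
       qs.execution_count,
       substring(st.text, (qs.statement_start_offset / 2) + 1,
                 ((case qs.statement_end_offset when -1 then datalength(st.text)
                   else qs.statement_end_offset end - qs.statement_start_offset) / 2) + 1) as statement_text
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
order by qs.total_worker_time desc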

Next, you will see your longest running queries:

image

If you continue down through the pages, you will then see your top logical and physical I/O consuming queries:

image

image

These last two tables should give you a pretty good idea on which queries are missing an index or have an incorrect index.  (NOTE: One of the next features I am adding is the ability to identify the missing index and generate the appropriate TSQL to create the index).
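
While CSAD does not generate the missing index TSQL yet, the missing index DMVs are public as well; here is a simple sketch of the kind of query you could run yourself (whether these DMVs were exposed on SQL Azure at the time may differ, so treat this as an on-premises illustration):

select top (10)
       mid.statement as table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_user_impact
from sys.dm_db_missing_index_details as mid
join sys.dm_db_missing_index_groups as mig on mig.index_handle = mid.index_handle
join sys.dm_db_missing_index_group_stats as migs on migs.group_handle = mig.index_group_handle
order by migs.user_seeks * migs.avg_user_impact desc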

Lastly, I want to point out that you have the ability to either print or export this report:

image

The beauty of CSS SQL Azure Diagnostics is that it doesn’t use any inside information.  None – everything that is pulled is pulled from public DMVs.  In fact, and you can test this by unchecking “SQL Azure database” at the top of the page, you can run the same exact queries against an on-premises instance of SQL Server and get the same exact data back.  This is going to be one of the tenets of CSAD going forward – it will always only use queries and information that anybody can use against any SQL Server instance in the world – be it on-premises or in the cloud.  (NOTE:  Although the DMVs used are public, I don’t yet have them documented in the tool itself.  I promise to do that in a near-term release, though.  In addition, when I document the DMV queries, I will add a lot more information on the different columns in each table to help you interpret them).

INSTALLATION DETAILS

1)  CSAD does require the installation of the ReportViewer 2010 and the .NET 4.0 Client Profile.  It should check for both components on install, but you can also install them separately:

Download details- Microsoft Report Viewer 2010 Redistributable Package

Download details- Microsoft .NET Framework 4 Client Profile

2)  No reboot is necessary

3)  Each time CSAD starts up, it checks the Azure blob storage location for a newer version and updates itself if necessary

4)  You can uninstall it by going to Control Panel –> Add/Remove Programs

5)  I have already seen a few isolated instances where the ReportViewer control wouldn’t install.  If you run into that scenario, just install it separately using the link above

P.S.  Thanks to Chris Skorlinski for providing me with the original DMV queries.


AlwaysON - HADRON Learning Series: Maximum Failovers Within Specified Period


I can't take the credit for all this content as much of the investigation was done by Curt Mathews (SQL Server Escalation Engineer).

We are finding that folks want to test the failover abilities of AlwaysON, but after a single failover it no longer seems to work.   This is because of the default cluster policy of "Maximum specified failovers in the specified period."   This is a lengthy way of saying: avoid a ping-pong effect for the availability group.

For a new availability group the default is 1 failover within a 6 hour period.   This may be exactly what you want in a production environment to avoid a ping-pong effect of failovers, but it can be disconcerting when you break the seal on AlwaysON, fail over, and then try a second failover and it does not work.   You may need to adjust the values to better suit your testing or production requirements.

image

Using cluster.exe log /g you can generate the cluster log and review the failover activity.  Shown below is an example of the cluster log message indicating that the failover threshold was exceeded and failover won't be attempted at this time.

image

The following is a mock-up of what the Windows Event Log may provide in future operating system releases.

image

Bob Dorr - Principal SQL Server Escalation Engineer

Why does this query consume so much CPU?


Recently I worked with a customer who reported a slow running query.  Let me simplify this to illustrate the problem.  

There are two tables  t1 (pk int identity primary key, c1 int, c2 int, c3 int, c4 varchar(50)) and  t2 (pk int identity primary key, c1 int, c2 int, c3 int, c4 varchar(50)).  Each table has about 10000 rows. 

But the following query is very slow. 

select  COUNT (*)  from t1 inner join t2 on t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3 and t1.c4<>t2.c4

This query runs for over 30 seconds and consumes over 30 seconds of CPU.  The query actually returns 0 rows.  With two tables of 10,000 rows each, this seems unreasonable.

When investigating CPU consumption by a query, we normally look at a few things.  First, we look at how many logical reads the query has done.  Secondly, we look at the plan to see how many rows are processed by each operator.

But when we track logical reads via a profiler trace (reads column), we see very low logical reads (less than 60).   When we look at the plan, the number of rows processed is not that high either.   The partial execution plan is shown below.

image

For a while, we couldn’t figure out what was going on.  Finally, upon examining the data distribution, we realized the root cause.

Using a query similar to the following, we found out that the majority of the rows in each table (t1 and t2) have the same values.

select c1, c2,c3, COUNT (*) from t1
group by c1,c2,c3
having COUNT (*) > 1

t1:

image

t2:

image

If it were not for the last join condition (t1.c4<>t2.c4), the query would have produced over 100,000,000 matches because each table has 10,000 duplicate rows.

The plan produced is a hash match.  The hash key is on columns c1, c2, and c3 of t2.  But one hash bucket ends up holding most of the values.  During the probe phase of the hash match, every row retrieved has to be compared against all 10,000 duplicates.  That’s where the CPU is spent. Because of all the duplicates, this is effectively like processing a cartesian product, and it requires processing power to do the work.

For this particular query with a count aggregate, the optimizer can sometimes apply a trick: for example, if one of the tables has very high density, it may choose to aggregate on that table first (called a partial aggregate) and then join to the other table to get the final count. This can help sometimes, but a partial aggregate is not always considered the best option.  In this example, even though the majority of the data is duplicated, there are many other distinct values. This makes the optimizer believe the partial aggregate is more costly because it would generate many small groups.  Therefore, it didn’t generate a plan with a partial aggregate.

 

Here is a complete demo:


use tempdb
go
drop table t1
go
drop table t2

go
create table t1 (pk int identity primary key, c1 int, c2 int, c3 int, c4 varchar(50))
go
create table t2 (pk int identity primary key, c1 int, c2 int, c3 int, c4 varchar(50))
go

begin tran
set nocount on

declare @i int
set @i = 0
while @i< 10000
begin
    if @i < 1000
    begin
        insert into t1 (c1, c2,c3,c4)  select @i,@i,@i, ''
        insert into t2  (c1, c2,c3,c4) select @i,@i,@i, ''
    end
    insert into t1   (c1, c2,c3,c4)  select 100000,20000,30000, ''
    insert into t2   (c1, c2,c3,c4)  select 100000,20000,30000, ''
    set @i = @i + 1
end
commit tran

create index ix on t2 (c1,c2,c3,c4)

go
set statistics profile on
set statistics io on
set statistics time on
go
--this query will consume over 30 seconds of CPU
select  COUNT (*)  from t1 inner join t2 on t1.c1=t2.c1 and t1.c2=t2.c2 and t1.c3=t2.c3 and t1.c4<>t2.c4
go
set statistics profile off
set statistics io off
set statistics time off

NOT (LinkedServertoSQLAzure) where Provider=’MSDASQL’ and DSN=’SQL Server Native Client’


I recently had two different people suggest the same (unfortunately unsupported) solution to the fact that there is no supported way of creating a linked server to SQL Azure directly from inside of SQL Server.  Since the reason the proposed solution is unsupported is subtle, in the sense that it is buried deep in documentation few people read, I wanted to publicize this limitation a bit more.

The suggestion was to configure the linked server to use the Microsoft OLE DB Provider for ODBC Drivers (MSDASQL) and then use SQL Server Native Client’s ODBC functionality.  Unfortunately, while this does indeed circumvent the fact that SQL Azure doesn’t support OLE DB, it runs afoul of the fact that using SQL Server Native Client underneath MSDASQL is not supported.  This is documented at http://msdn.microsoft.com/en-us/library/ms131035.aspx

So, in summary, a good idea but unfortunately it is not a supported solution.

 

Secondary Access to File Stream(s) File, In Open Transaction, May Hang


This post is directly from an issue I have been working on.  The behavior is very difficult to simulate because of the flags and timings involved.   In this post I will attempt to describe the scenario and provide you with a simple workaround for your applications.

Reproduction Steps

1. Begin a SQL Server transaction

2. Obtain the file stream context and logical path (a T-SQL sketch for this step follows the repro steps)

3. Open the file stream file for Write but allow caching  (do not enable the WriteThrough flag)   Ex:   using (var LobjSqlStream = new SqlFileStream(LstrStringPath, LobjBuffer, FileAccess.Write))

4. Write data to the file.  This is an important step because a dirty file is required to trigger the behavior.  Data must be held in the NTFS file system cache.

5. Close the file but leave the SQL Server transaction active.    This is another important facet as you can only trigger the behavior for files that were opened cached, dirtied and are still part of an active SQL Server transaction.

Now a secondary access to the file is required to trigger the problem.

6. While the SQL Server transaction is still active a second request to open the file must be made.   Not only is the open required but it must specify the file should be opened WITHOUT file system caching.  Ex: using (var LobjSqlStream = new SqlFileStream(LstrStringPath, LobjBuffer, FileAccess.Write, FileOptions.WriteThrough))
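
For step 2, here is a minimal T-SQL sketch of how a client typically obtains the logical path and transaction context (the table and column names are hypothetical placeholders):

begin transaction

select FileData.PathName() as LogicalPath,                 -- hypothetical FILESTREAM column
       GET_FILESTREAM_TRANSACTION_CONTEXT() as TxContext
from dbo.Documents                                         -- hypothetical table
where DocumentId = 1

-- the client passes LogicalPath and TxContext to SqlFileStream and then
-- commits or rolls back the transaction when it is done with the file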

Workaround

When opening the file stream file for writing be sure to always specify the WriteThrough flag to avoid file system cache and the associated behaviors described in this post.

using (var LobjSqlStream = new SqlFileStream(LstrStringPath, LobjBuffer, FileAccess.Write, FileOptions.WriteThrough))

More Information

I indicated that this is a difficult issue to simulate; it took me 2 days using the kernel debugger and breakpoints to trigger the behavior and develop a reliable and repeatable reproduction of the problem.   What is happening is a maintenance operation by NTFS during the second open request.

The first open-and-close sequence results in cached write data in the file system cache.  When you allow NTFS to cache the data, the Close operation from the client does not force the data out of the NTFS cache; the cache uses its own algorithms for maintenance.   However, the second open request indicates it wants a NON-CACHED access path.   The CreateFile call to NTFS does a lookup in the cache and finds data that needs to be flushed before this NON-CACHED access path can be honored.   When NTFS finds this situation, a close is triggered on the original stream data (the $DATA stream of the file) and the Rsfx filter is invoked.   The file stream filter driver (Rsfx) sees that the SQL Server transaction is still active and attempts to flush the data to honor the transactional requirements.    In doing so it deadlocks the thread with itself, as NTFS has already determined it needs to flush the data as well.

This results in the CreateFile call from the client application hanging, waiting for the System.exe thread to complete the create, but the system thread has deadlocked itself.

If the SQL Server transaction had been committed or rolled back, the Rsfx driver would have made sure the cached data was flushed to disk as part of the transactional logic.   Why didn’t I suggest a workaround of making sure to commit the transaction before a second access?   As always, keeping the transaction open for only a short period helps concurrency, so this is a good practice.   However, I have been able to trigger the behavior with virus scanners, log backups, and a simple test application that just opens the file at the right time.   Keeping the time between the close of the file and the commit of the transaction short is important, but it is not a completely safe workaround.    The safe workaround is opening the file with WriteThrough to avoid the NTFS cache in the first place.

You can restart the client application that is stuck in the CreateFile call to ‘un-hang’ your client.   Most of the time it will also require a reboot of the server to clear the system thread deadlock.

Fix: A bug has been filed with the SQL Server team and is being evaluated for inclusion in an upcoming CU release.

Bob Dorr - Principal SQL Server Escalation Engineer

Why the Query Designer Table Listing could be empty


I just worked a case that took a little bit to figure out and involved some pretty deep debugging, but I figured I would get something out to explain the behavior to hopefully prevent someone else from having to go through the debugging that I just went through.

To give some background, when designing a report for Reporting Services in BIDS (Visual Studio), you may get to a point where you want to add a DataSet.  Within the DataSet Properties dialog box, there is an option for the query to use a query designer.

SNAGHTMLeba132

SNAGHTMLec11cf

This query designer is actually the Query Designer that ships with Visual Studio.  It is not actually a part of the Reporting Services code base. The nice thing about this is that if you encounter an issue with the Query Designer in a Report Project, you can probably reproduce the issue outside of the Report Designer.  The Server Explorer connections will expose this Query Designer dialog when you try to create a query off of a connection.

image

When in the Query Designer, you can right click, in the upper area, and select “Add Table…”.  When you do that, you should see a listing of Tables.

image

However, for this case, the list was empty and we were using a 3rd party ODBC Driver.

image

When I first saw this, my thought was that the metadata call that we were making wasn’t returning the proper values.  From an ODBC standpoint, this call is SQLTables.  Luckily, with ODBC, we have an ODBC Trace with which we can see what is coming back.

Looking at the ODBC Trace, I could actually see data coming back.  There were 5 items that came back.  Here is a sample of one (sanitized of course):

devenv          117c-1da0    EXIT  SQLFetch  with return code 0 (SQL_SUCCESS)
        HSTMT               0C5249E0

devenv          117c-1da0    EXIT  SQLGetData  with return code 0 (SQL_SUCCESS) <-- TABLE_CAT
        HSTMT               0C5249E0
        UWORD                        1
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                0x09403008 <— NULL value
        SQLLEN                  4094
        SQLLEN *            0x0012A7A4 (-1)

devenv          117c-1da0    EXIT  SQLGetData  with return code 0 (SQL_SUCCESS) <-- TABLE_SCHEM
        HSTMT               0C5249E0
        UWORD                        2
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x09403008 [       6] "dbo"
        SQLLEN                  4094
        SQLLEN *            0x0012A7A4 (6)

devenv          117c-1da0    EXIT  SQLGetData  with return code 0 (SQL_SUCCESS) <-- TABLE_NAME
        HSTMT               0C5249E0
        UWORD                        3
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x09403008 [       8] "Test"
        SQLLEN                  4094
        SQLLEN *            0x0012A7A4 (8)

devenv          117c-1da0    EXIT  SQLGetData  with return code 0 (SQL_SUCCESS) <-- TABLE_TYPE
        HSTMT               0C5249E0
        UWORD                        4
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x09403008 [      10] "TABLE"
        SQLLEN                  4094
        SQLLEN *            0x0012A7A4 (10)

devenv          117c-1da0    EXIT  SQLGetData  with return code 0 (SQL_SUCCESS) <-- REMARKS
        HSTMT               0C5249E0
        UWORD                        5
        SWORD                       -8 <SQL_C_WCHAR>
        PTR                 0x09403008
        SQLLEN                  4094
        SQLLEN *            0x0012A7A4 (-1)

From the above, we can see that the TABLE_CAT value was empty (NULL).  According to the SQLTables documentation, though, this shouldn’t be a problem.

TABLE_CAT Catalog name; NULL if not applicable to the data source. If a driver supports catalogs for some tables but not for others, such as when the driver retrieves data from different DBMSs, it returns an empty string ("") for those tables that do not have catalogs.

The values coming back from the SQLTables call should be specific to the given Catalog that we passed in; the Catalog passed in is treated as a search pattern for the results that come back.  We do pass in a Catalog value, based on the catalog of the connection, when we make the call.

The problem comes into play when the Visual Studio Query Designer does another select over the values that were returned.  That select, from a .NET perspective, again matches whatever values came back against the Catalog that we have for the connection.  Because the TABLE_CAT values were empty, the Query Designer assumes the tables are not for this catalog and does not add them to the list box.  This appears to be a redundant filter, especially since the Catalog passed into the SQLTables call should already have done that filtering.

There is no workaround for this behavior, and it is present in both Visual Studio 2008 and 2010.  It is also not something I would classify as a bug, as it is explicit behavior based on how it was coded.

If you run into a case where tables are not populated in the Query Designer list and you are using an ODBC driver, hopefully you can use the above to at least validate whether you are hitting the same behavior and to explain why it is happening.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

New extended event to track writes to the snapshot sparse file


You might have seen our earlier blogs around different snapshot related errors and the techniques available to address them.  Among all of those, one piece that was missing was the ability to track the specific write activity that goes against the snapshot files.  In the past we had to resort to predictive analysis using information from DMVs like sys.dm_os_buffer_descriptors.

Starting with SQL Server 2008 SP2, a new Extended Event was added to accurately trace the write activity that goes against the snapshot sparse files.  The name of the event is file_written_to_replica.  You can expect this new event to surface in SQL Server 2008 R2 in a future service pack.  When you query sys.dm_xe_object_columns, you can find out the different data columns that are available for this event.
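As a quick reference, a query along these lines lists the columns the event exposes (sys.dm_xe_object_columns is the standard Extended Events metadata view, so this is just a sketch of the lookup):

select name, column_type, type_name, description
from sys.dm_xe_object_columns
where object_name = 'file_written_to_replica'
order by column_id;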

[screenshot]

Most of the information here is self-explanatory.  There are 2 custom data columns: path and io_data.  The path is the full physical file path of the snapshot file; io_data is the actual data that gets transferred.  By default these two data columns are not captured; you have to enable them using the SET clause in the CREATE EVENT SESSION statement.

Here is the actual event data from a collection I did recently:

[screenshot]

Using this event, you will now be able to find out which application activity corresponds to the most space usage in the snapshot databases, as well as information like snapshot file fragmentation.  Be careful while configuring the extended event session, and add appropriate filters to reduce the amount of information that gets collected for this event on busy systems.
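To make that concrete, here is a minimal sketch of what such a session could look like.  The customizable column names in the SET clause and the database_id value are assumptions for illustration; verify the exact customizable column names in sys.dm_xe_object_columns on your build, and point the file target at a path that exists on your server.

CREATE EVENT SESSION [track_snapshot_writes] ON SERVER
ADD EVENT sqlserver.file_written_to_replica
(
    SET path = (1), io_data = (1)            -- assumed names of the customizable columns shown above
    WHERE (sqlserver.database_id = (7))      -- placeholder: database_id of the snapshot of interest
)
ADD TARGET package0.asynchronous_file_target
(
    SET filename = N'C:\XEvents\snapshot_writes.xel',      -- placeholder paths
        metadatafile = N'C:\XEvents\snapshot_writes.xem'
)
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS)
GO
ALTER EVENT SESSION [track_snapshot_writes] ON SERVER STATE = START
GO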

Here are some of our previous blogs on this topic:

http://blogs.msdn.com/psssql/archive/2008/02/07/how-it-works-sql-server-2005-database-snapshots-replica.aspx
http://blogs.msdn.com/psssql/archive/2008/07/10/sql-server-reports-operating-system-error-1450-or-1452-or-665-retries.aspx
http://blogs.msdn.com/psssql/archive/2009/01/20/how-it-works-sql-server-sparse-files-dbcc-and-snapshot-databases-revisited.aspx
http://blogs.msdn.com/psssql/archive/2009/03/04/sparse-file-errors-1450-or-665-due-to-file-fragmentation-fixes-and-workarounds.aspx

Thanks

Suresh Kandoth

SharePoint Adventures : ADFS Setup – Part 1


I've had a few cases that involved customers using ADFS, SharePoint and Reporting Services. When going through these, one of the big struggles was just getting ADFS up and running with SharePoint 2010 so we could focus on the customer's issue. I did find a few items out there that got me pointed in the right direction, but they were all missing a few pieces that connected the dots. I thought I would share my experience and try to be as complete as possible. I'll also say that I do not consider myself an expert in ADFS, but I did have to run through this multiple times.

One thing this blog post will not go into is normal Active Directory (AD) setup or setting up a Certificate Server. I will be using the Certificate Server that comes with Windows Server for this setup. This is not a requirement, but it made it easy for me to get this working in a complete manner as opposed to using self-generated certificates or getting trial certs for a demo.

Certificates

During the course of this setup, there will be multiple points where we will need a certificate for one reason or another, either for ADFS or for SharePoint/IIS. Because I have a Certificate Server set up, it is pretty easy to get a new cert, which is also why I'm using it. You can also create a self-signed cert.

Within IIS Manager, highlight the server itself. There you will see an item labeled "Server Certificates". Make sure you are not on the Site, but the actual Server.

[screenshot]

From there you have a few options. You can "Create Domain Certificate…" which will base it off of the Certificate Server bound to your Domain. You can also "Create Certificate Request…", which will go through the normal request for a Certificate Authority (CA) not bound to Active Directory. This could be a 3rd party CA like Verisign, or a CA that you have in your environment that isn't bound to your AD environment.

The last option is "Create Self-Signed Certificate…". This allows IIS to create a temporary certificate for use. While this can be used for your situation, it complicates things a little when it comes to moving the certificate around. When we get to the SharePoint integration, there will be times where we need to export and import some certificates. This is much easier to do when we have a known CA.

[screenshot]

ADFS 2.0 Setup

When I first started on this, I recall seeing ADFS within the Roles for Windows Server, and thought: "Sweet, this will just be a quick wizard and I will be on my way!" Yeah, I was wrong.

[screenshot]

What is listed in the Role selection is considered ADFS 1.0. What we really want is ADFS 2.0. This can be downloaded from the Microsoft Download center here. And here is a quick link to the Documentation on ADFS 2.0. This will grab AdfsSetup.exe. Before I ran this, I made sure that I had Certificate Server setup as it will be needed.

From the install perspective, I chose "Federation server". For a quick test environment this works fine, but if you are deploying this into a production environment you may need a different configuration depending on what you are doing.

[screenshot]

Of note, I installed this on my Domain Controller, as I had a limited number of machines. This requires that PowerShell, IIS and some .NET items be installed for this to work, as the ADFS pieces are basically a .NET web service. It will also install Windows Identity Foundation (WIF). I've talked about the Claims to Windows Token Service (C2WTS) before, and that is also part of WIF.

That's pretty much it from the install wizard. After setup is done, I went into the ADFS snap-in. From there it has a link to start the "ADFS 2.0 Federation Server Configuration Wizard".

ADFS 2.0 Federation Server Configuration Wizard

[screenshot]

I then chose "Create a new Federation Service". Hit Next.

[screenshot]

Because this is a test server, I chose "Stand-alone federation server". Of note, this will also make use of SQL Server Express Edition for the ADFS data store, which goes on the DC. That is definitely not recommended on a Domain Controller if you are deploying this to production. It does mention that you can set this up with a full SQL Server from a command line, but I think SQL Express will still be laid down initially and you then switch it over to the full SQL Server. I'm not sure if there is a workaround to prevent SQL Express from being laid down; it may be that once you switch over to the full SQL Server install, you can just uninstall SQL Express.

It may also be that the "New federation server farm" option gives you some options for this, but I didn't take a look at that.

[screenshot]

The next piece is where the Certificate Server first comes into play.

[screenshot]

The certificates listed here are the ones in the local certificate store. As this is the cert that will be bound to the web site, the name will be derived from the certificate itself, meaning that the URL for the site should match the certificate name. If the name listed isn't one you want, you can go to IIS and create a new cert. Refer back to the Certificates section at the beginning of this post, which talks about how to create a new certificate.

That's pretty much it for that Wizard.

Claims/SAML

One thing to keep in mind is that ADFS will generate a SAML token, which is what we use for claims. When you set up a trust relationship (which we will get into in the next post), you are really setting up what claims you want in your token. As a result, when you set up a Trusted Provider within SharePoint 2010, which is what ADFS will be, it will use claims-based authentication.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton


SharePoint Adventures : ADFS Setup – Part 2


In Part 1, we looked at getting ADFS installed. For Part 2, we will see what we need to do to get it working with SharePoint 2010. Being that I focus on Reporting Services, the end goal here is to see how it works with Reporting Services; however, RS will be for another post.

To get this working with SharePoint, there are a few things that we need to do.

Token-Signing Certificate

The ADFS 2.0 Service has what is called a token signing certificate.

[screenshot]

Step 1 - Install to local Trusted Root

The first thing we want to do with this cert is to install it into the trusted root for the computer. This cert happens to be auto generated. You could also change the cert to be generated from the Certificate Server, but I'm not going to do that for this example. To install it into the trusted root, we can right click and view the certificate.

[screenshot]

Click on "Install Certificate…". Go through the wizard and choose to "Place all certificates in the following store". We then want to be sure we select "Show physical stores" and select "Trusted Root Certification Authorities\Local Computer". This will ensure that the certificate goes into the Local Computer Trusted Root as opposed to the User's Trusted Root which would be problematic.

[screenshot]

[screenshot]

If we look at the Trusted Root for the local computer, we should see the certificate now.

[screenshot]

Step 2 - Export the Token Signing Cert

We want to export the Certificate so that we can import it into SharePoint and the Trusted Root of the SharePoint machine. I just put these certs into a Certificate folder on the C drive. To Export, just right click on the cert, go to Tasks and choose Export.

We won't be able to export the private key, so we can just go with DER Encoded binary X.509.

[screenshot]

[screenshot]

Step 3 - Grab the Web Cert for ADFS as well

While we are here, let's grab the ADFS web certificate as well, as we will need it. This certificate happens to be in the Personal store.

[screenshot]

We can just do the same thing that we did with the Token Signing Certificate.

Step 4 - Install the Certs into the SharePoint Box Trusted Root

We will want to install both Certificates into the Trusted Root for the SharePoint box. Within MMC, add the Certificates Snap-In for the local computer. Go to "Trusted Root Certification Authorities" and then right click on Certificates. Go to "All Tasks" and then "Import…".

After you have imported both Certificates, you should see them listed.

[screenshot]

SharePoint Trusted Provider

Now that the certs are in place, we can set up the SharePoint Trusted Provider. This is done through PowerShell. Here is the script:

$certPath = "C:\Certificates\TokenSigningCert.cer"
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2("$certPath")
$map1 = New-SPClaimTypeMapping "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress" -IncomingClaimTypeDisplayName "EmailAddress" -SameAsIncoming
$realm = "urn:" + $env:ComputerName + ":adfs"
$signinurl = "https://dsdcontosodc.dsdcontoso.local/adfs/ls/"
$ap = New-SPTrustedIdentityTokenIssuer -Name "ADFS20Server" -Description "ADFS 2.0 Federated Server" -Realm $realm -ImportTrustCertificate $cert -ClaimsMappings $map1 -SignInUrl $signinurl -IdentifierClaim $map1.InputClaimType

Let's break this down.

$certPath = "C:\Certificates\TokenSigningCert.cer"
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2("$certPath")

This gets the cert that we exported, and sticks it into the $cert variable.

$map1 = New-SPClaimTypeMapping "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress" -IncomingClaimTypeDisplayName "EmailAddress" -SameAsIncoming

This sets up the claim mapping that we will use to identify users, basically keying off of the email address. We will have to set this up on the ADFS side as well a little later.

$realm = "urn:" + $env:ComputerName + ":adfs"
$signinurl = "https://dsdcontosodc.dsdcontoso.local/adfs/ls/"

The first item here identifies the realm that we will use for the ADFS communication. This is also defined on the ADFS side, and it is important that you keep it the same on the SharePoint side and the ADFS side. It doesn't have to be called this, but it needs to match what you will later define on the ADFS side. The second item is the URL to ADFS itself. In my case it is on my domain controller, but this may be different in your case. It is important that the URL you use matches the certificate in place with regard to whether you use the machine name or the FQDN; they need to match.

$ap = New-SPTrustedIdentityTokenIssuer -Name "ADFS20Server" -Description "ADFS 2.0 Federated Server" -Realm $realm -ImportTrustCertificate $cert -ClaimsMappings $map1 -SignInUrl $signinurl -IdentifierClaim $map1.InputClaimType

The last command is what pulls all of this together and creates the Trusted Provider within SharePoint. To run the above PowerShell script, make sure you use the "SharePoint 2010 Management Shell" and that you run it as Administrator.

[screenshot]

After that is done, we then want to add the Token Signing Cert to SharePoint's Trusted Root with the following command which will use the same $cert that we defined.

New-SPTrustedRootAuthority "Contoso ADFS Token Signing Trusted Root Authority" -Certificate $cert

[screenshot]

ADFS Web Cert

I found, when going through this, that any certificate SharePoint interacts with needs to reside within SharePoint's trusted root. Even if the cert is in the computer's trusted root, it also needs to be in SharePoint's trusted root. This was the cause of a big headache for me. So, when dealing with certs, I made sure that if I added one to the machine's local Trusted Root, I also added it to SharePoint's Trusted Root.

$certPath = "C:\Certificates\ADFSWebCert.cer"
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2("$certPath")
New-SPTrustedRootAuthority "DSDContosoDC web server" -Certificate $cert

[screenshot]

SharePoint should be good to go from an ADFS perspective at this point. We just need to finish up the ADFS portion.

ADFS Relying Party Trust

The Relying Party Trust is how ADFS knows that SharePoint will be coming into it. This allows the ADFS provider to trust the SharePoint requests coming in.

[screenshot]

We can just right click on "Relying Party Trusts" within the ADFS 2.0 window and select "Add Relying Party Trust...". This will spawn yet another wizard. Within that wizard we want to select "Enter data about the relying party manually".

[screenshot]

The Display name can be whatever you choose; I chose "SharePoint ADFS Provider". We then want to select "AD FS 2.0 profile" on the "Choose Profile" landing page. For the "Configure Certificate" landing page, we can skip that.

For the URL, we want to select "Enable support for the WS-Federation Passive protocol" and enter the SharePoint trust URL. In my case, this is https://dsdcontososp:5556/_trust/. Two things to note. First, the URL uses HTTPS, which is required for this to work. Second, note the ending / after _trust; don't forget the trailing slash. Port 5556 is what I will set my SharePoint site to be.

[screenshot]

For the "Configure Identifiers" landing page, you will notes that the URL we entered for the URL landing page is listed here. We want to remove this and add the Identifier that we used in our PowerShell Script. It is important that these match! Outside of that, you can make it the URL if you want, they just have to be the same on this screen and in the PowerShell Script. NOTE: The identifier is case sensitive.

[screenshot]

For the "Choose Issuance Authorization Rules" landing page, you can select "Permit all users to access this relying party". Then finish out the wizard. At then end, be sure to leave the Edit option selected as we do want to edit the trust.

Edit Claim Rules

On the Edit Claim rules window, we want to Add a Rule. For the Rule, we can choose "Send LDAP Attributes as Claims". Then hit Next.

[screenshot]

We will setup the attributes as follows:

[screenshot]

This will send across the Email Address, their Account Name and the groups that they belong to in Active Directory. Once that is done, we can hit Finish and then OK.  Note:  The accounts in AD need to have an email address defined for this to work properly.

[screenshot]

Setting up the SharePoint Site

I already have a Windows Claims site set up on Port 5555. To keep things a little separate, I will extend this site to create one on a port dedicated to ADFS.

[screenshot]

This site will go on Port 5556 to match our Relying Party Trust in ADFS. It will make use of SSL, for which I already have a certificate set up.

[screenshot]

For the Authentication for this site, I will not choose Windows Authentication, but will select our newly created ADFS20Server provider that was made with the PowerShell script.

[screenshot]

The URL for this site is not the FQDN in my case. The URL will really be based on what the Certificate you have, as they need to match. In my case, I created a cert just off of the machine name and not the FQDN.

[screenshot]

Once all of that is done, click on OK and let SharePoint do its thing. After that, there were a few other loose ends that I had to shore up.

First was disabling Anonymous access to the 5556 site within IIS. I'm not sure why it is enabled, but I turned it off. Another is making sure a user has access to the site. My Windows credential does not work and does not give me access when I come in on the ADFS side. I think this is due to the fact that my identity is different at that point and I'm being represented by my email address and not my SAM name. I didn't try switching it around to prove that point, but to get around it I did the following.

Within Central Admin, go to Application Management -> Site Collections -> Change Site Collection administrators

[screenshot]

From here we can see that the ADFS20Server provider is available to search from. If I had tried this from my Windows Claims site, that provider wouldn't have been visible, which is why I can't just add users from that site.

[screenshot]

[screenshot]

Be sure to use the full email address and not just the account name. Once all of that is done, the site should come up with your ADFS Login displayed.

[screenshot]

Hopefully this helps with setting up and configuring ADFS. This is not an exhaustive look at all of the features and capabilities, but should be helpful for at least getting your hands on the technology and playing around with it. This is what I had to go through to get a local repro up and running.

Adam W. Saxton | Microsoft SQL Server Escalation Services
http://twitter.com/awsaxton

Slow performance and out of memory issues caused by large batch


Recently, I worked with a customer who reported out of memory errors from his SQL Server 2008 R2 instance with 32GB of RAM.   One of the errors was the 701 error (Error: 701, Severity: 17, State: 123. There is insufficient system memory in resource pool 'default' to run this query.).

After taking a memory dump, we discovered that the particular query that consumed a large amount of memory was a query like the one below. I rewrote it so that it doesn’t reference the customer’s table names.

declare  @tempIDs TABLE (id int primary key);
insert into @tempIDs values (0);
insert into @tempIDs values (1);
insert into @tempIDs values (2);
insert into @tempIDs values (3);
insert into @tempIDs values (4);
insert into @tempIDs values (5);
insert into @tempIDs values (6);
....
.... 1.5 million inserts like this
select * from sys.objects where object_id in (select id from @tempIDs)

 

From the application side, the following C# code will generate the query above:

// Requires: using System.Text; using System.Data.SqlClient;
static void InsertMethod()
{
    StringBuilder builder = new StringBuilder();
    builder.Append("declare @tempIDs TABLE (id int primary key);");

    // Build one giant batch containing a million individual INSERT statements
    for (int i = 0; i < 1000000; i++)
    {
        builder.Append(string.Format("insert into @tempIDs values ({0}); ", i));
    }

    builder.Append("select * from sys.objects where object_id in (select id from @tempIDs)");

    SqlConnection conn = new SqlConnection(ConnectionString);   // ConnectionString defined elsewhere
    conn.Open();
    SqlCommand cmd = conn.CreateCommand();
    cmd.CommandText = builder.ToString();
    cmd.ExecuteReader();
}

This query does 1 million inserts into a table variable and then joins with other tables.  As you can imagine, the query batch size is quite large.   This kind of batch causes two issues: the performance will be slow, and it consumes a large amount of memory.   This is because SQL Server needs to parse each statement, which not only takes time but also consumes memory to store the internal structures.  In addition, using a table variable this way with a large number of rows is inappropriate.   See this post for details.

So what’s the solution?

There are various solutions depending on your situation.  For example, you can use SSIS or bulk insert to get the data into a permanent table and then join with the other tables.  But do NOT use an IN or OR clause; if you do, you will have 1 million values in the IN or OR clause, and that type of approach will also cause slower performance and memory errors with a large number of values like this.
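As a rough T-SQL sketch of that approach, assuming the IDs have already been exported to a flat file (the file path and table name below are placeholders):

create table dbo.BatchIDs (id int primary key);

bulk insert dbo.BatchIDs
from 'C:\temp\ids.txt'              -- placeholder: one id per line
with (rowterminator = '\n', tablock);

-- join against the table instead of building a giant IN (...) list
select o.*
from sys.objects as o
join dbo.BatchIDs as t on t.id = o.object_id;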

If you have to generate the values dynamically and join with other tables, use the SqlBulkCopy object to pump the data into a temp table and then do the join.  The following C# code example achieves that.  It will be fast and the memory requirement will be smaller.

// Requires: using System; using System.Data; using System.Data.SqlClient;
static void BulkCopy()
{
    SqlConnection conn = new SqlConnection(ConnectionString);   // ConnectionString defined elsewhere
    conn.Open();

    // Create the temp table up front
    SqlCommand cmd = conn.CreateCommand();
    cmd.CommandText = "create table #tempIDs (id int primary key);";
    cmd.ExecuteNonQuery();

    SqlBulkCopy bulkcopy = new SqlBulkCopy(conn);
    bulkcopy.DestinationTableName = "#tempIDs";

    // Pick up the schema of the empty temp table, then fill a DataTable with the ids
    SqlDataAdapter schemaAdapter = new SqlDataAdapter("select id from #tempIDs", conn);
    DataTable dt = new DataTable();
    schemaAdapter.Fill(dt);
    for (int i = 0; i < 1000000; i++)
    {
        DataRow row = dt.NewRow();
        row["id"] = i;
        dt.Rows.Add(row);
    }

    // Stream all the rows to the temp table in one bulk operation
    bulkcopy.WriteToServer(dt);

    // Join against the temp table instead of sending a huge IN (...) list
    cmd.CommandText = "select * from sys.objects where object_id in (select id from #tempIDs)";
    SqlDataReader reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        Console.WriteLine(reader[0]);
    }

    conn.Close();
}

 

Jack Li  | Senior Escalation Engineer | Microsoft SQL Server Support

Pssdiag/Sqldiag Configuration Manager released to codeplex


 

A lot of you are probably familiar with the pssdiag tool released for SQL Server 7.0 and 2000 and the sqldiag.exe utility shipped with SQL Server 2005, 2008 and 2008 R2. These are diagnostic data collection tools that collect data such as profiler traces, perfmon counters and DMV information.

For the sqldiag.exe shipped with SQL Server 2005 and beyond, there hasn’t been a configuration tool publicly available that allows customization. Microsoft product support has maintained a separate configuration tool that allowed support engineers to customize and collect data for troubleshooting SQL Server issues.

Due to customer demand, we have just released this configuration tool on CodePlex. The URL is http://diagmanager.codeplex.com/.

You can use this configuration manager to customize what data you need to collect.  Additionally, you can use Pssdiag/Sqldiag Configuration Manager to collect data for SQL Nexus which is a data analysis tool used by Microsoft product support.

The project has source code that allows you to further enhance the tool, but it also includes a download of the compiled binary.  You don’t have to build the project manually in order to use it; just download the setup program and run it on your machine.

The learning curve should be very low.  If all you need is to collect data for SQL Nexus, you just need to choose your version and platform; the default data collections (perfmon counters, trace events and DMV data collection) are selected for you.  The project page has detailed steps on configuring and running a data collection.

Please use discussion and issue tracker tabs to ask questions or report issues.  Below is a screenshot of the main page.

[screenshot]

 

Jack Li | Senior Escalation Engineer|Microsoft SQL Server Support

Dipping My Toes Into SQL Azure


My high school English teacher once told me that I should always have a spell checker on my computer.   This might help explain why I never make it past the first couple of chapters in any technical publication before I have to 'try-it-out.'

Over the last month I have spent several of my evenings learning what SQL Azure really is and does.   I have read quite a few articles and publications but as usual I have a need to 'try-it-out!'

Keith and I have worked on the RML Utility set for more than a decade.  As it is a very established codebase, I decided to 'port' it to use SQL Azure as the database storage for the ReadTrace analysis.   I now have the test suite running against SQL Azure, so let me share some of my learnings with you.

Easy
I was expecting a bunch of hurdles from such an established code base, but overall the experience was REALLY easy.  There are some changes I had to make to the applications, but they turned out to be minor.   With all the documentation I read I was envisioning a significant amount of work, but the SQL Server development team has done a nice job of making the vast majority of capabilities just work.   The RML applications don't use full text or some of the other facilities that SQL Azure does not yet provide, which helped my effort.

Firewall Client
The first thing I had to get used to was the use of a firewall client.  Because the traditional, on-premises application does not cross security boundaries, I generally don't have to pay much attention to the firewall client.   When using SQL Azure I had to exit the Microsoft CorpNet and traverse the internet to connect to my SQL Azure database(s).    The SQL Azure database(s) are contained in an entirely different domain and data centers from my normal corporate network.   It would take an act of congress, if you will, for me to get direct access to the systems.  These are highly secured environments, protecting the SQL Azure users' data.

I installed the Forefront TMG Client, enabled the automatic detection and I was able to connect to my SQL Azure database(s).

OStress
The first thing I did was fire up OStress to see if I could connect to SQL Azure and run commands.   WOW - Worked the first time!   OStress is an ODBC based application such as OSQL.exe and I didn't have to make any changes.  The parts of the RML test suite that use OStress worked as-is to create and drop the database.

ReadTrace
The next thing I did was try ReadTrace as-is, and it indeed failed.   ReadTrace is an OLEDB based application; SQLOLEDB was not the problem, it was some of the T-SQL that ReadTrace was using.

Create Database
One of the first things ReadTrace does is check to see if the analysis database exists.  If it does not, it issues a CREATE DATABASE; if it does, it drops the objects and recreates them.
CREATE DATABASE must be the first and only command in a batch for SQL Azure, but ReadTrace was using a series of commands in the batch.   All I had to do was break the CREATE DATABASE out into a separate batch.
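In other words, the fix was simply a batch split along these lines (the database name is just a placeholder):

-- Before: CREATE DATABASE bundled with other statements in one batch (fails on SQL Azure)
-- After: CREATE DATABASE runs as the only statement in its own batch
CREATE DATABASE PerfAnalysisDB
GO
-- object creation continues in later batches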

Use Database
In order to insert data into the database or drop and create the objects, ReadTrace would change database context using the USE statement.   USE database is not supported in SQL Azure, so I changed the connections to specify the default database instead.

Clustered Indexes
ReadTrace builds the objects and uses the SQLOLEDB bulk insert interfaces to load data into the tables.   Bulk insert is supported by SQL Azure, but the table has to have a clustered index before any data can be inserted.  Each SQL Azure database is replicated to 2 additional locations so that a failover can occur in the event of a failure.   The data replication component of SQL Azure uses the clustered index to assist in the replication activity, so it must be present before any data can be inserted into the table.  ReadTrace did not add indexes to the database until after the data was inserted, so I moved the creation of the indexes from after the load to before the load to resolve this issue.
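The change amounts to something like this sketch (the table and column names are placeholders, not the actual ReadTrace schema):

create table dbo.tblStatements
(
    StatementId   bigint        not null,
    StatementText nvarchar(max) null
);

-- SQL Azure requires the clustered index to exist before any rows are inserted,
-- so it is now created right after the table instead of after the load
create clustered index cix_tblStatements on dbo.tblStatements (StatementId);

-- ... bulk insert the rows here ...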

Select Context_Info
When connecting to SQL Azure, the context information for the session is automatically set to a GUID, a tracking activity id, for the life of the connection.   This can be used by Microsoft support if you encounter problems.   I added logic so that when ReadTrace connects to a SQL Azure database, it logs the activity id:  select CONVERT(NVARCHAR(36), CONTEXT_INFO())

Select Into
There were a couple of places in ReadTrace and Reporter that used SELECT INTO #TMP, and these failed.  I didn't expect this on a temporary table.  I can understand that a SELECT INTO a table in my database would be a problem, because the clustered index is not present and it would cause replication problems.   However, the information in tempdb is not replicated among the SQL Azure replicas, so I was not expecting this failure.    I altered the logic to CREATE TABLE #TMP and INSERT INTO instead of SELECT INTO.   I also filed a work item with the development team to relax this restriction for temporary tables.
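The rewrite looked roughly like this (the column list and source query are placeholders):

-- Original pattern that failed against SQL Azure:
--   select EventClass, count(*) as EventCount into #TMP from dbo.SourceTable group by EventClass;

-- Rewritten pattern:
create table #TMP (EventClass int not null, EventCount int not null);
insert into #TMP (EventClass, EventCount)
select EventClass, count(*) from dbo.SourceTable group by EventClass;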

Idle Connections
ReadTrace establishes up to 4 utility connections.  These are used to create the objects in the database and do maintenance activities during the load.    I found that from time to time these connections were dropped when doing a large test run.   After some investigation I found that if a connection is idle for some period of time, various firewall and gateway settings, from my ISP and others, impact the connection lifetime.    I had to add logic to detect when a connection was 'dead' and re-connect, because during the large test runs the bulk insert connections were active but my utility connections remained idle.

trace events
ReadTrace and Reporter use the trace event DMVs to provide event names and only store event ids.   Since tracing is not currently supported against SQL Azure, those DMVs are not present, so I loaded a table with the event details to have the same information available in the analysis database.
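A sketch of that workaround: capture the event names from an on-premises SQL Server and keep them in a small lookup table in the analysis database (the table name here is just an example):

-- Run on an on-premises SQL Server to capture the event name list
select trace_event_id, name
from sys.trace_events;

-- In the SQL Azure analysis database, hold the same information in a table
create table dbo.trace_events
(
    trace_event_id smallint      not null primary key clustered,
    name           nvarchar(128) not null
);
-- ... load the rows captured above ...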

Reporter
Reporter is a Managed SQL Client application and it ran against the SQL Azure databases without changes except for 2 locations using select into.

Connectivity
Because I am no longer using a local environment, I am not just going through a router or two to connect to my SQL Server.  The data is still safe because of the encrypted communications; it just might traverse locations you might not expect.  The vast majority of the time I don't have any connection problems to my SQL Azure database(s).   However, once in a while I would look at my route and it would be going all over the internet before it got to the SQL Azure servers.    This led to a few connection time-outs.    To correct this I updated the connection logic in the utilities to retry the connection, if the failure was not a security failure, and that resolved the intermittent connection problems.  I could have extended the connection timeout as well.

BCP
The test suite for RML uses BCP.exe to export the loaded data and do a diff comparison against the expected results.   The test suite was using the QUERYOUT option, but remember that SQL Azure does not allow a USE database or a cross-database query.  I had to update to the SQL Server 2008 R2 based BCP.exe with the new -d <database name> command line parameter.   This allows the database to be used as the default during connection establishment and lets BCP run correctly.    As I mentioned, every so often a connection attempt would fail.   BCP.exe does not support a connection timeout parameter, so I added BCP.exe retries to the batch file that drives the copy-out activity as well.

Recap
I have a private tree for RML that works against SQL Azure, and it was pretty easy to get working.   Our utilities use the ODBC, OLEDB and Managed SQL Client API sets, and they all worked seamlessly as well.  Our test suites already supported both Windows and SQL Authentication, so I didn't have to make any changes for SQL Azure; I just ran the suites with the SQL Authentication needed to connect to my SQL Azure databases.

Now that I have the test suite running against SQL Azure, my next goal is to look at performance tuning.   The test suite performance is quite good right now, but I would like to perform a more in-depth study.  Do I need to, or should I, shard the database, and if so what does that look like, etc.?

I hope your first experience with SQL Azure is as easy as mine!

Bob Dorr - Principal SQL Server Escalation Engineer

JDBC 1.x: “She sure was a good ship”…


OK, OK – the title is a complete rip-off of Bob Ward’s post about SQL Server 7.0, but the sentiment is the same.  On June 25th, support for the Microsoft JDBC driver versions 1.0, 1.1, and 1.2 will end.  Much like SQL Server 7.0, the v1.x versions of the driver were a major milestone for Microsoft’s support of JDBC.

The driver we supported prior to v1.0 was the SQL Server 2000 JDBC driver.  You may or may not know, but this driver was really a DataDirect driver that we licensed for redistribution.  We were responsible for supporting problems with it, but ultimately, we had to engage DataDirect if there was a bug or we needed code review.  As you can imagine, while DataDirect was very responsive and we had a great working relationship, it wasn’t necessarily the best long-term solution for our customers since we ultimately had to rely on a third-party for the deep code work.

Therefore, we decided to release a version of the driver that we owned entirely and could completely support on our own.  While that meant we had to develop a lot more expertise with Java and JDBC, we felt it was the right decision for our customers.  This became the SQL Server 2005 JDBC v1.0 driver and was our first real foray into JDBC.  We later released v1.1, but the next real big jump was v1.2 where we completely rewrote a huge portion of the data flow between the driver and the server (I’ll bet David Olix still has nightmares about it!).  v1.2 was also the first version where we were officially certified for use with a version of WebSphere.  This relationship has continued, but I remember the long arduous process to get that first certification completed.

So, in summary: while the v1.x drivers were great for their time, their time has passed.  If you haven’t already done so, you need to look at upgrading to at least v2.0 (personally, I would recommend you go to v3.0 to give your application the longest possible supported lifetime) so that you stay in a supported configuration.
