
In-Memory OLTP files – what are they and how can I relocate them?


In SQL Server 2014 and above, you can create memory optimized tables with the In-Memory OLTP feature.  When you use this feature, SQL Server generates native code to optimize performance.  As a result, there will be .dll and .pdb files plus other intermediate files.  In fact, native compilation is one of the three pillars of high performance.  The other two are the lock-free/latch-free implementation and optimizing for data in memory (no buffer pool handling).

Each stored procedure or table will have a separate set of files generated.  These are managed by SQL Server and you normally don't need to worry about them.  But we recently got a report from a customer who received the following error when starting their database:

"Msg 41322, Level 16, State 13, Line 0
MAT/PIT export/import encountered a failure for memory optimized table or natively compiled stored procedure with object ID 214291823 in database ID 6. The error code was 0x80030070".

The error 0x80030070 is the operating system error ERROR_DISK_FULL: "There is not enough space on the disk".

It turned out that the customer had lots of memory optimized objects (tables and stored procedures), which resulted in lots of files being generated.

Where do these files get stored?

They are stored in the default data file location for the server instance.

SQL Server will always create a subfolder like <default data file location>\xtp\<dbid> and store the files there.  The file names follow the convention xtp_<p or t>_<dbid>_<objectid>.*.  For example, when I created a sample In-Memory OLTP database with just one memory optimized table named t, my instance of SQL Server generated the following files.

[Screenshot: files generated under <default data file location>\xtp\<dbid>]

 

If you query sys.dm_os_loaded_modules, you will see the native DLLs loaded; see the screenshot below.

[Screenshot: sys.dm_os_loaded_modules output showing the loaded native DLLs]
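A minimal sketch of such a query (the description filter assumes the value SQL Server reports for these modules):

-- List natively compiled In-Memory OLTP modules currently loaded in the instance
SELECT name, description
FROM sys.dm_os_loaded_modules
WHERE description = 'XTP Native DLL';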

Additionally, these files will always be deleted and recreated under the following conditions:

  1. SQL Server restarts
  2. The database is taken offline and brought back online
  3. A table or procedure is dropped and recreated

 

How can I relocate these files?

If you want these files stored in a different location, all you need to do is change the default data file location.  SQL Server Management Studio allows you to do that, but you will need to restart SQL Server after the change.  Once you do, the In-Memory OLTP related files will be created in the new location.

[Screenshot: changing the default data file location in SQL Server Management Studio (Server Properties > Database Settings)]
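As a quick sanity check after the restart, a sketch like the following can confirm the default data path that the xtp subfolder will be created under:

-- Verify the instance default data path (In-Memory OLTP files go under <path>\xtp\<dbid>)
SELECT SERVERPROPERTY('InstanceDefaultDataPath') AS DefaultDataPath;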

 

Jack Li |Senior Escalation Engineer | Microsoft SQL Server

twitter| pssdiag |Sql Nexus


Are My Statistics Correct?


The question is often “Are my statistics up-to-date?” which can be a bit misleading.   I can make sure I have up-to-date statistics but the statistics may not be accurate. 

I recently engaged in an issue where the statistics were rebuilt nightly.  A maintenance job change had been made, moving from FULLSCAN to SAMPLED statistics creation/update, which dramatically altered the statistical layout.  The underlying data was skewed, and as such the execution plan generation(s) varied significantly.  Queries running in 1 minute now took over an hour to complete, using an alternate plan with significant memory grants and TEMPDB usage.

As you can imagine this issue has resulted in a series of DCR asks from the product team. 

The dilemma we all run into is: what level of SAMPLED statistics is appropriate?  The answer is that you have to test, but that is not always feasible, and in the case of Microsoft CSS we generally don't have historical histogram states to revisit.

Microsoft CSS is engaged to help track down the source of a poorly performing query.  A common step is to locate possible cardinality mismatches and study them closer.  Studying the statistics dates, row modification counter(s), atypical parameter usage and the like are among the fundamental troubleshooting steps.

The script below shows one way Microsoft CSS may help determine the accuracy of the current statistics.  You can use similar techniques to check the accuracy of your statistical SAMPLING choices or to store historical information.  The example loads a specific histogram for the 'SupportCases' table, then executes queries using the key values and range information to determine the actual counts (as if FULLSCAN had been executed).  The final select of the captured data can be used to detect variations between the current actuals and the in-use histogram.

create table #tblHistogram
(
vData sql_variant,
range_rows bigint,
eq_rows bigint,
distinct_range_rows bigint,
avg_range_rows bigint,
actual_eq_rows bigint DEFAULT(NULL),
actual_range_rows bigint DEFAULT(NULL)
)
go

create procedure #spHistogram @strTable sysname, @strIndex sysname
as
  dbcc show_statistics(@strTable, @strIndex) with HISTOGRAM
go

truncate table #tblHistogram
go

insert into #tblHistogram (vData, range_rows, eq_rows, distinct_range_rows, avg_range_rows)
  exec #spHistogram 'SupportCases', 'cix_SupportCases'
go

-- EQ_ROWS

update #tblHistogram
set actual_eq_rows = (select count(*) from SupportCases with(NOLOCK) where ServiceRequestNumber = h.vData)
  from #tblHistogram h;

-- RANGE_ROWS

with BOUNDS (LowerBound, UpperBound)
as
(
  select LAG(vData) over(order by vData) as [LowerBound], vData [UpperBound] from #tblHistogram
)

update #tblHistogram
  set actual_range_rows = ActualRangeRows
  from (select LowerBound, UpperBound,
   (select count(*) from SupportCases with(NOLOCK) where ServiceRequestNumber > LowerBound and    ServiceRequestNumber < UpperBound) as ActualRangeRows from BOUNDS
) as t
where vData = t.UpperBound
go

select /*TOP 10 NEWID(),*/ vData, eq_rows, actual_eq_rows, range_rows, actual_range_rows
  from #tblHistogram
  where eq_rows <> actual_eq_rows or range_rows <> actual_range_rows
--order by 1
go

Testing the script, I leveraged UPDATE STATISTICS WITH SAMPLE 1 PERCENT and skewed data in my table.  This resulted in several steps of the histogram having a statistical variation of +200% from the actual (FULLSCAN) values.

I continued to test variants of the SAMPLE PERCENTAGE until the statistical variation from the actuals fell within a noise range.  For my data this was 65 PERCENT.  SAMPLING at 65 PERCENT allows reduction of statistics creation/modification time while retaining the necessary statistical relevance.
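For reference, a sketch of the sampled update used during this kind of testing (the table name follows the SupportCases example above):

-- Rebuild statistics on the table at the chosen sample rate
UPDATE STATISTICS SupportCases WITH SAMPLE 65 PERCENT;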

Bob Dorr – Principal SQL Server Escalation Engineer

The given network name is unusable because there was a failure trying to determine if the network name is valid for use by the clustered SQL instance


Have you seen this message before? We see customers encounter this message while performing SQL Server failover cluster installations. If there is a problem, you will normally get this message on the "Instance Configuration" page of the "New SQL Server Failover Cluster" setup sequence.

Here is how the screen appears with the message at the bottom:

[Screenshot: Instance Configuration page with the error message at the bottom]

After you provide the SQL Server Network Name and instance name, you click Next. At this point the setup program performs a few validations. If those validations fail, you will notice the error message at the bottom of the screen. If you click on the error message, you will see some additional information embedded at the end of the message which is not visible by default in this view. Here is an example:

[Screenshot: expanded error message showing the additional detail at the end]

In general you might encounter one of the following messages:

The given network name is unusable because there was a failure trying to determine if the network name is valid for use by the clustered SQL instance due to the following error: ‘The network address is invalid.’

The given network name is unusable because there was a failure trying to determine if the network name is valid for use by the clustered SQL instance due to the following error: 'Access is denied.'

The troubleshooting steps and resolution for these situations depend on what the last part of the error message indicates. Let's take a quick look at how the setup program performs the validation of the network name. The setup program calls the Windows API NetServerGetInfo and passes two parameters: the network name that you typed in the setup screen and level 101. There are multiple possible outcomes from this Windows API call:

1. The API call returns OS error code 53 [The network path was not found]. This tells the setup program that the network name provided is good to use, since nobody is currently using that name on the network. This is what you ideally want to happen. Setup can proceed to the next steps.

2. The API call returns success. This tells the setup program that there is another computer active with this same name and hence we cannot use the network name provided in the setup screen. This is essentially a duplicate name scenario. This is straightforward and you can provide a different name to be used by setup.

3. The API call returns other unexpected failure states like the following:

RPC error code 1707 which translates to "The network address is invalid"
Security error code 5 which translates to "Access is denied"

    These are the same error messages you actually get on the setup screen in the last part of that long error message. Now, let us review the steps you can take to troubleshoot these errors and resolve them.

As a first step, you can isolate this issue to this specific API call and remove SQL Server setup from the picture. You can take the sample code for the Windows API NetServerGetInfo, build a console application, and pass the same network name as a parameter to this call. Observe which one of the error codes discussed above is returned. You should get back OS error 53, but you might be getting 1707 or 5 instead.

If you then use Process Monitor to track the activity, you will notice a CreateFile call to \\SQL-VNN-TEST\PIPE\srvsvc encountering BAD NETWORK NAME or ACCESS DENIED.

If you do not have the required permissions to create computer objects, make sure that the computer objects are pre-staged with the appropriate permissions as described in the document Failover Cluster Step-by-Step Guide: Configuring Accounts in Active Directory. Also validate that there is no stale entry in the DNS server pointing this network name to a different IP address. If possible, clean up all entries related to this network name from Active Directory and other name resolution servers like DNS. It is a good idea to create fresh entries for this network name as described in the sections "Steps for prestaging the cluster name account" and "Steps for prestaging an account for a clustered service or application".

In the past, when our networking team debugged this, they noticed that the error code changes (from 53 to 1707) while the network request flows through the various drivers in the network stack. RDBSS shows the correct error code, but by the time the request reaches MUP it gets changed to one of the incorrect error codes we finally encounter. Typically this happens when there is a filter driver sitting in the network stack intercepting these calls and changing the return codes. So the next step is to review all processes and services running on this system and evaluate whether you can disable or remove the non-critical ones during the installation or troubleshooting timeframe.

Check whether this problem happens only for a specific name or for any network name that you pass for validation. This can help establish whether there is a generic network issue at play rather than a problem looking up a specific network name.

It would be great to hear from you if you encountered this issue and which one of the above steps you used to resolve it. Also, if there is something we have overlooked, please let us know so we can add it to this list of steps.

Thanks,

Suresh Kandoth – SQL Server Escalation Services

Will SQL Server use ‘incomplete’ or ‘dirty’ statistics during online index rebuild?


We had a customer who opened an issue with us and wanted to know the behavior of statistics during online index rebuild.  Specifically, he suspected that SQL Server might have used ‘incomplete’ statistics because his application uses read uncommitted isolation level.

This type of question comes up frequently.  I thought I'd share my research and the answers to this customer so that readers can benefit from this blog.

In order to answer the question more accurately, let's be specific.  Let's call stats1 the statistics for index1 before the online index rebuild and stats2 the statistics after the rebuild.  Furthermore, let's call stats3 any incomplete statistics produced during the rebuild.  Now the question becomes: during the online index rebuild of index1 (started but not completed), which statistics will a query compiled during the rebuild use (stats1, stats2 or stats3)?

Here are few key points that answer the above question:

  1. First of all, there is no stats3.  SQL Server never stuffs in-flight statistics into the stats blob for use during an online index rebuild.  Even if you are doing dirty reads, you won't get the non-existent stats3.
  2. During the online index rebuild, stats1 (the old stats) continues to be available for use until the very end.
  3. Stats2 (the new stats) is written at the very end of the index rebuild.
  4. During the brief period when SQL Server switches to the new stats (stats2), no one can access the stats at all.  Even with the read uncommitted isolation level, you can't access them.  This is because SQL Server acquires a schema modification lock at the very end of the online index rebuild to make the metadata changes, including the stats change.  Even under read uncommitted isolation you still need a schema stability lock on the table, and you can't get it while someone else holds a schema modification lock.  In short, you will never see anything in between: you either see before (stats1) or after (stats2).
  5. After the online index rebuild, all queries involving the table will need to recompile (a quick way to observe the stats switch is sketched below).
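A minimal sketch of how to observe this (the table and index names are hypothetical): check the statistics update date before the rebuild, rebuild online, and check again.

-- When were the statistics for this table last updated?
SELECT s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.MyTable');

-- Rebuild index1 online; the stats row for index1 keeps its old date (stats1)
-- until the rebuild completes, then shows the new date (stats2).
ALTER INDEX index1 ON dbo.MyTable REBUILD WITH (ONLINE = ON);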

What about Index reorg?

REORG does nothing related to statistics updates. In other words, REORG doesn't update stats for the index at all.  I have posted a blog about this previously.  In the interest of finding the impact of a reorg on locks and recompiles, I did more research.  A reorg won't cause recompilation of your queries or hold a schema modification lock.  It requests a schema stability lock, which is much lighter weight.  A reorg does acquire and release X locks on pages or rows, but these have no effect on stats or on queries running in the read uncommitted isolation level.  In other words, your query in read uncommitted isolation will continue to run without any impact.  A reorg only helps with how data is physically accessed.  No stats update, no recompile.

What is the duration of the schema modification lock?

For an online index rebuild, the duration of the schema modification lock (which SQL Server acquires for the rebuild) is very brief, toward the end.  All it does is the metadata update.

Jack Li |Senior Escalation Engineer | Microsoft SQL Server

twitter| pssdiag |Sql Nexus


SQL Server 2016 Temporal Data Assists Machine Learning Models


Microsoft is always seeking out ways to improve the customer experience and satisfaction.  A project that is currently active looks at the SQL Server incidents reported to Microsoft SQL Server Support and applies Machine Learning.  A specific aspect of the project is to predict when a case needs advanced assistance (escalation, onsite, development, or upper-level management assistance).

Not every model requires historical data, but working with our data scientists I realized the importance of temporal data as it relates to our approach.  We are trying to predict on day 1, 2, 3, … that an issue has a high probability of requiring advanced assistance.  While building the training set for Machine Learning it became clear that the model needs to understand what the issue(s) looked like on day 1, day 2, and so forth.

I made a quick chart in Excel to help visualize the concept.

[Chart: how an advanced-assistance incident's training data looks on each day of its life]

If I want to predict if an issue has a high probability of needing advanced assistance I need to know what an advanced issue looked like over time.   If I take the training values when the incident was resolved, Machine Learning is limited to learning the resolution patterns.  

Let me expound on this a bit more.  If I provide training data at day 10 to the machine learning model, I am influencing the model accuracy at day 10.  The model can be very accurate for day 10, but I want to predict issues that need assistance and address them on day 1.

Using a temporal approach, the training data is expanded to each day in the life of the advanced incidents.  The model now understands what an advanced issue looked like on day 1, 2, … allowing it to provide relevant predictions.  When a case scores high in this model, we can adjust resources and assist the customer quickly.

I prefer easy math so let’s assume 1000 issues needed advanced assistance over the past year and each of them took 10 days to resolve.   Instead of a training set of 1000 issues at the point of resolution, applying a temporal design expands the training set to 1000 x 10 = 10,000 views.
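As a rough illustration of how SQL Server 2016 temporal tables support this kind of day-by-day training set (the table and column names here are hypothetical):

-- A system-versioned temporal table records how each incident changed over time.
CREATE TABLE dbo.Incident
(
    IncidentId int          NOT NULL PRIMARY KEY CLUSTERED,
    Severity   tinyint      NOT NULL,
    Status     nvarchar(50) NOT NULL,
    ValidFrom  datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo    datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.IncidentHistory));

-- Reconstruct what every incident looked like on a specific day; repeating this
-- for each day of interest expands the training set as described above.
SELECT IncidentId, Severity, Status
FROM dbo.Incident
FOR SYSTEM_TIME AS OF '2016-01-02T00:00:00';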

When using Machine Learning carefully consider if SQL Server 2016 Temporal Tables are relevant to the accuracy and design of your model.

Bob Dorr – Principal SQL Server Escalation Engineer

Spool operator and trace flag 8690


If you have seen enough query plans, you have surely run into spool operators (index spool or table spool). They are documented at https://technet.microsoft.com/en-us/library/ms181032(v=sql.105).aspx

The spool operator helps improve query performance because it stores intermediate results so that SQL Server doesn't have to rescan or recompute them for repeated uses.  The spool operator has many uses.

For this blog, I'm talking about a spool on the inner side of a nested loop.  If your query plan has this type of spool, you will see something similar to the following:

 

[Screenshot: query plan showing a spool on the inner side of a nested loop join]

Spools improve performance in the majority of cases.  But the decision to spool is based on estimates, and sometimes those estimates are incorrect due to unevenly distributed or skewed data, causing slow performance.

You can actually disable the spool on the inner side of a nested loop with trace flag 8690.  This trace flag helped two of my customers last week.  I want to point out that this is an exception (that I resolved two issues this way in one week).  In the vast majority of situations, you don't need to manually disable the spool with this trace flag.

I don't recommend disabling the table spool server-wide.  But you can use QUERYTRACEON to localize the trace flag to a single query if you have exhausted other ways to tune the query and find that disabling the table spool helps.
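A hedged sketch of the per-query approach (the table and column names are made up for illustration):

-- Disable the spool on the inner side of the nested loop for this query only
SELECT o.OrderId, d.ProductId
FROM dbo.Orders AS o
JOIN dbo.OrderDetails AS d
    ON d.OrderId = o.OrderId
OPTION (QUERYTRACEON 8690);

Note that QUERYTRACEON typically requires sysadmin permission when used directly.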

Jack Li |Senior Escalation Engineer | Microsoft SQL Server

twitter| pssdiag |Sql Nexus

What to do when you run out of disk space for In-Memory OLTP checkpoint files


While data for memory optimized tables resides in memory all the time with SQL Server 2014 and 2016's In-Memory OLTP feature, we still need a means to cut down recovery time in case of a crash or restart.  For disk based tables, checkpoint flushes the dirty pages into the data file(s).  With In-Memory OLTP, there is a separate set of checkpoint files that SQL Server uses.  These checkpoint files reside in a directory you specify when you create the MEMORY_OPTIMIZED_DATA filegroup required to enable the In-Memory OLTP feature.

The question is: what happens if the disk that hosts the In-Memory checkpoint files runs out of space?  I decided to do some testing and document the symptoms and recovery steps here in case you run into this issue.  With Azure, the test was really easy.  All I had to do was spawn a VM and attach a very small disk to simulate the out of disk space condition.

If your disk runs out of space, you will see the various errors below, though your database stays online.

Your insert, update or delete may fail with the following error:

Msg 3930, Level 16, State 1, Line 29

The current transaction cannot be committed and cannot support operations that write to the log file. Roll back the transaction.

In the errorlog, you will see

2015-12-23 21:38:23.920 spid11s     [ERROR] Failed to extend file ‘f:\temp\imoltp_mod1\7ef8758a-228c-4bd3-9605-d7562d23fa76\a78f6449-bd73-4160-8a3f-413f4eba8fb300000ad-00013ea0-0002′ (‘GetOverlappedResult’). Error code: 0x80070070. (d:\b\s1\sources\sql\ntdbms\hekaton\sqlhost\sqllang\fsstgl

2015-12-23 21:40:49.710 spid11s     [ERROR] Database ID: [6]. Failure to allocate cache file. Error code: 0x80070070. (d:\b\s1\sources\sql\ntdbms\hekaton\engine\hadr\ckptagent.cpp : 890 – ‘ckptAgentAllocateCfp’)

If you manually issue a CHECKPOINT command, you will get this error:

Msg 41315, Level 16, State 0, Line 5

Checkpoint operation failed in database ‘testdb’.

 

What should you do when you encounter this condition?

Step 1: Add an additional 'container'

If you can append more space to the disk, just do so.  If you can't append more space to the current disk, you can add another 'container' to the MEMORY_OPTIMIZED_DATA filegroup pointing to a folder on another drive.  You can do so by issuing a command like this:  ALTER DATABASE testdb ADD FILE (name='imoltp_mod1', filename='f:\checkpoint\imoltp_mod1') TO FILEGROUP imoltp_mod

Step 2: Manually issue a checkpoint.  After you have added space or an additional 'container' as above, just run CHECKPOINT against the database.  Then you are all set.
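Putting the two steps together, a sketch using the same names and paths as the example above (the logical NAME must not already exist in the database, and the new folder should be on a drive with free space):

-- Step 1: add another container to the memory optimized filegroup
ALTER DATABASE testdb
    ADD FILE (NAME = 'imoltp_mod1', FILENAME = 'f:\checkpoint\imoltp_mod1')
    TO FILEGROUP imoltp_mod;

-- Step 2: manually issue a checkpoint so new checkpoint files can be created
USE testdb;
GO
CHECKPOINT;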

 

Jack Li |Senior Escalation Engineer | Microsoft SQL Server

twitter| pssdiag |Sql Nexus


Wanting your non-sysadmin users to enable certain trace flags without changing your app?


Even after supporting SQL Server for so many years, we still face something new almost every day.  Sometimes you just have to combine things together to achieve what you need.  Here is an example that came out of troubleshooting a customer's issue.

A couple of months ago, we ran into a need to enable a trace flag while troubleshooting a highly critical performance issue.  This customer had 30 databases that served many applications on a single server.  One application produced queries that negatively impacted the entire server.  Through troubleshooting, we discovered a trace flag (which is rarely used, by the way) that helped the query plans for that set of queries.  The problem is that the trace flag is not suited for the entire server because it would negatively impact other queries.

The initial thought was to enable the trace flag at the session level.  We ran into two challenges.  First, the application would need a code change (which they couldn't make) to enable it.  Second, DBCC TRACEON requires sysadmin rights, and the customer's application used a non-sysadmin login.  These two restrictions made it seem impossible to use the trace flag.

However, we eventually came up with a way of using a logon trigger coupled with wrapping the DBCC TRACEON command inside a stored procedure.  In doing so, we solved both problems.  We were able to isolate the trace flag to just that application without requiring a sysadmin login.

Below is the code using trace flag 9481.  I used trace flag 9481 in this demo because it's easy to verify that it has taken effect.

 

alter database master set trustworthy on
go

use master
go

create procedure proc_enable_tf
with execute as owner
as
  exec('dbcc traceon(9481)')
go

grant execute on proc_enable_tf to public
go

create TRIGGER trigger_enable_tf
ON ALL SERVER
FOR LOGON
AS
BEGIN
  IF app_name() = 'Microsoft SQL Server Management Studio - Query'    -- replace this with your application name
  begin
    exec master.dbo.proc_enable_tf
  end
END;

 

After you execute the above code on SQL Server 2014, you can create a login that is not a member of sysadmin.  Then log in with that login using Management Studio and run a query to gather the XML query plan.  In the query plan, you can examine CardinalityEstimationModelVersion and see that it is 70 (instead of 120, which is the default).
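A minimal sketch of that verification (any query will do; a simple catalog query is used here for illustration):

-- Return the estimated plan as XML so the CE model version can be inspected
SET SHOWPLAN_XML ON;
GO
SELECT COUNT(*) FROM sys.objects;
GO
SET SHOWPLAN_XML OFF;
GO
-- In the returned plan XML, look for CardinalityEstimationModelVersion="70"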

You can also see the message "DBCC TRACEON 9481, server process ID (SPID) 58. This is an informational message only; no user action is required." in the errorlog.

 

Reference:

Optimizer trace flags are documented in  https://support.microsoft.com/en-us/kb/2801413 and https://support.microsoft.com/en-us/kb/920093.

 

Jack Li |Senior Escalation Engineer | Microsoft SQL Server

twitter| pssdiag |Sql Nexus

TLS 1.2 Support for SQL Server 2008, 2008 R2, 2012 and 2014


Microsoft is pleased to announce the release of Transport Layer Security (TLS) 1.2 support in all major client drivers and SQL Server releases. The updates made available on January 29th, 2016 provide TLS 1.2 support for SQL Server 2008, SQL Server 2008 R2, SQL Server 2012 and SQL Server 2014. The client drivers that have support for TLS 1.2 are SQL Server Native Client, Microsoft ODBC Driver for SQL Server, Microsoft JDBC Driver for SQL Server and ADO.NET (SqlClient).

The list of SQL Server server and client component updates that support TLS 1.2, along with their download locations, is available in the KB article below: KB3135244, TLS 1.2 support for Microsoft SQL Server.

You can use KB3135244 to download the appropriate server and client components applicable to your environment. The first build numbers that provide complete TLS 1.2 support in each major release are available in KB3135244 as well. The following lists show the client drivers/components and server components which have TLS 1.2 support.

Client components:
  - SqlClient (.NET Framework 4.6)
  - SqlClient (.NET Framework 4.5.2, 4.5.1, 4.5)
  - SqlClient (.NET Framework 4.0)
  - SqlClient (.NET Framework 3.5, a.k.a. .NET Framework 2.0 SP2)
  - MS ODBC Driver v11 (Windows)
  - SQL Server Native Client (for SQL Server 2012 & 2014)
  - SQL Server Native Client (for SQL Server 2008 R2)
  - SQL Server Native Client (for SQL Server 2008)
  - SQL Server Native Client (for SQL Server 2005)
  - JDBC 6.0
  - JDBC 4.2
  - JDBC 4.1

Server components:
  - SQL Server 2014
  - SQL Server 2012
  - SQL Server 2008 R2
  - SQL Server 2008

Known Issue: If you are currently on a Cumulative Update branch of SQL Server 2014 and need to use TLS 1.2 for encrypted endpoints for features like Availability Groups, Database Mirroring or Service Broker, then we recommend that you wait until the next Cumulative Update for SQL Server 2014 (February 2016), which adds support for this particular scenario. This is documented as a known issue in KB3135852. All the other branches mentioned in KB3135244 have support for this scenario.

Please feel free to post your questions and comments.




