Understanding data security in cloned databases created using DBCC CLONEDATABASE

DBCC CLONEDATABASE was first introduced with SQL Server 2014 SP2 and was later added to SQL Server 2016 with SP1. The primary design goal the SQL product team had in mind for DBCC CLONEDATABASE was to provide a mechanism to create fast, minimally invasive and transactionally consistent database clones that are useful for query tuning. Database schema, statistics and Query Store data are commonly required for query tuning or for troubleshooting suboptimal query plans and plan regressions. To make database cloning fast, minimally invasive and consistent, the copying of metadata objects is performed at the storage engine layer by taking a transient snapshot of the production database. Database cloning has proved to be significantly useful in reducing troubleshooting time for DBAs, developers and Microsoft CSS by extracting only the data required for troubleshooting from the production database. In addition, cloning a database also helps minimize the risk of providing access to production databases or sharing business data directly with developers or support teams. Although the data in user tables and indexes is not copied to the cloned database, user data is still available and exposed in the cloned database via statistics and Query Store. As the primary scenario for DBCC CLONEDATABASE is troubleshooting, the default database clone contains a copy of the schema, statistics and Query Store data from the source database. Query Store data is included only on SQL Server 2016 instances, and only if Query Store was turned ON in the source database prior to running DBCC CLONEDATABASE.

Note: To copy the latest runtime statistics as part of Query Store, execute sp_query_store_flush_db to flush the runtime statistics to the Query Store before executing DBCC CLONEDATABASE, for example:
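(A quick illustration; AdventureWorks2014 is used here as a stand-in for your source database, assuming Query Store is enabled on it.)

USE AdventureWorks2014
GO
EXEC sys.sp_query_store_flush_db;  -- persist the in-memory runtime statistics to the Query Store tables
GO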

-- The default database clone contains a copy of the schema, statistics and query store data from the source database

DBCC CLONEDATABASE (source_database_name, target_database_name)

Another scenario where cloned databases are useful is source control (source depot) check-in and schema comparison of the production database schema against the dev schema. In this scenario, only a copy of the production database schema is desired in the database clone so it can be compared with the schema in the dev environment. For some businesses, especially in healthcare and finance, data privacy is critical and no user data (including statistics and Query Store) can be shared with developers, vendors or support teams. For this scenario, the following syntax, introduced in SQL Server 2016 SP1, allows users to create schema-only database clones with no user data.

-- Creates a schema-only database clone with no user data

DBCC CLONEDATABASE (source_database_name, target_database_name)
WITH NO_STATISTICS, NO_QUERYSTORE

There are also scenarios where DBAs are required to share schema-and-statistics database clones with developers, vendors or support teams for troubleshooting, but some tables or columns within the source database contain business-sensitive data (e.g., SSN or credit card columns) that cannot be shared with anyone. Currently, DBCC CLONEDATABASE doesn't support selectively including or excluding objects from the source database in the cloned database. If your requirement falls into this category, you can use any of the techniques described below to protect data in cloned databases before they are shared.

Drop statistics on tables or columns containing sensitive business data in the database clone

I have uploaded a T-SQL stored procedure script to our Tiger GitHub repository which can be used to drop all the statistics on a specified table, or on a specified column of that table. You can download and run the script against the cloned database. The stored procedure needs to be executed for each table or column containing sensitive data whose statistics you would like to purge. The script purges user statistics as well as index statistics, including statistics on indexes backing primary key constraints; however, if there are any foreign key references, those must be dropped manually.
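For illustration only, here is a minimal sketch of the idea; it is not the actual Tiger Toolbox script (which also handles index statistics and foreign key references), and the procedure and parameter names below are hypothetical. It drops only user-created and auto-created statistics on the given table, optionally limited to statistics that include a given column:

CREATE PROCEDURE dbo.usp_DropTableColStatistics_Sketch
    @TableName  sysname,        -- two-part table name, e.g. 'Sales.CreditCard'
    @ColumnName sysname = NULL  -- optional: only drop statistics that include this column
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @sql nvarchar(max) = N'';

    -- Build one DROP STATISTICS statement per user-created or auto-created statistics object.
    -- Index statistics are skipped here; dropping those requires dropping the index itself.
    SELECT @sql += N'DROP STATISTICS ' + @TableName + N'.' + QUOTENAME(s.name) + N';' + CHAR(10)
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID(@TableName)
      AND (s.auto_created = 1 OR s.user_created = 1)
      AND (@ColumnName IS NULL
           OR EXISTS (SELECT 1
                      FROM sys.stats_columns AS sc
                      JOIN sys.columns AS c
                        ON c.object_id = sc.object_id
                       AND c.column_id = sc.column_id
                      WHERE sc.object_id = s.object_id
                        AND sc.stats_id  = s.stats_id
                        AND c.name = @ColumnName));

    EXEC sys.sp_executesql @sql;
END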

If you would like to enhance or improve the script, feel free to send a pull request on GitHub for the benefit of the SQL community.

-- Create a database clone with no query store
DBCC CLONEDATABASE ('AdventureWorks2014', 'AdventureWorks2014_Clone')
WITH NO_QUERYSTORE

-- Set the cloned database in read-write mode
ALTER DATABASE AdventureWorks2014_Clone SET READ_WRITE

-- Create the stored procedure usp_DropTableColStatistics in the cloned database
USE AdventureWorks2014_Clone
GO

CREATE PROCEDURE usp_DropTableColStatistics  -- copy the script body from the Tiger GitHub repository

-- Drop all the statistics on column CardNumber of table Sales.CreditCard
EXEC usp_DropTableColStatistics 'Sales.CreditCard', 'CardNumber'

-- Iterate again for other tables
-- If only a table name is specified (no column name), all the statistics on that table are dropped

-- Drop all the statistics on table Sales.CreditCard
EXEC usp_DropTableColStatistics 'Sales.CreditCard'

-- Back up the database clone with compression
BACKUP DATABASE AdventureWorks2014_Clone TO DISK = 'c:\backup\clonedb.bak' WITH COMPRESSION

-- Drop the cloned database after the backup
DROP DATABASE AdventureWorks2014_Clone

Note: The database generated by DBCC CLONEDATABASE is not supported for use as a production database and is primarily intended for troubleshooting and diagnostic purposes. We recommend detaching the cloned database after it is created.

Column-level Encryption

If the columns in the source database are encrypted using column-level encryption, the statistics in the source database are also encrypted, which ensures that the statistics copied into the cloned database are encrypted as well. The following script validates that behavior.

USE [AdventureWorks2014]
GO

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '$ecretP@ssw0rd';

CREATE SYMMETRIC KEY TestSymKey
WITH ALGORITHM = TRIPLE_DES
ENCRYPTION BY PASSWORD = '$ecretP@ssw0rd';

OPEN SYMMETRIC KEY TestSymKey
DECRYPTION BY PASSWORD = '$ecretP@ssw0rd';

-- Add a column to store the encrypted CardNumber
ALTER TABLE [Sales].[CreditCard] ADD CreditCardNumber varbinary(max)

-- Update the new column, encrypting it with the symmetric key
UPDATE [Sales].[CreditCard]
SET CreditCardNumber = ENCRYPTBYKEY(KEY_GUID('TestSymKey'), CardNumber)

-- Create statistics on the encrypted column
CREATE STATISTICS encryptedcreditcardno ON [Sales].[CreditCard](CreditCardNumber)

-- Validate that the statistics are encrypted
DBCC SHOW_STATISTICS ("Sales.CreditCard", encryptedcreditcardno)

-- Create a database clone with no query store
DBCC CLONEDATABASE ('AdventureWorks2014', 'AdventureWorks2014_Clone')
WITH NO_QUERYSTORE

USE [AdventureWorks2014_Clone]
GO

DBCC SHOW_STATISTICS ("Sales.CreditCard", encryptedcreditcardno)

In addition to encrypted columns and statistics, if there are other encrypted objects such as stored procedures or functions in the source database, they will be copied into the database clone, but executing those stored procedures will fail because encrypted objects are not supported in database clones.

Always Encrypted Columns

DBCC CLONEDATABASE currently doesn't support Always Encrypted objects. Thus, if columns in the source database are encrypted using Always Encrypted, DBCC CLONEDATABASE will exclude those objects from the clone.

Note: There is a known issue where, if the source database contains Always Encrypted objects, running DBCC CLONEDATABASE against the database results in an access violation (AV) causing the client session to terminate. We will be fixing the issue in upcoming CUs for SQL Server 2016. The fix will avoid the AV while creating the database clone by excluding the metadata and data for Always Encrypted objects.

Transparent Data Encryption (TDE)

If you use TDE to encrypt data at rest in the source database, DBCC CLONEDATABASE supports cloning the source database, but the cloned database is not encrypted by TDE. Thus, the backup of the cloned database will be unencrypted. If you want to encrypt and protect the cloned database backup, you can enable TDE on the cloned database before it is backed up, as shown below.

USE master;
GO

CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<UseStrongPasswordHere>';
GO

CREATE CERTIFICATE MyServerCert WITH SUBJECT = 'My DEK Certificate';
GO

BACKUP CERTIFICATE MyServerCert
TO FILE = 'MyServerCert'
WITH PRIVATE KEY
(
    FILE = 'SQLPrivateKeyFile',
    ENCRYPTION BY PASSWORD = '*rt@40(FL&dasl1'
);
GO

-- Create a database clone with no query store
DBCC CLONEDATABASE ('AdventureWorks2014', 'AdventureWorks2014_Clone')
WITH NO_QUERYSTORE

-- Set the cloned database in read-write mode
ALTER DATABASE AdventureWorks2014_Clone SET READ_WRITE

USE AdventureWorks2014_Clone;
GO

CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_128
ENCRYPTION BY SERVER CERTIFICATE MyServerCert;
GO

ALTER DATABASE AdventureWorks2014_Clone SET ENCRYPTION ON;
GO

BACKUP DATABASE AdventureWorks2014_Clone TO DISK = 'c:\backup\clonedb.bak' WITH STATS = 5
GO

-- Drop the database clone after the backup
DROP DATABASE AdventureWorks2014_Clone

Hope the above article helps you understand and provides guidance on protecting user data in cloned databases created using DBCC CLONEDATABASE.

Parikshit Savjani
Senior PM, SQL Server Tiger Team
Twitter | LinkedIn
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam


SQL Server Performance Baselining Reports Unleashed for Enterprise Monitoring !!!

In my previous blog post, I shared how you can leverage the SQL Server Performance Dashboard SSRS reports for monitoring and point-in-time troubleshooting of large deployments of SQL Server. In this post, we will talk about why DBAs should establish SQL Server performance baselines for their environment and how you can leverage the SQL Server Performance Baselining reporting solution shared in our Tiger Toolbox GitHub repository to achieve that. The solution is already being leveraged by some of our large enterprise customers and has been further contributed to by SQL community members like SQLAdrian on GitHub.

Why is SQL Server Performance Baselining important?

  • Performance is relative – DBAs, consultants and support teams often get called into a situation where the business application running on SQL Server is running slow. Before troubleshooting, one of the first questions to ask is: how slow is slow, and how fast is fast? In the majority of situations, there are changes in the workload or application code that lead to changes in performance, but you can prove that something changed only if the previous state of the SQL Server was captured. In such situations, performance baselining helps you learn from historical data and trends to detect anomalies or changes in patterns on the server.
  • Capacity Planning – For DBAs managing large deployments and mission-critical instances of SQL Server, it is important to proactively keep an eye on resource utilization (CPU, memory, IO and storage) and workload trends over a period of time, to forecast and plan for more capacity if the workload trend or resource utilization is changing. To plan for capacity, performance baselining reports are the key to performing historical analysis of the data and predicting the capacity required in the future.
  • Consolidation Planning – As shadow IT and the number of business applications running SQL Server grow in the enterprise, companies can save cost by consolidating some of their databases under a single instance and efficiently utilizing their hardware and resources. To plan for consolidation and understand the resource utilization of SQL Server databases, performance baselining is again required.
  • Virtualization\Migration to Azure – Most enterprises today are looking to reduce their IT capital and operational expenses and overheads. When migrating to the cloud, it is important to identify the VM size or performance tier needed to run your SQL Server databases, which is easy when you have performance baselines established.

In general, performance baselining is capturing last-known-good state metrics that can be used as a starting point for comparison. Performance monitoring is the first step of baselining. In a production environment, the goal of performance monitoring should be to be non-invasive, with minimal overhead. Baselining requires all the metrics to be captured across a time dimension to allow historical time analysis. Historical analysis of performance data is useful for anomaly detection or to forecast future performance based on the current workload.

In SQL Server, at a very high level, there are three types of performance monitoring data available to capture the state of SQL Server:

  • Perfmon
  • DMVs (dynamic management views)
  • XEvents (Extended Events)

In SQL Server, you need to capture at least the following details over time to successfully establish a comprehensive performance baseline for a SQL Server instance (a minimal collection sketch follows these lists).

Processor Information

  • Processor(*)\% Processor Time
  • Process(sqlservr)\% Processor Time
  • Processor(*)\% Privileged Time
  • Process(sqlservr)\% Privileged Time

Memory

  • Memory\Available MBytes
  • Memory\Pages/sec
  • Process(sqlservr)\Private Bytes
  • Process(sqlservr)\Working Set
  • SQLServer:Memory Manager\Total Server Memory
  • SQLServer:Memory Manager\Target Server Memory

Physical Disk

  • PhysicalDisk(*)\Avg. Disk sec/Read
  • PhysicalDisk(*)\Avg. Disk sec/Write
  • PhysicalDisk(*)\Avg. Disk Queue Length
  • PhysicalDisk(*)\Disk Bytes/sec
  • PhysicalDisk(*)\Avg. Disk Bytes/Transfer
  • Process(sqlservr)\IO Data Operations/sec

Network

  • Network Interface(*)\Bytes Received/sec
  • Network Interface(*)\Bytes Sent/sec
  • Network Interface(*)\Output Queue Length

SQL Server Workload

  • SQLServer:SQL Statistics\Batch Requests/sec
  • SQLServer:SQL Statistics\SQL Compilations/sec
  • SQLServer:SQL Statistics\SQL Re-Compilations/sec
  • sys.dm_exec_requests
  • sys.dm_exec_sessions
  • sys.dm_exec_connections
  • Max Workers
  • Workers Created
  • Idle Workers
  • Deadlocks

Waits

  • SQLServer:Wait Statistics
  • sys.dm_os_waiting_tasks
  • sys.dm_os_wait_stats
  • Latch Waits > 15 sec
  • Locks > 30 sec
  • IO Latch Timeouts
  • Long IO Status

Database Indexes

  • sys.dm_db_index_physical_stats
  • sys.dm_db_index_operational_stats
  • sys.dm_db_index_usage_stats

Fatal Issues

  • Deadlocked Scheduler
  • Non-Yielding Scheduler
  • Access Violations
  • OOM
  • Corrupt Pages Status
  • Connectivity errors
  • Security errors
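
The collection scripts in the Tiger Toolbox solution take care of gathering this data on a schedule. Purely as an illustration of the idea, the sketch below snapshots wait statistics and a few key performance counters into baseline tables with a capture timestamp; the database and table names (BaselineDB, dbo.WaitStatsBaseline, dbo.PerfCounterBaseline) are hypothetical and not part of the actual solution.

USE BaselineDB;
GO

IF OBJECT_ID('dbo.WaitStatsBaseline') IS NULL
    CREATE TABLE dbo.WaitStatsBaseline
    (
        capture_time        datetime2    NOT NULL DEFAULT SYSDATETIME(),
        wait_type           nvarchar(60) NOT NULL,
        waiting_tasks_count bigint       NOT NULL,
        wait_time_ms        bigint       NOT NULL,
        signal_wait_time_ms bigint       NOT NULL
    );

IF OBJECT_ID('dbo.PerfCounterBaseline') IS NULL
    CREATE TABLE dbo.PerfCounterBaseline
    (
        capture_time  datetime2  NOT NULL DEFAULT SYSDATETIME(),
        object_name   nchar(128) NOT NULL,
        counter_name  nchar(128) NOT NULL,
        instance_name nchar(128) NULL,
        cntr_value    bigint     NOT NULL
    );
GO

-- Schedule the two INSERTs below as a SQL Agent job step to build the time dimension.
INSERT INTO dbo.WaitStatsBaseline (wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0;

INSERT INTO dbo.PerfCounterBaseline (object_name, counter_name, instance_name, cntr_value)
SELECT object_name, counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Batch Requests/sec', N'SQL Compilations/sec',
                       N'Total Server Memory (KB)', N'Target Server Memory (KB)');
GO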

When monitoring large deployments of SQL Server instances, if you capture all the above data on a central location or SQL Server database, the central database becomes a bottleneck or the point of contention. To scale the performance baselining solution to large deployment of SQL Server, we use the same architecture as discussed in SQL Server Performance Dashboard SSRS Reports as shown below.

If you download the SQL Performance Baselining reporting solution from GitHub, you will find scripts to create a database and monitoring jobs which need to be run on each instance of SQL Server you wish to monitor. Following are the steps to deploy the SQL Performance Baselining reporting solution.

Data Collection Steps for each SQL Instance to Monitor

  1. Connect to SQL instance to monitor
  2. Run CREATEDATABASE.sql
  3. Run CREATEOBJECTS.sql
  4. Run CREATECOLLECTIONJOB.sql
  5. Check SQL Agent JOBs History to see if it runs successfully
  6. Repeat for each SQL Instance you want to monitor

Setting up & Deploying Reporting

To deploy the Performance Baselining SSRS reports, you can leverage the same central monitoring SSRS server hosting the SQL Server Performance Dashboard reports. As mentioned in the SQL Server Performance Dashboard blog post, you can use the same steps to set up and configure the SSRS server by deploying the reports using SSDT or Report Manager. Alternatively, you can use the PowerShell script created by Aaron with some modifications mentioned in the previous blog post.

Once the reports are deployed, you can use the following reports to establish your performance baseline

The above report is a drillthrough report, so when you click on the chart for any specific date, it takes you to the daily performance trend report for that day, as shown below.

In the previous report, you can observe that at around 8 PM the number of sessions in the suspended state increases, which indicates high blocking. Around the same time, most sessions are waiting on LCK_M_SCH_M and disk write latency is high. Ohh, our reindexing job runs at around 8 PM and we don't use WITH (ONLINE = ON) while reindexing, which explains the blocking.
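For reference, the kind of point-in-time check behind such a chart looks roughly like the query below (a sketch against the live DMVs, not the exact query used by the collection jobs):

-- Current requests grouped by status and wait type; a spike in 'suspended' rows
-- waiting on LCK_M_SCH_M points at schema-modification blocking, such as an
-- offline index rebuild.
SELECT r.status,
       r.wait_type,
       COUNT(*)         AS request_count,
       MAX(r.wait_time) AS max_wait_time_ms
FROM sys.dm_exec_requests AS r
WHERE r.session_id > 50          -- ignore most system sessions
GROUP BY r.status, r.wait_type
ORDER BY request_count DESC;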

The Memory Utilization report shows you the trend of SQL Server memory and available memory on the server. From the report you can easily deduce that available memory on the server went down at around 2:30 PM, which caused external memory pressure and made target server memory fall below total server memory. SQL Server responded to the memory pressure by increasing lazy writer activity to free up pages, which in turn dropped the page life expectancy on the server.

Finally, the database storage report gives you a quick glance at the free space in the databases hosted on the SQL Server instance.

Hope you find the reports useful. If you are an SSRS developer, I would encourage you to design some cool reports for performance monitoring using perfmon, DMV or XEvents data and add them to our GitHub repository.

Parikshit Savjani
Senior PM, SQL Server Tiger Team
Twitter | LinkedIn
Follow us on Twitter: @mssqltiger | Team Blog: Aka.ms/sqlserverteam

Remote Blob Storage (RBS) client library setup requirements in SQL Server 2016

Cross post from https://blogs.msdn.microsoft.com/sql_server_team/remote-blob-storage-rbs-client-library-setup-requirements-in-sql-server-2016/

Remote BLOB Store (RBS) is a client library shipped with SQL Server which allows developers to store, access and retrieve binary large objects (BLOBs) outside the SQL Server database files while still maintaining the ACID properties and transactional consistency of the data in the BLOB store. RBS allows you to efficiently utilize storage and IO resources by managing structured and unstructured data together in a SQL Server database, with structured data stored in SQL Server data files and unstructured data stored outside SQL Server on a commodity storage solution. The RBS client library exposes a set of APIs for developers to access, modify and retrieve BLOBs from the BLOB store. Each BLOB store has its own provider library, which plugs into the RBS client library and specifies how BLOBs are stored and accessed alongside the SQL Server database.

The SQL Server FILESTREAM feature allows you to store and manage binary data (varbinary(max)) in SQL Server utilizing the underlying NTFS storage as the BLOB store.

The RBS FILESTREAM provider is one such free, out-of-the-box provider which plugs into the RBS client library to allow a deployment to use a FILESTREAM-enabled SQL Server database as a dedicated BLOB store for the application. The RBS FILESTREAM provider utilizes the FILESTREAM feature in SQL Server for BLOB storage and ties the two technologies together. SharePoint is one such application which allows you to use the RBS FILESTREAM provider on web front-end servers and the SQL Server FILESTREAM feature for storing and managing BLOBs on NTFS storage outside the SharePoint content database, as documented in the MSDN article here.

A number of third-party storage solution vendors have developed RBS providers that conform to these standard APIs and support BLOB storage on various storage platforms.

For more details on RBS in SQL Server you can refer to the MSDN article here.

The RBS 2016 client libraries are shipped as part of the SQL Server 2016 Feature Pack. As a prerequisite, RBS requires a SQL Server database for storing BLOB metadata along with the BLOB store. To connect to SQL Server, RBS requires at least ODBC Driver 11 for SQL Server 2014 and ODBC Driver 13 for SQL Server 2016. The drivers are available at Download ODBC Driver for SQL Server. If RBS is installed on the same server as the SQL Server 2016 instance, the ODBC driver is already installed as part of the SQL Server installation. However, when RBS is installed on a separate client server, like a SharePoint WFE in a multi-server farm setup, ODBC Driver 13.0 is not installed on the client server and needs to be installed separately as a prerequisite for installing the RBS client library.

If Microsoft ODBC Driver 13.0 for SQL Server is missing on the client server, the setup may fail with the following error when you hit Test Connection on the Database connection screen:

[Figure: RBS setup error dialog on the Database connection screen]

If you are installing using the command prompt, the output log file will show the following error:

MSI (s) (CC:FC) [15:12:55:265]: Note: 1: 1723 2: InstallCounters 3: CreateCounters 4: C:\Windows\Installer\MSI8C86.tmp CustomAction InstallCounters returned actual error code 1157 (note this may not be 100% accurate if translation happened inside sandbox)
MSI (s) (CC:FC) [15:12:55:265]: Product: Microsoft SQL Server 2016 Remote BLOB Store — Error 1723. There is a problem with this Windows Installer package. A DLL required for this installation to complete could not be run. Contact your support personnel or package vendor. Action InstallCounters, entry: CreateCounters, library: C:\Windows\Installer\MSI8C86.tmp
MSI (s) (CC:FC) [15:12:55:265]: Creating MSIHANDLE (141) of type 790531 for
thread 30204 Error 1723. There is a problem with this Windows Installer package. A DLL required for this installation to complete could not be run. Contact your support personnel or package vendor. Action InstallCounters, entry: CreateCounters, library: C:\Windows\Installer\MSI8C86.tmp

Hence, as a prerequisite for installing RBS on the client server, it is important to install Microsoft ODBC Driver 13.0 for SQL Server (or a higher version) on the client to avoid the above error while running the RBS.msi setup for SQL Server 2016.

Parikshit Savjani
Senior Program Manager (@talktosavjani)

Boosting Update Statistics performance with SQL 2014 SP1/CU6

Cross post from https://blogs.msdn.microsoft.com/sql_server_team/boosting-update-statistics-performance-with-sql-2014-sp1cu6/

With SQL Server 2005 reaching end of life, many customers are planning their upgrades to SQL Server 2014. One of the recommended post-upgrade tasks is updating the statistics of all the tables and databases under the latest compatibility level. Updated statistics ensure the new optimizer in the latest release has the most up-to-date data to produce the best plans. For VLDB scenarios, running update statistics on the entire database can take considerable time to complete, depending on the volume of data. As applications are required to run 24/7 with the desired performance, DBAs are increasingly under pressure to keep the maintenance window as short as possible. One such improvement, introduced in SQL Server 2014 SP1 CU6 and targeted at optimizing and reducing execution times for UPDATE STATISTICS runs, is detailed in this blog post.

In scenarios where the execution time of the update statistics job is critical, it can be improved by running update statistics for the tables in the database in parallel.

Let me illustrate this with an example. Consider a hypothetical scenario where a database has 4 tables, viz. T1, T2, T3 and T4. Table T1 has 8 statistics in total, combining index and column statistics. Table T2 has 4 statistics, table T3 has 2 statistics and table T4 has 2 statistics. Let's say updating each statistic takes 1 second.

In this case, if the statistics are updated serially using a single job, the total time taken to update the statistics for all the tables in the database is (8+4+2+2) = 16 seconds, as shown in the figure below.

 

[Figure: serial update statistics across all tables – 16 seconds]

If the statistics are updated in parallel using 4 parallel jobs (one job per table), the statistics for the entire database get updated in 8 seconds, as shown in the figure below.

 

[Figure: parallel update statistics, one job per table – 8 seconds]

Like any parallel algorithm, you gain maximum throughput when there is an even distribution of workload across all the execution units. When executing update statistics in parallel as in the previous example, it is possible that one table has many statistics and a large number of records (table T1 in this case), all refreshed by the same thread or job. The worker thread or job running update statistics on this table is likely to delay the overall execution while the other threads sit idle, having completed their workload. This uneven distribution of workload can reduce the overall efficiency and throughput of the UPDATE STATISTICS run. The parallel execution of the UPDATE STATISTICS job can be improved by running update statistics on individual statistics in parallel, rather than per table, to distribute the workload evenly across threads.

In the above scenario, if update statistics is run in parallel on individual statistics, the run can be completed in 4 seconds, as shown in the figure below.

 

[Figure: parallel update statistics, one job per statistic – 4 seconds]

Until SQL Server 2014 SP1 CU6, if you tried to run update statistics jobs in parallel on individual statistics as shown above, the parallel threads updating statistics on the same table would be blocked due to the exclusive (X) lock taken by the first update statistics job on the UPDSTATS metadata resource of the table, as shown below. Due to the blocking, the overall execution time of UPDATE STATISTICS can be higher, as shown below.

 

 

[Figure: parallel update statistics jobs blocked on the table's UPDSTATS resource]

With SQL Server 2014 SP1 CU6, a new trace flag, 7471, has been introduced to address this blocking scenario. Trace Flag 7471 is documented in KB 3156157. Trace Flag 7471 changes the locking behavior of the update statistics command such that the SQL Server engine no longer acquires an X lock on the UPDSTATS resource of the table. Instead, the engine acquires an update (U) lock on each individual statistics object being updated, which doesn't block other parallel threads running on different statistics of the same table. This behavior improves the overall concurrency of update statistics jobs executing in parallel and improves their performance.

 

[Figure: parallel update statistics with TF 7471 – no blocking]

Our empirical testing shows TF 7471 can increase the possibility of deadlocks, especially when creation of new statistics and updating of existing statistics are executed simultaneously. So the recommended best practice is to use TF 7471 only as part of maintenance, when update statistics jobs are run in parallel, and to turn it off otherwise.
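Purely as an illustration of that pattern (the table dbo.T1 and the statistics names Stat1 and Stat2 are hypothetical):

-- Enable TF 7471 globally for the duration of the maintenance window (requires sysadmin).
DBCC TRACEON (7471, -1);
GO

-- Each statement below can run from a separate SQL Agent job or session against the
-- same table; with TF 7471 enabled they take U locks per statistics object instead of
-- an X lock on the table's UPDSTATS resource, so they no longer block one another.
UPDATE STATISTICS dbo.T1 (Stat1) WITH FULLSCAN;
UPDATE STATISTICS dbo.T1 (Stat2) WITH FULLSCAN;
GO

-- Turn the trace flag back off once the maintenance window completes.
DBCC TRACEOFF (7471, -1);
GO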

To conclude, starting with SQL Server 2014 SP1 CU6, the performance and execution times of update statistics can be improved by running the job in parallel on individual statistics and enabling Trace Flag 7471.

Parikshit Savjani (@talktosavjani)
Senior Program Manager

SQL Server Datafile maxsize reported incorrectly

Recently we encountered and filed a benign reporting bug where the maxsize of a data file is reported incorrectly. If you create a database, then alter the initial size of the data file to be greater than maxsize, and then query sys.database_files, sys.master_files or sys.sysaltfiles, the maxsize appears to incorrectly report the old value. However, the maxsize is actually set to the new size within the database, which can be verified by reading the file header; it's just reported incorrectly in the DMVs.

Furthermore, restarting the SQL instance or taking the database offline and then online fixes the issue, as the DMVs get synchronized with the file header at database startup.

The issue occurs only in SQL Server 2012 and SQL Server 2014; it appears to be fixed in SQL Server 2016 RC0 and doesn't occur in SQL Server 2008 R2 and below.

Following are the steps to repro the issue.

REPRO

IF EXISTS (
SELECT *
FROM sys.databases
WHERE name = N'maxsize'
)
DROP DATABASE maxsize
GO

CREATE DATABASE maxsize
ON PRIMARY
(NAME = maxsize_data,
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Data\maxsize_data.mdf',
SIZE = 64MB,
MAXSIZE = 1024MB,
FILEGROWTH = 10%)
LOG ON
( NAME = maxsize_log,
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Data\maxsize_log.ldf',
SIZE = 10MB,
MAXSIZE = 50MB,
FILEGROWTH = 10%)

GO

USE [master]
GO
ALTER DATABASE [maxsize] MODIFY FILE ( NAME = N'maxsize_data', SIZE = 4194304KB )
GO

USE [maxsize]
GO
select * from sys.database_files
GO
select * from sys.sysaltfiles where dbid = DB_ID()
GO
Select * from sys.master_files where database_id = DB_ID()
GO

/*
SQL 2012\SQL 2014
size = 524288 maxsize = 131072
*/

--Taking the database offline & bring it back online

alter database [maxsize] set offline
GO
alter database [maxsize] set online
GO

/* Reports accurately */
use maxsize
GO
select * from sys.database_files
GO
select * from sys.sysaltfiles where dbid = DB_ID()
GO
Select * from sys.master_files where database_id = DB_ID()
GO

/*
SQL 2012\SQL 2014
size = 524288 maxsize = 524288
*/

USE MASTER
GO
DROP DATABASE [maxsize]
GO

Like I mentioned before, it is just a reporting bug and shouldn't do any harm, but just in case you see the size of a data file greater than its reported maxsize, don't be surprised :)

Hope this helps !!!

Parikshit Savjani
Sr. Premier Field Engineer

Columnstore Indexes and Statistics

Until the introduction of columnstore indexes, most DBAs didn't care much about the creation of statistics on indexed columns, since statistics are auto-created on the indexed columns when an index is created. With Auto Create Statistics turned ON for the database, the optimizer automatically triggers the creation of statistics when it detects a missing statistic, which further removes the need for any manual intervention to create statistics.

Statistics are not created automatically when you create a columnstore index. This may not seem like a big concern, since when you query the base table or index for the first time, the optimizer will trigger the creation of statistics when Auto Create Statistics is turned ON. However, this behavior can increase the response time of the first query to hit the columnstore index, because until the statistics are created and populated, the plan cannot be created and executed to return the result set.

Consider a scenario which may be very common with columnstore indexes. You create a fact table with a clustered columnstore index and bulk load millions of records into the table. As the data gets loaded, the tuple mover ensures compressed row groups of the clustered columnstore index are created. Now, when the first query is issued against the fact table, the optimizer spends time creating statistics on this large fact table by scanning millions of records, which can take a considerable amount of time depending on the resources available on the server. It can easily take a few minutes to create statistics on a large fact table with billions of records even on an idle system. If you were doing a test to compare the performance of a columnstore index versus a rowstore index, you might easily assume that the columnstore index is slow, since your rowstore index might complete the query in 5-10 minutes while the first query against the columnstore index took so long. Further, if you end up cancelling the query because it is taking too long, the statistics never get created, and each time you issue the query it appears to take long.

It is important to understand and be familiar with this behavior of columnstore indexes. You may not observe the difference if the table is small, but if the data is huge, the time to create the statistics is considerable and can amplify the response time of the first query against the index.
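One way to avoid paying this cost on the first user query is to create or update statistics yourself at the end of the load window. A minimal sketch of the idea follows; the fact table dbo.FactSales and its column names are hypothetical, and the columns you pick should be the ones used in the joins and predicates of your reporting queries.

-- After the bulk load into the clustered columnstore index completes, warm up the
-- statistics so the first report query doesn't have to wait for auto-create.
UPDATE STATISTICS dbo.FactSales;   -- refresh any statistics that already exist

-- Pre-create single-column statistics on the columns used in predicates and joins.
CREATE STATISTICS st_FactSales_DateKey     ON dbo.FactSales (DateKey);
CREATE STATISTICS st_FactSales_CustomerKey ON dbo.FactSales (CustomerKey);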

Hope this helps !!!

Parikshit Savjani
Sr. Premier Field Engineer

FUN – Self-Assessment of your social presence in fb using Power BI

If you read the title of this post and were wondering, what the hell is that?, trust me, I have no clue how I made this up, but I felt it would be an appropriate title when someone wants to perform data analysis on their fb data to understand how active they are on fb YoY or month-on-month, who their good friends are, who likes their posts the most, and to discover some long-lost friends who were good friends earlier but with whom they have lost touch.

Assumption: The modern definition of a good friend is the friend who likes all your posts or activity on Facebook ;)

One might feel data analysis of your fb activity is a completely useless task until you discover some insights about your life or persona changes reflected in your fb activity. Data analysis is like a treasure hunt where all your efforts are useless until you hit the jackpot. In this blog I reflect on some of the insights which I discovered after analyzing my fb data using Power BI and relating them to some of the changes in my life.

Warning: The following data analysis might be a complete waste of your time and highly non-productive. Any accidental learning from this post might be completely coincidental & at your own risk.

Here is the Power BI dashboard of my Facebook activity:
[Figure: Facebook activity Power BI dashboard]

Let's dig into some of the insights I discovered from the dashboard.

  • So I have posted 203 posts from 2009 till date, with around 2835 likes, which averages to around 14 likes per post – not that bad. At any given point in time I have at least 14 friends who like my activity on fb.
  • I got the most likes (210 and 135) when my books got published, and my friends really liked me for getting published, which means I should continue doing it :)
  • I was most active on Facebook in the year 2012, but after that my activity dropped significantly in the year 2013, which is the year I got married (no further insights may be warranted on this as it is self-explanatory).
  • The word cloud, which is a new custom visual available in the Power BI gallery, emphasizes and amplifies some of the good friends I have who have liked my posts and activity on fb. If I slice the data for the year 2013, the word cloud looks as shown below.
    The year 2013 is the only time my wife Charmi liked my fb posts the most, and then we got married and the rest is history. So here is the data I can share with my wife showing that post-marriage, husbands are no longer appreciated any more. On the other side, can husbands afford to not like any posts from their wives???

[Figure: word cloud of friends who liked my posts]

On the other hand, when I sliced my data for the year 2009, I found a few friends who used to like my posts very much but with whom I have lost touch somewhere along the way, so I am planning to connect with them again; and when I slice the data for the year 2015, I find some new friends in there.

  • Another interesting insight which I discovered from the types of my posts on FB is that until the year 2013 the majority of my posts were status updates, but of late I don't share many status updates and occasionally just log in to Facebook and re-share videos or links I find interesting. I guess that has something to do with my maturity: earlier I posted anything and everything I did as status updates, but slowly I have stopped liking status updates as much.

[Figure: posts by type]

  • Finally, I discovered that I am most active on FB in the month of December, which may be due to the holidays, and least active in the months of June and November, which are the times when I like to take vacations, since June is our anniversary and November is my wife's birthday.

[Figure: posts by month]

If anyone is interested in reusing my Power BI dashboard, please reach out to me or message me and I will be happy to share the *.pbix file for your self-assessment.

Hope this helps !!!

Parikshit Savjani
Sr. Premier Field Engineer

My book on SQL on Azure Succinctly released

I am extremely excited to share the latest release of my book, SQL on Azure Succinctly, with Syncfusion publications, which talks about deployments of SQL Server on Azure.

You can download the latest version of the ebook from the following link
http://www.syncfusion.com/resources/techportal/details/ebooks/sqlazure

In recent years, we have witnessed how information technology has contributed to the growth and reach of businesses around the globe. Companies which are primarily technology companies, like Amazon, Facebook, and Uber, have built their businesses solely on the basis of information technology and have grown unimaginably. In today’s world we cannot imagine a business without a website, app, or web presence, but does having a website, app, or web presence guarantee a business will be successful? Until a business is successful, companies aren’t sure whether their investment in information technology and IT infrastructure is fruitful or not.

Setting up and maintaining a website, app, database, or data warehouse incurs a lot of capital and operational expense (CAPEX and OPEX) for a company whose core business may not be IT. As a result, small and medium businesses around the world are looking towards cloud solution providers to offer IT as a service and to pay only for what they use, with elastic scale-up/scale-down flexibility. Even large enterprises who can afford their own data centers and infrastructure are looking towards cloud providers, since data is growing at a rapid pace and investing in new hardware and infrastructure for new projects, development, and testing may not make sense until the project becomes critical and revenue generating.

Cloud services allow big enterprises to innovate and research at an optimized cost without investing upfront on the infrastructure in their data centers. Further, when products are designed to run as a service, the product release cycle can be agile and fast. All of these benefits make cloud services an attractive offering and optimal alternative for businesses today.

In this book, I introduce you to Microsoft SQL Server (RDBMS) Offerings on Azure and talk about the use cases, considerations, configurations, and design optimizations for running SQL Server on Azure to give you predictable performance at optimized cost.

This book is primarily focused on instructing individuals with prior knowledge of SQL Server how to manage, optimize, and design applications running SQL Server on Azure.

Disabling Non-Clustered Columnstore Index doesn’t appear to execute in Stored Procedure in SQL 2012

Consider the following scenario: you have a large fact table with a non-clustered columnstore index. In SQL Server 2012, a columnstore index is not updatable, hence at night when you load or purge the table, the first step is to disable the non-clustered columnstore index, followed by the insert/update/delete against the table, and the final step is to rebuild the non-clustered columnstore index to ensure the table is available for querying and reporting on the following day with the desired reporting performance.

The stored procedure used to load or purge the table would be of the form shown below.

CREATE PROCEDURE dbo.usp_Update_Fact_table

AS

BEGIN

ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] DISABLE;  -- Disable columnstore index before update

UPDATE [dbo].[Wk_fact]  -- Update fact table (full UPDATE statement elided)

ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] REBUILD;  -- Rebuild index after update

END

When you execute the above stored procedure, it fails with the following error:

UPDATE statement failed because data cannot be updated in a table with a columnstore index. Consider disabling the columnstore index before issuing the UPDATE statement, then rebuilding the columnstore index after UPDATE is complete.

The above error message might suggest that the ALTER INDEX ... DISABLE statement is skipped during execution of the stored procedure and that the error is thrown while executing the UPDATE query. But the error doesn't occur during execution; it actually occurs during compilation of the stored procedure.

When the stored procedure is compiled, the check detects that the fact table has a non-clustered columnstore index which cannot be modified, and hence throws the above error at compile time. This can be further verified by trying to get the estimated showplan of the stored procedure, which fails with the same error as above.

SET SHOWPLAN_XML ON
GO
EXEC dbo.usp_Update_Fact_table
GO

The query optimizer detects the non-clustered columnstore index during the query optimization phase of plan generation; this is an optimization that prevents plan generation in order to save the resources required to compile, execute and cache a query plan that would ultimately error out in the execution phase.

To overcome the above error, you have the following options:

  1. Run ALTER INDEX ... DISABLE outside the stored procedure, prior to the insert/update/delete stored procedure, as the first step of the batch load/purge job.
  2. If you need to disable the index as part of the same stored procedure, you can execute the DML against the fact table as dynamic SQL, which avoids the compile-time check, as shown below.

    CREATE PROCEDURE dbo.usp_Update_Fact_table
    AS
    BEGIN

    ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] DISABLE;  -- Disable columnstore index before update

    EXEC sp_executesql N'UPDATE [dbo].[Wk_fact]';  -- Update fact table as dynamic SQL

    ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] REBUILD;  -- Rebuild index after update

    END

  3. Upgrade to SQL Server 2014 or above, as the columnstore index is updatable in SQL Server 2014 and later versions, where these compile-time checks do not occur.

Hope this helps !!!

Parikshit Savjani
Senior Premier Field Engineer

Added Auto Indexing & Indexing Mode parameters in Add-DocumentCollection Cmdlet

A couple of weeks back I released PowerShell cmdlets for DocumentDB on GitHub (further info on this can be found in my blog here). Today I merged and added AutoIndexing and IndexingMode parameters to the Add-DocumentCollection cmdlet in the module.

https://github.com/savjani/Azure-DocumentDB-Powershell-Cmdlets

Azure DocumentDB collections are auto-indexed by default, allowing developers to focus on application development and not worry about the schema or index definitions for the database. Automatic indexing of documents is enabled by consistent, write-optimized, lock-free, and log-structured index maintenance techniques.

Anyone who has worked on databases knows that while indexes are useful to expedite querying, they add overhead to write, maintain and store. Hence it is important and efficient to use indexing wisely and not index attributes which may never be used for querying or filtering.

Secondly, DocumentDB supports two indexing modes, viz. Consistent (synchronous) and Lazy (asynchronous). If you are someone like me with a database background, you know that for a continuous transactional OLTP workload we prefer consistent indexing, wherein the indexes are updated and maintained as data is inserted, updated or deleted. However, for batch data warehousing type workloads we prefer to drop the index, load the data and recreate the index, which gives us the best write performance. Similarly, in DocumentDB, if we intend to perform a bulk load of documents, write performance is improved with the indexing mode set to Lazy as compared to the synchronous Consistent mode.

At the time of writing this blog, the AutoIndexing and IndexingMode settings are only supported during creation of a collection and cannot be altered after the collection is created. Hence it is important to plan and choose the right settings for our collection while creating it.

To support these settings while creating a collection using PowerShell, I have added the following optional parameters to the Add-DocumentCollection cmdlet.

AutoIndexing – ($true or $false): This parameter specifies whether Automatic Indexing should be turned ON or OFF for the collection. It is an optional parameter and, if omitted, Automatic Indexing is turned ON by default.

IndexingMode – (Consistent or Lazy): This parameter specifies the indexing mode for the collection. It is an optional parameter and, if omitted, it is set to Consistent by default.

A good reference for the DocumentDB indexing policy is the following article from the product group:

https://azure.microsoft.com/en-us/documentation/articles/documentdb-indexing-policies/

Usage:

Creating a New Database & New Document Collection with default AutoIndexing & IndexingMode

Import-Module 'C:\Users\pariks\Documents\GitHub\Azure-DocumentDB-Powershell-Cmdlets\Azrdocdb\Azrdocdb1\bin\Debug\Azrdocdb.dll'
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name "DocDBPS2"
$coll = Add-DocumentCollection -Context $ctx -DatabaseLink $db.SelfLink -Name "DocCollPS"

Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Creating a New Database & New Document Collection & Setting Automatic Indexing Policy

Import-Module 'C:\Users\pariks\Documents\GitHub\Azure-DocumentDB-Powershell-Cmdlets\Azrdocdb\Azrdocdb1\bin\Debug\Azrdocdb.dll'
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name 'DocDB'
$coll = Add-DocumentCollection -Context $ctx -DatabaseLink $db.SelfLink -Name 'DocCollPS' -AutoIndexing $true
Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Creating a New Database & New Document Collection & Setting Automatic Indexing Policy & IndexingMode

Import-Module 'C:\Users\pariks\Documents\GitHub\Azure-DocumentDB-Powershell-Cmdlets\Azrdocdb\Azrdocdb1\bin\Debug\Azrdocdb.dll'
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name 'DocDB'
$coll = Add-DocumentCollection -Context $ctx -DatabaseLink $db.SelfLink -Name 'DocCollPS' -AutoIndexing $true -IndexingMode 'Lazy'
Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Hope this helps :)

Parikshit Savjani
Premier Field Engineer