SQL Server Datafile maxsize reported incorrectly

Recently we encountered and filed a benign reporting bug in which the maxsize of a data file is reported incorrectly. If you create a database, then alter the size of a data file to be greater than its maxsize, and then query sys.database_files, sys.master_files, or sys.sysaltfiles, maxsize appears to still report the old value. However, maxsize is actually set to the new size within the database, which can be verified by reading the file header data; it is just reported incorrectly in the DMVs.
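
As a side note, one way to peek at the value actually stored in the file header is the undocumented DBCC PAGE command. This is a rough sketch against the maxsize repro database created below, so treat it as a test-system-only check:

DBCC TRACEON(3604);              -- route DBCC output to the client session
GO
DBCC PAGE('maxsize', 1, 0, 3);   -- dump page 0 (the file header page) of file 1 with full details
GO
-- Compare the max size shown in the FILE HEADER section of the output
-- with the max_size column reported by sys.database_files.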

Restarting the SQL Server instance, or taking the database offline and then bringing it back online, fixes the issue because the DMVs are synchronized with the file header at database startup.

The issue occurs only in SQL Server 2012 and SQL Server 2014; it appears to be fixed in SQL Server 2016 RC0 and does not occur in SQL Server 2008 R2 and below.

The following steps reproduce the issue.

REPRO

IF EXISTS (
SELECT *
FROM sys.databases
WHERE name = N'maxsize'
)
DROP DATABASE maxsize
GO

CREATE DATABASE maxsize
ON PRIMARY
(NAME = maxsize_data,
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Data\maxsize_data.mdf',
SIZE = 64MB,
MAXSIZE = 1024MB,
FILEGROWTH = 10%)
LOG ON
( NAME = maxsize_log,
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Data\maxsize_log.ldf',
SIZE = 10MB,
MAXSIZE = 50MB,
FILEGROWTH = 10%)

GO

USE [master]
GO
ALTER DATABASE [maxsize] MODIFY FILE ( NAME = N'maxsize_data', SIZE = 4194304KB )
GO

USE [maxsize]
GO
select * from sys.database_files
GO
select * from sys.sysaltfiles where dbid = DB_ID()
GO
Select * from sys.master_files where database_id = DB_ID()
GO

/*
SQL 2012\SQL 2014
size = 524288, maxsize = 131072
(values are in 8-KB pages: size = 4 GB, while maxsize still shows the old 1 GB limit)
*/

-- Take the database offline & bring it back online

alter database [maxsize] set offline
GO
alter database [maxsize] set online
GO

/* Reports accurately */
use maxsize
GO
select * from sys.database_files
GO
select * from sys.sysaltfiles where dbid = DB_ID()
GO
Select * from sys.master_files where database_id = DB_ID()
GO

/*
SQL 2012\SQL 2014
size = 524288, maxsize = 524288
(maxsize now correctly reflects the 4 GB value stored in the file header)
*/

USE MASTER
GO
DROP DATABASE [maxsize]
GO

As I mentioned before, it is just a reporting bug and shouldn't do any harm, but just in case you see a size value greater than the maxsize column for a data file, don't be surprised :)

Hope this helps !!!

Parikshit Savjani
Sr. Premier Field Engineer

Columnstore Indexes and Statistics

Until the introduction of columnstore indexes, most DBAs didn't care much about the creation of statistics on indexed columns, because statistics are automatically created on the indexed columns when an index is created. With Auto Create Statistics turned ON for the database, the optimizer automatically triggers the creation of statistics when it detects missing statistics, so no manual intervention is required.

Statistics are not created automatically when you create a columnstore index. This may not seem like a big concern, since when you query the base table or index for the first time the optimizer triggers the creation of statistics (when Auto Create Statistics is turned ON). However, this behavior can increase the response time of the first query that hits the columnstore index, because the plan cannot be compiled and executed until the statistics are created and populated.

Consider a scenario that is very common with columnstore indexes. You create a fact table with a clustered columnstore index and bulk load millions of records into the table. As the data is loaded, the tuple mover creates compressed row groups for the clustered columnstore index. When the first query is issued against the fact table, the optimizer spends time creating statistics on this large fact table by scanning millions of records, which can take a considerable amount of time depending on the resources available on the server. It can easily take a few minutes to create statistics on a large fact table with billions of records, even on an idle system. If you were running a test to compare the performance of a columnstore index versus a rowstore index, you might easily conclude that the columnstore index is slow, since the rowstore index might complete the query in 5-10 minutes while the first query against the columnstore index takes much longer. Further, if you end up cancelling the query because it is taking too long, the statistics never get created, and each time you issue the query it appears to take just as long.

It is important to understand and be familiar with this behavior of columnstore indexes. You may not notice the difference on a small table, but when the data is huge, the time to create the statistics is considerable and can inflate the response time of the first query against the index.
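
As a rough mitigation sketch, assuming a hypothetical fact table dbo.FactSales with columns OrderDateKey and CustomerKey (these names are illustrative, not from the original scenario), the statistics can be pre-created right after the bulk load, during the maintenance window, so the first user query does not pay the auto-create cost:

-- Hypothetical table and column names, shown only to illustrate pre-creating statistics
CREATE STATISTICS st_FactSales_OrderDateKey ON dbo.FactSales (OrderDateKey) WITH FULLSCAN;
CREATE STATISTICS st_FactSales_CustomerKey  ON dbo.FactSales (CustomerKey)  WITH FULLSCAN;
GO
-- Verify which statistics now exist on the table
SELECT s.name, s.auto_created, STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.FactSales');
GO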

Hope this helps !!!

Parikshit Savjani
Sr. Premier Field Engineer

FUN – Self-Assessment of your social presence in fb using Power BI

If you read the title of this post and wondered "What the hell is that?", trust me, I have no clue how I made it up, but it felt like an appropriate title for performing data analysis on your fb data to understand how active you are on fb year over year or month over month, who your good friends are, who likes your posts the most, and to discover some long-lost friends who were good friends earlier but with whom you have lost touch.

Assumption: The modern definition of a good friend is the friend who likes all of your posts or activity on Facebook ;)

One might feel that data analysis of your fb activity is a completely useless task, until you discover some insights about your life or changes in your persona reflected in your fb activity. Data analysis is like a treasure hunt where all your efforts seem useless until you hit the jackpot. In this blog I share some of the insights I discovered after analyzing my fb data using Power BI and relating it to some of the changes in my life.

Warning: The following data analysis might be a complete waste of your time and highly non-productive. Any accidental learning from this post is completely coincidental and at your own risk.

Here is the Power BI dashboard of my Facebook activity.
[Image: Fbdashboard]

Let's dig into some of the insights I discovered from the dashboard.

  • So I have posted 203 posts from 2009 till date, with around 2,835 likes, which averages to around 14 likes per post, which is not that bad. At any given point in time I have at least 14 friends who like my activity on fb.
  • I got the most likes (210 and 135) when my books got published, and my friends really liked me for getting published, which means I should continue doing it :)
  • I was most active on Facebook in the year 2012, but after that my activity dropped significantly in 2013, which is the year I got married (no further insight is warranted on this, as it is self-explanatory).
  • The word cloud, a new custom visual available in the Power BI gallery, emphasizes and amplifies some of the good friends who have liked my posts and activity on fb. If I slice the data for the year 2013, the word cloud looks as shown below.
    The year 2013 is the only time my wife Charmi has liked my fb posts the most; then we got married, and the rest is history. So this is the data I can share with my wife to show that, post marriage, husbands are no longer appreciated any more. On the other side, can husbands afford to not like any posts from their wives???

[Image: WordCloud]

On the other hand, when I sliced the data for the year 2009, I found a few friends who used to like my posts a lot but with whom I have lost touch, so I am planning to connect with them again. When I slice the data for the year 2015, I find some new friends in there.

  • Another interesting insight I discovered from the types of my posts on fb is that until 2013 the majority of my posts were status updates, but of late I don't share many status updates; occasionally I just log in to Facebook and re-share videos or links I find interesting. I guess that has something to do with maturity: earlier I posted anything and everything I did as status updates, but slowly I have stopped enjoying status updates as much.

[Image: Postsbytype]

  • Finally, I discovered that I am most active on fb in the month of December, which may be due to the holidays, and least active in June and November, which are the times when I like to take vacations, since June is our anniversary and November is my wife's birthday.

[Image: PostsbyMonth]

If anyone is interested in reusing my Power BI dashboard, please reach out to me or message me and I will be happy to share the *.pbix file for your own self-assessment.

Hope this helps !!!

Parikshit Savjani
Sr. Premier Field Engineer

My book SQL on Azure Succinctly released

I feel extremely excited to share the latest release of my book, SQL on Azure Succinctly, with Syncfusion publications, which talks about deploying SQL Server to Azure.

You can download the latest version of the ebook from the following link:
http://www.syncfusion.com/resources/techportal/details/ebooks/sqlazure

In recent years, we have witnessed how information technology has contributed to the growth and reach of businesses around the globe. Companies which are primarily technology companies, like Amazon, Facebook, and Uber, have built their businesses solely on the basis of information technology and have grown unimaginably. In today’s world we cannot imagine a business without a website, app, or web presence, but does having a website, app, or web presence guarantee a business will be successful? Until a business is successful, companies aren’t sure whether their investment in information technology and IT infrastructure is fruitful or not.

Setting up and maintaining a website, app, database, or data warehouse incurs a lot of capital and operational expense (CAPEX and OPEX) for a company whose core business may not be IT. As a result, small and medium businesses around the world are looking towards cloud solution providers to offer IT as a service, paying only for what they use, with elastic scale-up and scale-down flexibility. Even large enterprises that can afford their own data centers and infrastructure are looking towards cloud providers, since data is growing at a rapid pace and investing in new hardware and infrastructure for new projects, development, and testing may not make sense until the project becomes critical and revenue generating.

Cloud services allow big enterprises to innovate and research at an optimized cost without investing upfront on the infrastructure in their data centers. Further, when products are designed to run as a service, the product release cycle can be agile and fast. All of these benefits make cloud services an attractive offering and optimal alternative for businesses today.

In this book, I introduce you to Microsoft SQL Server (RDBMS) Offerings on Azure and talk about the use cases, considerations, configurations, and design optimizations for running SQL Server on Azure to give you predictable performance at optimized cost.

This book is primarily focused on instructing individuals with prior knowledge of SQL Server how to manage, optimize, and design applications running SQL Server on Azure.

Disabling Non-Clustered Columnstore Index doesn’t appear to execute in Stored Procedure in SQL 2012

Consider the following scenario: you have a large fact table with a nonclustered columnstore index. In SQL Server 2012, a columnstore index is not updatable, so when you load or purge the table at night, the first step is to disable the nonclustered columnstore index, followed by the insert/update/delete against the table, and the final step is to rebuild the nonclustered columnstore index so that the table is available for querying and reporting the following day with the desired reporting performance.

The stored procedure used to load or purge the table would be of the form shown below.

CREATE PROCEDURE dbo.usp_Update_Fact_table
AS
BEGIN

    ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] DISABLE;   -- Disable the columnstore index before the update

    UPDATE [dbo].[Wk_Fact]                                      -- Update the fact table (SET clause omitted in the original)

    ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] REBUILD;   -- Rebuild the columnstore index after the update

END

When you execute the above stored procedure, it fails with the following error:

UPDATE statement failed because data cannot be updated in a table with a columnstore index. Consider disabling the columnstore index before issuing the UPDATE statement, then rebuilding the columnstore index after UPDATE is complete.

The above error message might give the impression that the ALTER INDEX … DISABLE statement was skipped during execution of the stored procedure and that the error was raised while executing the UPDATE query. However, the error does not occur during execution; it actually occurs during compilation of the stored procedure.

When the stored procedure is compiled, SQL Server detects that the fact table has a nonclustered columnstore index, which cannot be modified, and hence throws the above error at compile time. This can be further verified by requesting the estimated showplan for the stored procedure, which fails with the same error:

SET SHOWPLAN_XML ON
GO
EXEC dbo.usp_Update_Fact_table
GO

The query optimizer detects the nonclustered columnstore index during the optimization phase of plan generation. This check is an optimization that prevents generating, caching, and executing a plan that would ultimately fail during the execution phase, saving the resources required to compile and cache it.

To overcome the above error, you have the following options:

  1. Run the ALTER INDEX … DISABLE outside the stored procedure, prior to the insert/update/delete stored procedure, as the first step of the batch load/purge job (see the sketch after this list).
  2. If you need to disable the index as part of the same stored procedure, you can execute the DML against the fact table as dynamic SQL, which avoids the compile-time check, as shown below:

    CREATE PROCEDURE dbo.usp_Update_Fact_table
    AS
    BEGIN

        ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] DISABLE;     -- Disable the columnstore index before the update

        EXEC sp_executesql N'UPDATE [dbo].[Wk_Fact]';                 -- Run the update as dynamic SQL (SET clause omitted in the original)

        ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] REBUILD;     -- Rebuild the columnstore index after the update

    END

  3. Upgrade to SQL Server 2014 or above: the clustered columnstore index is updatable in SQL Server 2014, and the nonclustered columnstore index is updatable from SQL Server 2016 onwards, so these compile-time checks do not occur.
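
As a sketch of option 1, reusing the same index, table, and procedure names from the example above, the nightly load/purge job could run the three steps as separate batches, with the stored procedure containing only the DML:

-- Step 1 of the job: disable the nonclustered columnstore index
ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] DISABLE;
GO
-- Step 2 of the job: run the stored procedure, which now contains only the DML
EXEC dbo.usp_Update_Fact_table;
GO
-- Step 3 of the job: rebuild the columnstore index so reporting queries are fast again
ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] REBUILD;
GO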

Hope this helps !!!

Parikshit Savjani
Senior Premier Field Engineer

Added Auto Indexing & Indexing Mode parameters in Add-DocumentCollection Cmdlet

A couple of weeks back I released PowerShell cmdlets for DocumentDB on GitHub (further info can be found in my blog here). Today I merged and added the AutoIndexing and IndexingMode parameters to the Add-DocumentCollection cmdlet in the module.

https://github.com/savjani/Azure-DocumentDB-Powershell-Cmdlets

Azure DocumentDB collections are auto-indexed by default, allowing developers to focus on application development and not worry about the schema or index definitions for the database. Automatic indexing of documents is enabled by consistent, write-optimized, lock-free, and log-structured index maintenance techniques.

Anyone who has worked on databases knows that while indexes are useful to expedite querying, they add overhead to write, maintain, and store. Hence it is important and efficient to use indexing wisely and not index attributes that may never be used for querying or filtering.

Secondly, DocumentDB supports two indexing modes: Consistent (synchronous) and Lazy (asynchronous). If you are someone like me with a database background, you know that for a continuous transactional OLTP workload we prefer consistent indexing, where the indexes are updated and maintained as data is inserted, updated, or deleted. However, for batch data-warehousing type workloads we prefer to drop the index, load the data, and recreate the index, which gives the best write performance. Similarly, in DocumentDB, if we intend to bulk load documents, write performance improves with the indexing mode set to Lazy as compared to the synchronous Consistent mode.

At the time of writing this blog, the AutoIndexing and IndexingMode settings are only supported during creation of a collection and cannot be altered after the collection is created. Hence it is important that we plan and choose the right settings for our collection while creating it.

To support these settings when creating a collection using PowerShell, I have added the following optional parameters to the Add-DocumentCollection cmdlet.

AutoIndexing – ($true or $false): This parameter specifies whether automatic indexing should be turned ON or OFF for the collection. It is an optional parameter; if omitted, automatic indexing is turned ON by default.

IndexingMode – (Consistent or Lazy): This parameter specifies the indexing mode for the collection. It is an optional parameter; if omitted, it is set to Consistent by default.

A good reference for DocumentDB indexing policy is the following article from the product group:

https://azure.microsoft.com/en-us/documentation/articles/documentdb-indexing-policies/

Usage:

Creating a New Database & New Document Collection with default AutoIndexing & IndexingMode

Import-Module 'C:\Users\pariks\Documents\GitHub\Azure-DocumentDB-Powershell-Cmdlets\Azrdocdb\Azrdocdb1\bin\Debug\Azrdocdb.dll'
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name 'DocDBPS2'
$coll = Add-DocumentCollection -Context $ctx -DatabaseLink $db.SelfLink -Name 'DocCollPS'

Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Creating a New Database & New Document Collection & Setting Automatic Indexing Policy

Import-Module 'C:\Users\pariks\Documents\GitHub\Azure-DocumentDB-Powershell-Cmdlets\Azrdocdb\Azrdocdb1\bin\Debug\Azrdocdb.dll'
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name 'DocDB'
$coll = Add-DocumentCollection -Context $ctx -DatabaseLink $db.SelfLink -Name 'DocCollPS' -AutoIndexing $true
Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Creating a New Database & New Document Collection & Setting Automatic Indexing Policy & IndexingMode

Import-Module 'C:\Users\pariks\Documents\GitHub\Azure-DocumentDB-Powershell-Cmdlets\Azrdocdb\Azrdocdb1\bin\Debug\Azrdocdb.dll'
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name 'DocDB'
$coll = Add-DocumentCollection -Context $ctx -DatabaseLink $db.SelfLink -Name 'DocCollPS' -AutoIndexing $true -IndexingMode 'Lazy'
Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Hope this helps :)

Parikshit Savjani
Premier Field Engineer

Releasing Powershell Cmdlets for Azure DocumentDB in Github

Azure DocumentDB is Microsoft's budding NoSQL, schema-free database offering, a relatively recent addition to the mammoth Microsoft data platform stack.

Azure DocumentDB provides great flexibility to developers who feel restricted by the RDBMS data model and by adhering to data types, schemas, index designs, and patterns. Azure DocumentDB gives developers the flexibility to store entire records as JSON documents in a DocumentDB database, uniquely identified by a key-value pair. All the attributes within a document are indexed by default with range or hash indexes, thereby expediting query processing against the documents and improving query performance for reporting and analytics. Hence, a developer just needs to serialize the data structures into JSON format and store them in the DocumentDB database, while DocumentDB takes care of indexing and storage of the attributes. Azure DocumentDB supports querying the attributes using the familiar SQL language over the JSON structure, making it compatible with most reporting and analytics platforms.

While I was learning and evaluating Azure DocumentDB, I found that one component missing from Azure DocumentDB was out-of-the-box PowerShell cmdlets for managing and administering the database. I saw this as an opportunity to kick-start a new community project on GitHub and get my hands dirty with a GitHub public repository, something I had been aspiring to learn and adopt for quite some time.

With this excitement and modesty, I have created a project on GitHub with PowerShell cmdlets to perform some basic management operations against Azure DocumentDB:

https://github.com/savjani/Azure-DocumentDB-Powershell-Cmdlets

I invite community members to extend this modest list of cmdlets into a full-blown DocumentDB management and administration module, similar to SQLPS.

Powershell Cmdlet Definition & its Usage

Creating a New Database

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name 'DocDBPS2'
Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Creating a New Database & New Document Collection

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name 'DocDBPS2'
$coll = Add-DocumentCollection -Context $ctx -DatabaseLink $db.SelfLink -Name 'DocCollPS'
Get-database -Context $ctx -SelfLink $db.SelfLink | ft

List all the Databases in the Given Azure DocumentDB Database Account

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
Get-databases -Context $ctx | ft

List Database Properties for a given Database

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
$db = Get-databases -Context $ctx | Where-Object {$_.Id -eq 'DocDBPS'}
Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Drop Database

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
$db = Get-databases -Context $ctx | Where-Object {$_.Id -eq 'DocDBPS'}
remove-database -Context $ctx -SelfLink $db.SelfLink

Get Database Account Consistency Level

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
Get-DatabaseAccountConsistencyLevel -Context $ctx

Set Database Account Consistency Level

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
Set-DatabaseAccountConsistencyLevel -Context $ctx -DefaultConsistencyLevel Eventual
Get-DatabaseAccountConsistencyLevel -Context $ctx

Hope this helps :)

Parikshit Savjani
Premier Field Engineer

Ungrouping Column using DAX in PowerPivot or SSAS Tabular

To start this blog post, let me first explain the problem. Consider a scenario where you receive an Excel sheet with data in the following format:

Category     SubCategory
Bikes        Mountain Bikes
             Road Bikes
             Touring Bikes
Components   Handlebars
             Bottom Brackets
             Brakes
             Chains
             Cranksets
             Derailleurs
             Forks
             Headsets
             Mountain Frames
             Pedals
             Road Frames
             Saddles
             Touring Frames
             Wheels
Clothing     Bib-Shorts
             Caps
             Gloves
             Jerseys

In the above case, the data is already grouped by Category. If you load this data into a Power Pivot data model and try to create a Power View column chart report on Category, the view looks similar to the one shown below.

[Image: Power View column chart with most data falling under a blank Category]

Most of the data falls into the blank Category, since the Power Pivot data model does not know that the data was already grouped by Category.

To ungroup the data in the Power Pivot data model, I first add an Index column to the existing data, which can easily be added while loading the data using Power Query. The Power Pivot data model now looks as shown below.

[Image: Power Pivot data model with the added Index column]

Next, we add a calculated column, MaxIndex, to calculate and store the index value of the last non-blank Category value for each row. The DAX calculation is shown below.

=MAXX(FILTER(FILTER(Table1,LEN(Table1[Category])>0),Table1[Index]<=EARLIER(Table1[Index])),Table1[Index])

The output of the above calculation looks like the following.

[Image: MaxIndex calculated column output]

The above table is now like a parent-child hierarchy where MaxIndex is the parent ID. To ungroup the Category column, we can now use the LOOKUPVALUE function as shown below.

=LOOKUPVALUE([Category],[Index],[MaxIndex])

The output of the above calculation is as shown below

[Image: ungrouped Category column output]

We can now hide the Category, MaxIndex, and Index columns from client tools to make them invisible in Power View. If you now visualize the same data in a column chart in Power View, the result is what we were looking for.

[Image: Power View column chart with the ungrouped Category]

Hope this helps !!!

Parikshit Savjani
Sr. Premier Field Engineer

How SQL Server Writes Transaction Log Sequentially

Last week in one of the SQL community forums, one of my friends asked the following question:

If I have 2 Tlog files for one database, how will these files fill up?

A) File 1 will be filled first and then File 2
B) Both the files will be filled together.

If B is true, then how is sequentiality maintained?

He asked this question because he observed that, with multiple log files, a single bulk insert operation left both log files 50% filled rather than a single log file 100% filled.

If you are just looking for the reason why this happens, skip to the end of the post. In this blog post, however, I would also like to show how we can study the internals of SQL Server with the help of Extended Events (XEvents). To monitor file writes, we can use the sqlserver.file_write_completed event to understand how SQL Server performs sequential writes against each log file when a database has multiple log files.

For my testing, I reused the same example used by Jonathan Kehayias in his blog where he explains the proportional fill algorithm.

Create a database with a single data file and 5 log files, and a single empty table into which we perform the bulk insert.

 
USE [master]
GO
/****** Object: Database [MultipleLogFiles] Script Date: 5/30/2015 4:54:10 PM ******/
CREATE DATABASE [MultipleLogFiles]
CONTAINMENT = NONE
ON PRIMARY
( NAME = N'MultipleLogFiles', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\MultipleLogFiles.mdf' , SIZE = 5316608KB , MAXSIZE = UNLIMITED, FILEGROWTH = 0)
LOG ON
( NAME = N'MultipleLogFiles_log_1', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\MultipleLogFiles_log_1.ldf' , SIZE = 1048576KB , MAXSIZE = 2048GB , FILEGROWTH = 0),
( NAME = N'MultipleLogFiles_log_2', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\MultipleLogFiles_log_2.ldf' , SIZE = 1048576KB , MAXSIZE = 2048GB , FILEGROWTH = 0),
( NAME = N'MultipleLogFiles_log_3', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\MultipleLogFiles_log_3.ldf' , SIZE = 1048576KB , MAXSIZE = 2048GB , FILEGROWTH = 0),
( NAME = N'MultipleLogFiles_log_4', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\MultipleLogFiles_log_4.ldf' , SIZE = 1048576KB , MAXSIZE = 2048GB , FILEGROWTH = 0),
( NAME = N'MultipleLogFiles_log_5', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\MultipleLogFiles_log_5.ldf' , SIZE = 1048576KB , MAXSIZE = 2048GB , FILEGROWTH = 0)
GO

IF OBJECT_ID('SalesOrderHeader') IS NOT NULL
DROP TABLE [dbo].[SalesOrderHeader]
GO

SET NOCOUNT ON
GO

CREATE TABLE [dbo].[SalesOrderHeader](
[SalesOrderID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[RevisionNumber] [tinyint] NOT NULL,
[OrderDate] Date NOT NULL,
[DueDate] Date NOT NULL,
[ShipDate] Date NULL,
[Status] [tinyint] NOT NULL,
[OnlineOrderFlag] [bit] NOT NULL,
[SalesOrderNumber] [nvarchar](25) NOT NULL,
[PurchaseOrderNumber] [nvarchar](25) NULL,
[AccountNumber] [nvarchar](15) NULL,
[CustomerID] [int] NOT NULL,
[SalesPersonID] [int] NULL,
[TerritoryID] [int] NULL,
[BillToAddressID] [int] NOT NULL,
[ShipToAddressID] [int] NOT NULL,
[ShipMethodID] [int] NOT NULL,
[CreditCardID] [int] NULL,
[CreditCardApprovalCode] [varchar](15) NULL,
[CurrencyRateID] [int] NULL,
[SubTotal] [money] NOT NULL,
[TaxAmt] [money] NOT NULL,
[Freight] [money] NOT NULL,
[TotalDue] [money] NOT NULL,
[Comment] [nvarchar](128) NULL,
[rowguid] [uniqueidentifier] NOT NULL,
[ModifiedDate] Date NOT NULL
)
GO

Create an XEvent session to capture the checkpoint_begin, checkpoint_end, file_written, file_write_completed, and databases_log_flush events during the bulk insert operation.

IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='MultipleLogFiles')
DROP EVENT SESSION [MultipleLogFiles] ON SERVER;

DECLARE @sqlcmd nvarchar(4000) = '
CREATE EVENT SESSION MultipleLogFiles
ON SERVER
ADD EVENT sqlserver.checkpoint_begin
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.checkpoint_end
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.file_written
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.file_write_completed
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD EVENT sqlserver.databases_log_flush
( WHERE (sqlserver.database_id = '+ cast(DB_ID() as varchar(3))+')),
ADD TARGET package0.asynchronous_file_target(
SET filename=''C:\XEvents\MultipleLogFiles.xel'',
metadatafile=''C:\XEvents\MultipleLogFiles.xem'')
WITH (MAX_MEMORY = 8MB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS, TRACK_CAUSALITY = ON, MAX_DISPATCH_LATENCY=5SECONDS)'

EXEC (@sqlcmd)

Start the XEvent session, perform the bulk insert, run a checkpoint to flush the dirty buffers to the data files, and finally stop the XEvent session.

ALTER EVENT SESSION MultipleLogFiles
ON SERVER
STATE=START
GO
-- Load data into the test database
INSERT INTO dbo.SalesOrderHeader
SELECT RevisionNumber,
DATEADD(DD, 1126+number, OrderDate),
DATEADD(DD, 1126+number, DueDate),
DATEADD(DD, 1126+number, ShipDate),
soh.Status, OnlineOrderFlag, SalesOrderNumber,
PurchaseOrderNumber, AccountNumber,
CustomerID, SalesPersonID, TerritoryID,
BillToAddressID, ShipToAddressID,
ShipMethodID, CreditCardID, CreditCardApprovalCode,
CurrencyRateID, SubTotal, TaxAmt, Freight,
TotalDue, Comment, rowguid,
DATEADD(DD, 1126+number, ModifiedDate)
FROM AdventureWorks2012.Sales.SalesOrderHeader AS soh
CROSS JOIN master.dbo.spt_values AS sv
WHERE sv.type = N'P'
AND sv.number > 0 AND sv.number < 6
GO 50
-- Flush all dirty pages to disk
CHECKPOINT
GO
-- Stop the Event Session
ALTER EVENT SESSION MultipleLogFiles
ON SERVER
STATE=STOP
GO

Load the XEvent data from the .xel file into a table, and then extract the XML event data into another table called MultipleLogFileResultsParsed for reporting.

IF OBJECT_ID('MultipleLogFileResults') IS NOT NULL
DROP TABLE MultipleLogFileResults
GO
IF OBJECT_ID('MultipleLogFileResultsParsed') IS NOT NULL
DROP TABLE MultipleLogFileResultsParsed
GO
-- Create results table to load data from XE files
CREATE TABLE MultipleLogFileResults
(RowID int identity primary key, event_data XML)
GO
-- Load the event data from the file target
INSERT INTO MultipleLogFileResults(event_data)
SELECT
CAST(event_data AS XML) AS event_data
FROM sys.fn_xe_file_target_read_file('C:\XEvents\MultipleLogFiles*.xel',
'C:\XEvents\MultipleLogFiles*.xem',
null, null)
GO
-- Parse the event data
SELECT
RowID,
event_data.value('(event/@name)[1]', 'varchar(50)') AS event_name,
DATEADD(hh,
DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP),
event_data.value('(event/@timestamp)[1]', 'datetime2')) AS [timestamp],
COALESCE(event_data.value('(event/data[@name="database_id"]/value)[1]', 'int'),
event_data.value('(event/action[@name="database_id"]/value)[1]', 'int')) AS database_id,
event_data.value('(event/data[@name="mode"]/text)[1]', 'nvarchar(4000)') AS [mode],
event_data.value('(event/data[@name="file_handle"]/value)[1]', 'nvarchar(4000)') AS [file_handle],
event_data.value('(event/data[@name="offset"]/value)[1]', 'bigint') AS [offset],
event_data.value('(event/data[@name="page_id"]/value)[1]', 'int') AS [page_id],
event_data.value('(event/data[@name="file_id"]/value)[1]', 'int') AS [file_id],
event_data.value('(event/data[@name="file_group_id"]/value)[1]', 'int') AS [file_group_id],
event_data.value('(event/data[@name="wait_type"]/text)[1]', 'nvarchar(100)') AS [wait_type],
event_data.value('(event/data[@name="duration"]/value)[1]', 'bigint') AS [duration],
event_data.value('(event/action[@name="sql_text"]/value)[1]', 'nvarchar(4000)') AS [sql_text],
event_data.value('(event/data[@name="cpu"]/value)[1]', 'int') AS [cpu],
event_data.value('(event/data[@name="reads"]/value)[1]', 'bigint') AS [reads],
event_data.value('(event/data[@name="writes"]/value)[1]', 'bigint') AS [writes],
CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 1, 36) AS uniqueidentifier) as activity_id,
CAST(SUBSTRING(event_data.value('(event/action[@name="attach_activity_id"]/value)[1]', 'varchar(50)'), 38, 10) AS int) as event_sequence
INTO MultipleLogFileResultsParsed
FROM MultipleLogFileResults
ORDER BY Rowid
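
Before moving to a visualization tool, a quick sanity check (a simple aggregation sketch over the parsed table created above) shows in which order and how heavily each file was written:

-- Summarize the file_write_completed events per file from the parsed results
SELECT file_id,
       COUNT(*)         AS write_events,
       MIN([timestamp]) AS first_write,
       MAX([timestamp]) AS last_write
FROM MultipleLogFileResultsParsed
WHERE event_name = 'file_write_completed'
GROUP BY file_id
ORDER BY MIN([timestamp]);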

To analyze the XEvent data, I used Power View, which is my favorite tool for data analytics and reporting.

[Image: LogWrites]

[Image: LogWritesfiltered]

From the above screenshots we observe the following:

  • When the transaction started, it began filling transaction log 5 (in this case file id 6).
  • Next, it sequentially started filling transaction log 1 (file id 2).
  • Next, it sequentially started filling transaction log 2 (file id 3).
  • Next, it sequentially started filling transaction log 3 (file id 4).
  • Lastly, we see writes to data file id 1, since we fired a CHECKPOINT at the end of the transaction to flush and write all dirty buffers to the data files.

At this point, if we run DBCC LOGINFO or look at the size of each Tlog file, it might appear that SQL Server skipped filling transaction log 4 (file id 5) and filled transaction log 5 (file id 6) instead. In fact, after seeing the above data you would understand that when the transaction started, the current active VLF was somewhere in the middle of transaction log 5 (file id 6), from where it started filling the log and then sequentially moved to the 1st Tlog file (file id 2), the 2nd Tlog file (file id 3), and then the 3rd Tlog file (file id 4).
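
If you want to inspect the VLF layout yourself, the undocumented but widely used DBCC LOGINFO command lists one row per VLF, which is how the snapshot below was captured:

USE [MultipleLogFiles]
GO
DBCC LOGINFO   -- one row per VLF, including FileId, FSeqNo, Status, and Parity
GO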

The following is the snapshot of DBCC LOGINFO at the end of the transaction.

[Image: LogInfo]

Hence, to answer my friend's question:

Yes, SQL Server writes sequentially into the VLFs of a single file, then moves to the next file in a round-robin manner, and loops back to the first file after the Tlog is truncated.

So why did we observe that both Tlog files were 50% used, rather than a single file being 100% used?

This is because, in his case, when the transaction started the current active VLF was in the middle of Tlog file id 1, so when the log writes reached the end of Tlog file 1 they sequentially moved to the following VLFs in the second file, filling 50% of the first file and 50% of the second.

As observed in the output of DBCC LOGINFO earlier, when we have multiple log files the VLFs are ordered sequentially from the first log file to the last one. Hence, if the current active VLF (status 2) is somewhere in the middle of the Tlog, then after reaching the end of the first file, writes move to the next sequential VLF (FSeqNo in DBCC LOGINFO), which is at the start of the second log file, not at the start of the first log file. This is why both log files end up partially filled.

Further, after the above snapshot of DBCC LOGINFO, when we perform the next few transactions, the current active VLF is 92… after 92 is filled, SQL Server moves to the next VLF, changes its FSeqNo from 53 to 93, changes its status to 2, and changes its parity to 128.

[Image: Loginfo2]

Hope this explains the sequential writing of the transaction log.

Parikshit Savjani
Premier Field Engineer