Remote Blob Storage (RBS) client library setup requirements in SQL Server 2016

Cross post from https://blogs.msdn.microsoft.com/sql_server_team/remote-blob-storage-rbs-client-library-setup-requirements-in-sql-server-2016/

Remote BLOB Store (RBS) is a client library shipped with SQL Server that allows developers to store, access, and retrieve binary large objects (BLOBs) outside the SQL Server database files while still maintaining the ACID properties and transactional consistency of the data in the BLOB store. RBS allows you to efficiently utilize storage and I/O resources by managing structured and unstructured data together in a SQL Server database: structured data is stored in SQL Server data files, while unstructured data is stored outside SQL Server on a commodity storage solution. The RBS client library exposes a set of APIs for developers to access, modify, and retrieve BLOBs from the BLOB store. Each BLOB store has its own provider library, which plugs into the RBS client library and specifies how BLOBs are stored and accessed alongside the SQL Server database.

The SQL Server FILESTREAM feature allows you to store and manage binary data (varbinary(max)) in SQL Server using the underlying NTFS storage as the BLOB store.
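As a quick illustration of the FILESTREAM feature, here is a minimal sketch of a table with a FILESTREAM column. The object names are illustrative, and it assumes FILESTREAM is already enabled on the instance and the database has a FILESTREAM filegroup:

-- A FILESTREAM table requires a unique ROWGUIDCOL column;
-- the varbinary(max) FILESTREAM column is stored on NTFS rather than in the data file.
CREATE TABLE dbo.Documents
(
    DocumentId   UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    DocumentName NVARCHAR(260)    NOT NULL,
    DocumentData VARBINARY(MAX)   FILESTREAM NULL
);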

The RBS FILESTREAM provider is one such free, out-of-the-box provider that plugs into the RBS client library and allows a deployment to use a FILESTREAM-enabled SQL Server database as a dedicated BLOB store for the application. The RBS FILESTREAM provider utilizes the FILESTREAM feature in SQL Server for BLOB storage and ties the two technologies together. SharePoint is one such application, which allows you to use the RBS FILESTREAM provider on the Web Front End servers and the SQL Server FILESTREAM feature for storing and managing BLOBs on NTFS storage outside the SharePoint content database, as documented in the MSDN article here.

A number of third-party storage solution vendors have developed RBS providers that conform to these standard APIs and support BLOB storage on various storage platforms.

For more details on RBS in SQL Server you can refer to the MSDN article here.

The RBS 2016 client libraries are shipped as part of the SQL Server 2016 Feature Pack. As a prerequisite, RBS requires a SQL Server database for storing BLOB metadata along with the BLOB store. To connect to SQL Server, RBS requires at least ODBC Driver 11 for SQL Server 2014 and ODBC Driver 13 for SQL Server 2016. The drivers are available at Download ODBC Driver for SQL Server. If RBS is installed on the same server as the SQL Server 2016 instance, the ODBC driver is already installed as part of the SQL Server installation. However, when RBS is installed on a separate client server, such as a SharePoint WFE in a multi-server farm setup, ODBC Driver 13.0 is not installed on the client server and needs to be installed separately as a prerequisite for installing the RBS client library.

If Microsoft ODBC Driver 13.0 for SQL Server is missing on the client server, the setup may fail with the following error when you click Test Connection on the Database Connection screen:

[Screenshot: RBS setup error on the Database Connection screen]

If you are installing from the command prompt, the output log file will show the following error:

MSI (s) (CC:FC) [15:12:55:265]: Note: 1: 1723 2: InstallCounters 3: CreateCounters 4: C:\Windows\Installer\MSI8C86.tmp CustomAction InstallCounters returned actual error code 1157 (note this may not be 100% accurate if translation happened inside sandbox)
MSI (s) (CC:FC) [15:12:55:265]: Product: Microsoft SQL Server 2016 Remote BLOB Store — Error 1723. There is a problem with this Windows Installer package. A DLL required for this installation to complete could not be run. Contact your support personnel or package vendor. Action InstallCounters, entry: CreateCounters, library: C:\Windows\Installer\MSI8C86.tmp
MSI (s) (CC:FC) [15:12:55:265]: Creating MSIHANDLE (141) of type 790531 for
thread 30204 Error 1723. There is a problem with this Windows Installer package. A DLL required for this installation to complete could not be run. Contact your support personnel or package vendor. Action InstallCounters, entry: CreateCounters, library: C:\Windows\Installer\MSI8C86.tmp

Hence, as a prerequisite for installing RBS on the client server, install Microsoft ODBC Driver 13.0 for SQL Server (or a higher version) on the client to avoid the above error while running the RBS.msi setup for SQL Server 2016.

Parikshit Savjani
Senior Program Manager (@talktosavjani)

Boosting Update Statistics performance with SQL 2014 SP1/CU6

Cross post from https://blogs.msdn.microsoft.com/sql_server_team/boosting-update-statistics-performance-with-sql-2014-sp1cu6/

With SQL 2005 reaching end of life, many customers are planning their upgrades to SQL 2014. One of the recommended post-upgrade tasks is updating the statistics of all the tables and databases in the latest compatibility level. The updated statistics ensure the new optimizer in the latest release has the most up-to-date data to produce the best plans. For VLDB scenarios, running update statistics on the entire database can take considerable time to complete depending on the volume of data. As applications are required to run 24/7 with the desired performance, DBAs are increasingly under pressure to keep the maintenance window as short as possible. One such improvement introduced in SQL 2014 SP1 CU6, targeted at optimizing and reducing execution times for UPDATE STATISTICS runs, is detailed in this blog post.

In scenarios where the execution time of the update statistics job is critical, it can be improved by running update statistics for the tables in the database in parallel.

Let me illustrate this with an example. Consider a hypothetical scenario where a database has 4 tables, viz. T1, T2, T3, and T4. Table T1 has 8 statistics in total, combining index and column statistics; table T2 has 4 statistics, table T3 has 2 statistics, and table T4 has 2 statistics. Let’s say updating each statistic takes 1 second.

In this case, if the statistics are updated serially using a single job, the total time taken to update the statistics for all the tables in the database is (8 + 4 + 2 + 2) = 16 seconds, as shown in the figure below.

 

[Figure: serial UPDATE STATISTICS run, 16 seconds total]

If the statistics are updated in parallel using 4 jobs (one job per table), the statistics for the entire database get updated in 8 seconds, the time taken by the largest table, T1, as shown in the figure below.

 

[Figure: UPDATE STATISTICS in 4 parallel jobs (one per table), 8 seconds total]

Like any parallel algorithm, you gain maximum throughput when the workload is evenly distributed across all execution units. When executing update statistics in parallel as in the previous example, one table may have many statistics and a large number of records (table T1 in this case), all of which are refreshed by the same thread or job. The worker thread or job running update statistics on this table is likely to delay the overall execution while the other threads sit idle, having completed their workload. This uneven distribution of work reduces the overall efficiency and throughput of the UPDATE STATISTICS run. The parallel execution can be improved further by running update statistics on individual statistics in parallel, rather than per table, to distribute the workload evenly across threads.

In the above scenario, if update statistics is run in parallel on individual statistics, the entire run can complete in 4 seconds, as shown in the figure below.

 

[Figure: UPDATE STATISTICS in parallel on individual statistics, 4 seconds total]

Prior to SQL 2014 SP1 CU6, if you try to run the update statistics job in parallel on individual statistics as shown above, the parallel threads updating statistics on the same table are blocked by the exclusive (X) lock taken by the first update statistics job on the table's UPDSTATS metadata resource, as shown below. Due to this blocking, the overall execution time of UPDATE STATISTICS can be higher, as shown below.

 

 

[Figure: parallel UPDATE STATISTICS threads blocked on the table's UPDSTATS metadata lock]
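If you want to confirm this blocking while the parallel update statistics sessions are running, the standard blocking DMVs can be used; a rough sketch:

-- Sessions blocked while updating statistics on the same table show up here,
-- waiting on the metadata lock held by the first session.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_resource,
       t.text AS running_batch
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;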

With SQL 2014 SP1 CU6, a new trace flag, 7471, has been introduced to address this blocking scenario; it is documented in KB 3156157. Trace Flag 7471 changes the locking behavior of the update statistics command so that the SQL Server engine no longer acquires an X lock on the table's UPDSTATS resource. Instead, the engine acquires an update (U) lock on each individual statistics object being updated, which doesn't block other parallel threads updating different statistics on the same table. This behavior improves the overall concurrency of update statistics jobs executing in parallel and improves their performance.

 

[Figure: with Trace Flag 7471, parallel UPDATE STATISTICS threads on the same table no longer block each other]

Our empirical testing shows TF 7471 can increase the possibility of deadlocks, especially when creation of new statistics and updates of existing statistics execute simultaneously. The recommended best practice is therefore to enable TF 7471 only during maintenance windows when update statistics jobs are run in parallel, and to turn it off otherwise.
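A minimal sketch of this best practice is shown below; the table and statistics names (dbo.T1, Stat1, Stat2) are hypothetical, and each UPDATE STATISTICS statement is meant to run from a separate session or SQL Agent job so that they execute in parallel:

-- Enable TF 7471 globally only for the maintenance window.
DBCC TRACEON (7471, -1);

-- Run each of these from a separate session/job to update individual statistics in parallel.
UPDATE STATISTICS dbo.T1 (Stat1) WITH FULLSCAN;
UPDATE STATISTICS dbo.T1 (Stat2) WITH FULLSCAN;

-- Turn the trace flag off once the update statistics jobs complete.
DBCC TRACEOFF (7471, -1);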

To conclude, starting with SQL 2014 SP1 CU6, the performance and execution times of update statistics can be improved by running the job in parallel on individual statistics and enabling Trace Flag 7471.

Parikshit Savjani (@talktosavjani)
Senior Program Manager

SQL Server Datafile maxsize reported incorrectly

Recently we encountered and filed a benign reporting bug where the maxsize of a data file is reported incorrectly. If you create a database, then alter the size of the data file to be greater than its maxsize, and then query sys.database_files, sys.master_files, or sys.sysaltfiles, the maxsize column appears to still report the older value of maxsize. However, the maxsize is actually set to the new size within the database, which can be verified by reading the file header data; it is just reported incorrectly in the DMVs.

Further, restarting the SQL instance or taking the database offline and then online fixes the issue, as the DMVs get synchronized with the file header at database startup.

The issue occurs only in SQL 2012 and SQL 2014; it appears to be fixed in SQL 2016 RC0 and doesn't occur in SQL 2008 R2 and below.

Following are the steps to repro the issue:

REPRO

IF EXISTS (
SELECT *
FROM sys.databases
WHERE name = N'maxsize'
)
DROP DATABASE maxsize
GO

CREATE DATABASE maxsize
ON PRIMARY
(NAME = maxsize_data,
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Data\maxsize_data.mdf',
SIZE = 64MB,
MAXSIZE = 1024MB,
FILEGROWTH = 10%)
LOG ON
( NAME = maxsize_log,
FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Data\maxsize_log.ldf',
SIZE = 10MB,
MAXSIZE = 50MB,
FILEGROWTH = 10%)

GO

USE [master]
GO
ALTER DATABASE [maxsize] MODIFY FILE ( NAME = N'maxsize_data', SIZE = 4194304KB )
GO

USE [maxsize]
GO
select * from sys.database_files
GO
select * from sys.sysaltfiles where dbid = DB_ID()
GO
Select * from sys.master_files where database_id = DB_ID()
GO

/*
SQL 2012 \ SQL 2014 (size and maxsize are reported in 8-KB pages)
size = 524288 (4 GB)   maxsize = 131072 (1 GB) -- maxsize still shows the old value
*/

-- Take the database offline & bring it back online

alter database [maxsize] set offline
GO
alter database [maxsize] set online
GO

/* Reports accurately */
use maxsize
GO
select * from sys.database_files
GO
select * from sys.sysaltfiles where dbid = DB_ID()
GO
Select * from sys.master_files where database_id = DB_ID()
GO

/*
SQL 2012 \ SQL 2014 (size and maxsize are reported in 8-KB pages)
size = 524288 (4 GB)   maxsize = 524288 (4 GB) -- now reported correctly
*/

USE MASTER
GO
DROP DATABASE [maxsize]
GO

Like I mentioned before, it is just a reporting bug and shouldn't do any harm, but just in case you see the size of a data file reported as greater than its maxsize, don't be surprised :)

Hope this helps !!!

Parikshit Savjani
Sr. Premier Field Engineer

Columnstore Indexes and Statistics

Until the introduction of columnstore indexes, most DBAs didn't care much about the creation of statistics on indexed columns, as statistics are auto-created on the indexed columns when the index is created. With Auto Create Statistics turned ON for the database, the optimizer also automatically triggers the creation of statistics when it detects a missing statistic, so no manual intervention is required to create statistics.

Statistics are not created automatically when you create a columnstore index. This may not seem like a big concern since, when you query the base table or index for the first time, the optimizer will trigger the creation of statistics if auto create statistics is turned ON. However, this behavior can increase the response time of the first query to hit the columnstore index, because until the statistics are created and populated, the plan cannot be compiled and executed to return the result set.

Consider a scenario which may be very common with columnstore indexes. You create a fact table with a clustered columnstore index and bulk load millions of records into the table. As the data gets loaded, the tuple mover ensures compressed row groups of the clustered columnstore index are created. Now, when the first query is issued against the fact table, the optimizer spends time creating the statistics on this large fact table by scanning millions of records, which can take a considerable amount of time depending on the resources available on the server. It can easily take a few minutes to create statistics on a large fact table with billions of records, even on an idle system. If you were running a test to compare the performance of a columnstore index versus a rowstore index, you might easily assume that the columnstore index is running slow, since the rowstore index might complete the query in 5-10 minutes while the first query against the columnstore index takes much longer. Further, if you end up cancelling the query because it is taking too long, the statistics never get created, and each time you issue the query it appears to take just as long.

It is important to understand and be familiar with this behavior of columnstore indexes. You may not observe the difference if the table is small, but if the data volume is huge, the time to create the statistics is considerable and can amplify the response time of the first query against the index.
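One way to keep this cost out of the first user query is to create or refresh the statistics yourself at the end of the data load, while you are still inside the maintenance window. A minimal sketch, assuming a hypothetical fact table dbo.Wk_Fact and a commonly filtered column DateKey:

-- Pre-create a column statistic so the first report query does not pay for auto-create statistics.
CREATE STATISTICS Stat_WkFact_DateKey
ON dbo.Wk_Fact (DateKey)
WITH FULLSCAN;   -- consider SAMPLE n PERCENT instead of FULLSCAN on very large tables

-- Or refresh all existing statistics on the table right after the bulk load.
UPDATE STATISTICS dbo.Wk_Fact;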

Hope this helps !!!

Parikshit Savjani
Sr. Premier Field Engineer

FUN – Self-Assessment of your social presence in fb using Power BI

If you read the title of this post and were wondering, "What the hell is that?", trust me, I have no clue how I made this up, but I felt it would be an appropriate title for when someone wants to perform data analysis on their fb data to understand how active they are on fb YoY or month-on-month, who their good friends are, who likes their posts the most, and to discover some long-lost friends who were good friends earlier but with whom they have lost touch.

Assumption: The modern definition of a good friend is the friend who likes all your posts or activity on Facebook ;)

One might feel data analysis of your fb activity is a completely useless task, until you discover some insights about your life or persona changes reflected in your fb activity. Data analysis is like a treasure hunt where all your efforts seem useless until you hit the jackpot. In this blog I share some of the insights I discovered after analyzing my fb data using Power BI and relating them to some of the changes in my life.

Warning: The following data analysis might be a complete waste of your time and highly non-productive. Any accidental learning from this post might be completely coincidental & at your own risk.

Here is the Power BI dashboard of my Facebook activity:
[Image: Power BI dashboard of my Facebook activity]

Let's dig into some of the insights I discovered from the dashboard:

  • So I have posted 203 posts from 2009 till date with around 2,835 likes, which averages to around 14 likes per post, which is not that bad. At any given point in time I have at least 14 friends who like my activity on fb.
  • I got the most likes (210 & 135) when my books got published, and my friends really liked me for getting published, which means I should continue doing it :)
  • I was most active on Facebook in the year 2012, but after that my activity dropped significantly in the year 2013, which is the year I got married (no further insights are warranted on this, as it is self-explanatory).
  • The word cloud, which is a new custom visual available in the Power BI gallery, emphasizes and amplifies some of the good friends who have liked my posts and activity on fb. If I slice the data for the year 2013, the word cloud looks as shown below.
    The year 2013 is the only time when my wife Charmi liked my fb posts the most, and then we got married, and the rest is history. So here is the data I can share with my wife to show that, post marriage, husbands are no longer appreciated any more. On the other side, can husbands afford to not like any posts from their wives???

[Image: word cloud of friends who liked my posts, sliced for the year 2013]

On the other hand, when I sliced my data for the year 2009, I found a few friends who used to like my posts a lot but with whom I have since lost touch, so I am planning to connect with them again; when I slice the data for the year 2015, I find some new friends in there.

  • Another interesting insight I discovered from the type of my posts on FB is that until the year 2013 the majority of my posts were status updates, but of late I don't share many status updates; occasionally I just log in to Facebook and re-share videos or links I found interesting. I guess that has something to do with maturity: earlier, I posted just about anything and everything I did as status updates, but slowly I have stopped enjoying status updates as much.

[Image: posts by type over the years]

  • Finally, I discovered that I am most active on FB in the month of December, which may be due to the holidays, and least active in the months of June & November, which are the times when I like to take vacations, since June is our anniversary while November is my wife's birthday.

[Image: posts by month]

If anyone is interested in reusing my Power BI dashboard, please reach out or message me and I will be happy to share the *.pbix file for your self-assessment.

Hope this helps !!!

Parikshit Savjani
Sr. Premier Field Engineer

My book SQL on Azure Succinctly released

I feel extremely excited to share the latest release of my book SQL on Azure Succinctly with Syncfusion publications, which talks about deployments of SQL Server to Azure.

You can download the latest version of the ebook from the following link
http://www.syncfusion.com/resources/techportal/details/ebooks/sqlazure

In recent years, we have witnessed how information technology has contributed to the growth and reach of businesses around the globe. Companies which are primarily technology companies, like Amazon, Facebook, and Uber, have built their businesses solely on the basis of information technology and have grown unimaginably. In today’s world we cannot imagine a business without a website, app, or web presence, but does having a website, app, or web presence guarantee a business will be successful? Until a business is successful, companies aren’t sure whether their investment in information technology and IT infrastructure is fruitful or not.

Setting up and maintaining a website, app, database, or data warehouse incurs a lot of capital and operational expenses (CAPEX and OPEX) for a company whose core business may not be IT. As a result, small and medium businesses around the world are looking towards cloud solution providers to offer IT as a service and pay only for what they use, with elastic scale-up/scale-down flexibility. Even large enterprises that can afford their own data centers and infrastructure are looking towards cloud providers, since data is growing at a rapid pace and investing in new hardware and infrastructure for new projects, development, and testing may not make sense until the project becomes critical and revenue-generating.

Cloud services allow big enterprises to innovate and research at an optimized cost without investing upfront on the infrastructure in their data centers. Further, when products are designed to run as a service, the product release cycle can be agile and fast. All of these benefits make cloud services an attractive offering and optimal alternative for businesses today.

In this book, I introduce you to Microsoft SQL Server (RDBMS) Offerings on Azure and talk about the use cases, considerations, configurations, and design optimizations for running SQL Server on Azure to give you predictable performance at optimized cost.

This book is primarily focused on instructing individuals with prior knowledge of SQL Server how to manage, optimize, and design applications running SQL Server on Azure.

Disabling Non-Clustered Columnstore Index doesn’t appear to execute in Stored Procedure in SQL 2012

Consider the following scenario: you have a large fact table with a non-clustered columnstore index. In SQL 2012, a columnstore index is not updatable; hence, at night when you load or purge the table, the first step would be to disable the non-clustered columnstore index, followed by the insert/update/delete against the table, and the final step would be to rebuild the non-clustered columnstore index so that the table is available for querying and reporting on the following day with the desired reporting performance.

The stored procedure used to load or purge the table would be of the form shown below (the UPDATE statement is shown without its SET clause, as a placeholder for the actual DML):

CREATE PROCEDURE dbo.usp_Update_Fact_table

AS

BEGIN

ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] DISABLE; -- Disable columnstore index before update

UPDATE [dbo].[Wk_Fact] -- Update fact table (actual DML elided)

ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] REBUILD; -- Rebuild index after update

END

When you execute the above stored procedure, it fails with the following error:

UPDATE statement failed because data cannot be updated in a table with a columnstore index. Consider disabling the columnstore index before issuing the UPDATE statement, then rebuilding the columnstore index after UPDATE is complete.

The above error message might make it appear that the ALTER INDEX… DISABLE statement is skipped during the execution of the stored procedure and that the procedure throws the error while executing the UPDATE query. But the error doesn't occur during execution; it actually occurs during compilation of the stored procedure.

When the stored procedure is compiled, the optimizer detects that the fact table has a non-clustered columnstore index, which cannot be modified, and hence throws the above error at compile time. This can be further verified by trying to get the estimated showplan of the stored procedure, which fails with the same error as above.

SET SHOWPLAN_XML ON
GO
EXEC dbo.usp_Update_Fact_table
GO

The query optimizer detects the non-clustered columnstore index during the query optimization phase of plan generation; this is an optimization that prevents plan generation, saving the resources required to compile, cache, and execute a query plan that would ultimately error out in the execution phase.

To overcome the above error, you have the following options:

  1. Run ALTER INDEX … DISABLE outside the stored procedure, prior to calling the insert/update/delete stored procedure, as the first step of the batch load/purge job.
  2. If you need to disable the index as part of the same stored procedure, you can execute the DML against the fact table as dynamic SQL, which avoids the compile-time check, as shown below:

    CREATE PROCEDURE dbo.usp_Update_Fact_table
    AS
    BEGIN

    ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] DISABLE; -- Disable columnstore index before update

    EXEC sp_executesql N'UPDATE [dbo].[Wk_Fact]'; -- Update fact table as dynamic SQL (actual DML elided)

    ALTER INDEX [csindex_wk_fact] ON [dbo].[Wk_Fact] REBUILD; -- Rebuild index after update

    END

  3. Upgrade to SQL 2014 or above, as the columnstore index is updatable in SQL 2014 and later versions, where these compile-time checks do not occur.

Hope this helps !!!

Parikshit Savjani
Senior Premier Field Engineer

Added Auto Indexing & Indexing Mode parameters in Add-DocumentCollection Cmdlet

A couple of weeks back I released PowerShell cmdlets for DocumentDB on GitHub (further info on this can be found in my blog here). Today I merged and added the AutoIndexing & IndexingMode parameters to the Add-DocumentCollection cmdlet in the module.

https://github.com/savjani/Azure-DocumentDB-Powershell-Cmdlets

Azure DocumentDB collections are auto-indexed by default, allowing developers to focus on application development and not worry about the schema or index definition for the database. Automatic indexing of documents is enabled by consistent, write-optimized, lock-free, and log-structured index maintenance techniques.

Anyone who has worked on databases knows that, while indexes are useful to expedite querying, there is additional overhead to write, maintain, and store them. Hence it is important and efficient that we use indexing wisely and do not index attributes which may never be used for querying or filtering.

Secondly, DocumentDB supports two modes for indexing, viz. Consistent (synchronous) and Lazy (asynchronous). If you are someone like me with a database background, you would know that for a continuous transactional OLTP workload we prefer Consistent indexing, wherein the indexes are updated and maintained as data is inserted, updated, or deleted. However, for batch data-warehousing type workloads we prefer to drop the index, load the data, and recreate the index, which gives us the best write performance. Similarly, in DocumentDB, if we intend to perform a bulk load of documents, write performance is improved with the indexing mode set to Lazy as compared to the synchronous Consistent mode.

At the time of writing this blog, the AutoIndexing & IndexingMode settings are only supported during creation of collections and cannot be altered after a collection is created. Hence it is important that we plan and choose the right settings for our collection while creating it.

To support these settings during creation of a collection using PowerShell, I have added the following optional parameters to the Add-DocumentCollection cmdlet.

AutoIndexing – ($true or $false): This parameter specifies whether we want automatic indexing to be turned ON or OFF in the collection. It is an optional parameter and, if omitted, automatic indexing is turned ON by default.

IndexingMode – (Consistent or Lazy): This parameter specifies the indexing mode for the collection. It is an optional parameter and, if omitted, it is set to Consistent by default.

A good reference for DocumentDB indexing policy is the following article from the product group:

https://azure.microsoft.com/en-us/documentation/articles/documentdb-indexing-policies/

Usage:

Creating a New Database & New Document Collection with default AutoIndexing & IndexingMode

Import-Module 'C:\Users\pariks\Documents\GitHub\Azure-DocumentDB-Powershell-Cmdlets\Azrdocdb\Azrdocdb1\bin\Debug\Azrdocdb.dll'
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name 'DocDBPS2'
$coll = Add-DocumentCollection -Context $ctx -DatabaseLink $db.SelfLink -Name 'DocCollPS'

Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Creating a New Database & New Document Collection & Setting Automatic Indexing Policy

Import-Module 'C:\Users\pariks\Documents\GitHub\Azure-DocumentDB-Powershell-Cmdlets\Azrdocdb\Azrdocdb1\bin\Debug\Azrdocdb.dll'
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name 'DocDB'
$coll = Add-DocumentCollection -Context $ctx -DatabaseLink $db.SelfLink -Name 'DocCollPS' -AutoIndexing $true
Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Creating a New Database & New Document Collection & Setting Automatic Indexing Policy & IndexingMode

Import-Module 'C:\Users\pariks\Documents\GitHub\Azure-DocumentDB-Powershell-Cmdlets\Azrdocdb\Azrdocdb1\bin\Debug\Azrdocdb.dll'
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name 'DocDB'
$coll = Add-DocumentCollection -Context $ctx -DatabaseLink $db.SelfLink -Name 'DocCollPS' -AutoIndexing $true -IndexingMode 'Lazy'
Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Hope this helps :)

Parikshit Savjani
Premier Field Engineer

Releasing Powershell Cmdlets for Azure DocumentDB in Github

Azure DocumentDB is Microsoft's budding NoSQL, schema-free database offering, which is a relatively recent addition to the mammoth Microsoft data platform stack.

Azure DocumentDB provides great flexibility to developers who feel restricted by the RDBMS data model and by adhering to data types, schema, index designs, and patterns. Azure DocumentDB gives developers the flexibility to store entire records as JSON documents in an Azure DocumentDB database, uniquely identified by a key. All the attributes within a document are indexed by default with range or hash indexes, thereby expediting query processing against the documents and improving query performance for reporting and analytics. Hence the developer just needs to serialize the data structures into JSON format and store them in the Azure DocumentDB database, while Azure DocumentDB takes care of indexing and storage of the attributes. Azure DocumentDB supports querying the attributes using the familiar SQL language over the JSON structure, making it compatible with most reporting and analytics platforms.
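For example, a DocumentDB SQL query over the documents in a collection might look like the following sketch; the property names are hypothetical, and c is the conventional alias for the collection root:

SELECT c.id, c.orderTotal
FROM c
WHERE c.customerId = "C-100" AND c.orderTotal > 500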

While learning and evaluating Azure DocumentDB, I found that one component missing from Azure DocumentDB is out-of-box PowerShell cmdlets for managing and administering the database. I saw this as an opportunity to kick-start a new community project on GitHub and get my hands dirty with a GitHub public repository, which I had been aspiring to learn and adopt for quite some time now.

With this excitement and modesty, I have created a project on GitHub with PowerShell cmdlets to perform some basic management operations against Azure DocumentDB:

https://github.com/savjani/Azure-DocumentDB-Powershell-Cmdlets

I invite community members to further extend this modest list of cmdlets into a full-blown DocDB management and administration module similar to SQLPS.

PowerShell Cmdlet Definition & Usage

Creating a New Database

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name 'DocDBPS2'
Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Creating a New Database & New Document Collection

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
$db = Add-Database -Context $ctx -Name 'DocDBPS2'
$coll = Add-DocumentCollection -Context $ctx -DatabaseLink $db.SelfLink -Name 'DocCollPS'
Get-database -Context $ctx -SelfLink $db.SelfLink | ft

List all the Databases in the Given Azure DocumentDB Database Account

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
Get-databases -Context $ctx | ft

List Database Properties for a given Database

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
$db = Get-databases -Context $ctx | Where-Object {$_.Id -eq 'DocDBPS'}
Get-database -Context $ctx -SelfLink $db.SelfLink | ft

Drop Database

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
$db = Get-databases -Context $ctx | Where-Object {$_.Id -eq 'DocDBPS'}
remove-database -Context $ctx -SelfLink $db.SelfLink

Get Database Account Consistency Level

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
Get-DatabaseAccountConsistencyLevel -Context $ctx

Set Database Account Consistency Level

Import-Module C:\Users\pariks\Documents\WindowsPowerShell\Modules\DocDB\Azrdocdb.dll
$ctx = New-Context -Uri <uri> -Key <key>
Set-DatabaseAccountConsistencyLevel -Context $ctx -DefaultConsistencyLevel Eventual
Get-DatabaseAccountConsistencyLevel -Context $ctx

Hope this helps :)

Parikshit Savjani
Premier Field Engineer

Ungrouping Column using DAX in PowerPivot or SSAS Tabular

To start this blog post, let me first explain the problem. Consider a scenario where you receive an Excel sheet or data in the following format:

Category     SubCategory
Bikes        Mountain Bikes
             Road Bikes
             Touring Bikes
Components   Handlebars
             Bottom Brackets
             Brakes
             Chains
             Cranksets
             Derailleurs
             Forks
             Headsets
             Mountain Frames
             Pedals
             Road Frames
             Saddles
             Touring Frames
             Wheels
Clothing     Bib-Shorts
             Caps
             Gloves
             Jerseys

In the above case, the data is already grouped by Category, so the Category value appears only on the first row of each group. If you try to load this data into the PowerPivot data model and create a Power View column chart report on Category, the view would look similar to the one shown below.

[Image: Power View column chart with most of the data falling under a blank Category]

Most of the data falls under the blank Category, since the Power Pivot data model doesn't know that the data was already grouped by Category.

To ungroup the data in the PowerPivot data model, I first add an Index column to the existing data, which can easily be done while loading the data using Power Query. The Power Pivot data model now looks as shown below.

[Image: data model with the added Index column]

Next, we add a calculated column, MaxIndex, to calculate and store, for each row, the Index value of the last non-blank Category. The DAX calculation is shown below.

=MAXX(FILTER(FILTER(Table1,LEN(Table1[Category])>0),Table1[Index]<=EARLIER(Table1[Index])),Table1[Index])

The output of the above calculation looks like the following.

[Image: data model with the MaxIndex calculated column]

The above table is now like a parent-child hierarchy where MaxIndex is the parent id. So, to ungroup the Category column, we can now use the LOOKUPVALUE function as shown below.

=LOOKUPVALUE([Category],[Index],[MaxIndex])

The output of the above calculation is as shown below

[Image: data model with the ungrouped Category calculated column]

We can now hide the original Category, MaxIndex, and Index columns from client tools to make them invisible in Power View. Now, if you visualize the same data in a column chart in Power View, the result is what we were looking for.

[Image: Power View column chart with all subcategories correctly grouped under their categories]

Hope this helps !!!

Parikshit Savjani
Sr. Premier Field Engineer