Columnstore Indexes and Statistics


Until the introduction of Columnstore indexes, most DBAs didn’t care much about the creation of statistics on the indexed columns as the statistics are auto created on the indexed columns when the index is created. With Auto Create Statistics turned ON for the database, the optimizer will automatically trigger the creation of statistics when it detects a missing statistics which further doesn’t require any manual intervention to create statistics.

Statistics are not created automatically when you create a columnstore index. This may not be of a big concern since when you query the base table or index for the first time, the optimizer will trigger the creation of statistics when auto create statistics is turned ON. However this behavior can increase the response time of the first query to hit the columnstore index since until the statistics are created & populated, the plan cannot be created & executed to bring the result set.

Consider a scenario which may be very common with columnstore indexes. You create a Fact table with clustered columnstore index & bulk load millions of records into the table. As the data gets loaded, the tuple movers will ensure to create compressed row groups of the clustered columnstore index. Now when the first query is issued against the fact table, the optimizer spends time in creating the statistics on this large fact table by scanning millions of records which can take up considerable amount of time depending on the resources available on the server. It can easily take up to few mins to create statistics on a large fact table with billions of records on the table on an idle system. If you were doing a test to compare the performance columnstore index v/s rowstore index, you might easily assume that columnstore index is running slow since your rowstore index might complete the query in 5-10 mins while the first query against columnstore index took so long. Further if you end up cancelling the query since it is taking too long, the statistics never get created & each time you issue the query, the query appears to take long.

It is important to understand & be familiar with this behavior of the columnstore indexes. You may not observe the difference if the table is small but if the data is huge, the time to create the statistics is considerable which can amplify the response time of the first query against the index.

Hope this helps !!!

Parikshit Savjani
Sr. Premier Field Engineer

Leave a Reply

Required fields are marked *.