What is your role in Cloud?

As I interact with more & more customers onboarding onto Azure, the Microsoft cloud solution, one of the common myths or assumptions I hear is

“We just need to deploy the application or database in the cloud & Microsoft will take care of the OS, storage, patching, etc.”

“After applications move to the cloud, we no longer need an Infrastructure team or DBAs to manage the platform.”

The above statements may hold true to some extent when organizations adopt the cloud in Software as a Service (SaaS) or Platform as a Service (PaaS) scenarios. However, in Infrastructure as a Service (IaaS) & hybrid cloud scenarios the above statements are not completely true, & in this blog post I would like to (re)define the roles & responsibilities in an enterprise as it starts its journey towards the cloud.

In my opinion, most enterprise organizations will not be able to completely migrate to public cloud infrastructure, for multiple reasons, so hybrid scenarios will be the most common form of cloud adoption, which is where Microsoft has a clear edge over its competition.

These roles & responsibilities are my personal imagination of future roles, based on my experience in the field.

But before I start, a disclaimer

“This is a personal weblog. The opinions expressed here represent my own and not those of my employer. In addition, my thoughts and opinions change from time to time…I consider this a necessary consequence of having an open mind. This weblog is intended to provide a semi-permanent point in time snapshot and manifestation of the various memes running around my brain, and as such any thoughts and opinions expressed within out-of-date posts may not be the same, nor even similar, to those I may hold today.”

System Administrators

In Cloud scenarios, the System Administrator would be responsible for the following:

  • Building VM Images which adhere to company standards & can be leveraged by the DBAs or application teams to spin up their servers within a few minutes. This is a one-time effort which can be reused across teams.
  • Update the VM Images with the latest patches & anti-virus updates.
  • Using or configuring Agents (runbooks, Chef & Puppet Add-ins)  to update & patch the Server VMs already in use in the organization’s environment.
  • Monitor System performance & Resource Utilization

Network Administrators

In Hybrid Cloud scenarios, this will be the most important role, since an enterprise cloud should be designed as an extension of the on-premise datacenters, which is made possible by the Network Administrators.

  • Configure site-to-site VPN to ensure seamless connectivity from on-premise to cloud data centers.
  • Configure IP Sec, Network Security & Firewalls for the cloud data centers.
  • Configure DNS Servers & assign IP address to the VMs running on cloud.
  • Monitor network performance & optimize performance over the Internet using network compression or other techniques to get the most out of the available bandwidth.

Domain Administrators

In Hybrid Cloud scenarios, the servers need to be added to a domain, which allows users, groups & service accounts to secure & access resources based on the policies governed by the Domain Administrators.

  • Configure Azure Active Directory Sync to extend the on-premise AD & Exchange to Azure.
  • Ensure the Group Policies are applied to the VM Images
  • Ensure the same organizational account is used by users to log in to VMs, Office 365 or databases in the cloud.

 Storage Administrators

In Cloud scenarios, this role can be thinned a bit & merged with the System Administrator role; nevertheless, it can have the following responsibilities:

  • Configure Storage Pools to combine multiple Data Disks to obtain higher throughput & IOPS.
  • Configure Azure Storage for backups & BLOB storage.
  • Manage & govern storage to decide which storage should be used to achieve the desired throughput within a given cost budget
  • Monitor Storage performance

 Security Administrators

In Cloud scenarios, data security is a big concern, which is why this role will be important. It will have the following responsibilities:

  • Based on the sensitivity of the data, define data classification layers to govern which data can move to the public cloud & which should remain on-premise.
  • Harden the Network Security & Firewalls.
  • Govern & define policies on which data can be geographically replicated & which cannot for political reasons

 Database Administrators (My Favorite)

In IaaS Cloud scenarios, the DBA role is almost unchanged & will have similar responsibilities as on-premise:

  • Set up the SQL instance, configure tempdb, create logins, install databases, configure user roles.
  • Configure server properties.
  • Back up & restore databases.
  • Monitor SQL performance & identify blocking or badly performing queries.

In PaaS Cloud Scenarios

  • Configure database, create users, roles & assign permissions.
  • Backup & Restore Databases.
  • Manage SQL performance & identify blocking or badly performing queries.

One skillset which all of these roles need to possess is PowerShell, along with Azure Automation for creating orchestrator runbooks, which empowers each role to automate repetitive jobs when managing a large-scale enterprise cloud environment.

In some organizations the above roles might overlap or merge across different teams, while in large, complex enterprise organizations they might be isolated & very well defined.

Hope this helps !!!

Parikshit Savjani
Premier Field Engineer

Cloud Revolution & our adoption

This is my first attempt at blogging on a non-technical topic (non-technical, but still related to technology), since I never felt it was my cup of tea. However, the recent changes & developments I am witnessing around me have provoked a lot of new thoughts which I wanted to pen down for my own reference & for the benefit of others who think alike.

But before I start, a disclaimer

“This is a personal weblog. The opinions expressed here represent my own and not those of my employer. In addition, my thoughts and opinions change from time to time…I consider this a necessary consequence of having an open mind. This weblog is intended to provide a semi-permanent point in time snapshot and manifestation of the various memes running around my brain, and as such any thoughts and opinions expressed within out-of-date posts may not be the same, nor even similar, to those I may hold today.”

For the past couple of years or even more, we have been hearing a lot of buzz about the cloud, & the cloud is all around us. The more important fact to realize is that the cloud is no longer a buzzword but a reality, & if we as IT pros ignore this fact we are going to miss the boat. Huge tech companies have made large investments to make it a reality, & businesses have started to realize the cost savings & benefits of the cloud.

While there is enough proof & content out there to show how businesses can benefit from the cloud, I like the hotel analogy, which I can relate to easily. As consultants or tourists, we like to travel, but do we have to buy real estate in every place we visit just to stay there? Thanks to hotels, we simply book a room, check in at the desired time, check out when we leave & pay only for the days we stayed. Just imagine how limited our travel or our reach would be if hotels didn't exist & we had to buy a room in every place we visited.

Similarly, businesses can now think of growing their reach by adopting the cloud without worrying about the cost of datacenters, real estate, servers, RAM, CPUs, storage, etc. Businesses can now afford to experiment or innovate even more with IT, since the cost of spinning up VMs in the cloud is negligible, & if they fail they aren't going to lose much.

The strongest argument against the cloud is data security. However, I assume there was a similar argument when the internet was evolving: was it safe to send data packets outside your organization? Yet today, can organizations grow or even survive without a presence on the internet? There are, & will continue to be, processes & protocols developed for data security in the cloud.

However, are IT pros ready for this cloud revolution? We are so much in love with our on-premise technologies; that is our comfort zone, which has improved & matured over time to become ever more polished & easy to work with, while the road to the cloud is not as mature & is still evolving.

An IT pro like myself might be reluctant to move to the cloud since it takes us out of our comfort zone: we have invested years of our time to learn & become experts in a product or technology, & now we need to move on to something else. Further, there are gaps between on-premise technology, where features are available out of the box, & the cloud, where we are expected to code them ourselves.

The bad news is that, whether we like it or not, it is coming our way at a rapid pace. For instance, features which were not available 6 months back are available now, & as I see it, the cloud offerings may mature faster than their respective on-premise offerings did, since the product teams know exactly which features consumers like the most.

As I see it, we are living through the cloud revolution, & we should take every opportunity to learn the cloud before it's too late rather than ignoring it on the assumption that it is not mature yet. In the 2000s, people who could spell .NET were hired; it was a skillset which lasted long & was worth the investment of time, giving (& still giving) people handsome returns. The cloud skillset is definitely one such investment which is going to give returns to IT pros for a very long time.

To conclude, I will leave you with the following thought

You can either witness this change or “Be the change”.

Hope this helps !!!

Parikshit Savjani
Premier Field Engineer

The mysterious case of data load in a partitioned table

I have chosen this title for my blog post since troubleshooting this issue was like solving a mystery. One of my customers recently faced slow performance while loading data into a partitioned table. Whenever there is a performance issue with an insert statement, the usual suspects which come to mind are blocking or disk IO, so we started investigating the waits for the query in sys.dm_exec_requests & sys.dm_os_waiting_tasks, but to our surprise the insert query was either in a running or runnable state, which didn't explain the slow performance of the query.
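
For reference, the wait check we ran against those DMVs looked roughly like the following sketch (the column list & filter are illustrative):

SELECT r.session_id,
       r.status,
       r.command,
       r.wait_type,
       r.wait_time,
       r.last_wait_type,
       wt.wait_type AS waiting_task_wait_type,
       wt.resource_description
FROM sys.dm_exec_requests AS r
LEFT JOIN sys.dm_os_waiting_tasks AS wt
    ON r.session_id = wt.session_id
WHERE r.session_id > 50;  -- user sessions only; narrow this to the data-load session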

Further, tracking the disk IO activity, we found that the actual insert operation completed within a few minutes; however, before the writes were performed there was increased read activity on the SQL instance & some kind of wait which could not be identified from the sys.dm_os_waiting_tasks & sys.dm_exec_requests DMVs, since the requests appeared to be in a running or runnable state.

The table is a partitioned table around 1 TB in size, & during the data load there was internal memory pressure on the SQL Server, with Page Life Expectancy (PLE) & Free Pages going down while Lazy Writes/sec increased. To some extent this is expected during a data load, but we didn't have any performance baseline to compare against.

Normally, I wouldn't check the query plan for slow insert performance since, unlike a select query which can be optimized by indexing or a stats refresh, there is limited scope for tuning inserts, updates or deletes. However, I captured the query plan to understand the execution path chosen by the query, which helped me solve the mystery.

BadPlan

In the execution plan, we see a sorted Merge Join operator & an Index Scan operation, which are used to perform the unique key validation before the actual insert of the data. Performing an index scan on a 1 TB table is definitely not a good plan.

The first question is: why did the optimizer choose a Merge Join & Index Scan for the unique key validation? The answer is the yellow exclamation warning shown on the Index Scan operator, which is a missing-statistics indication.

When we look at more details on the Index Scan operator, we find missing statistics on PtnId, which is a hidden column added to partitioned indexes starting with SQL Server 2008. More details on PtnId can be found in Craig's blog below: http://blogs.msdn.com/b/craigfr/archive/2008/08/05/partitioned-indexes-in-sql-server-2008.aspx

IndexScan

The only supported way to rebuild the missing statistics on PtnId is to rebuild the partitioned index on the partitioned table. Although rebuilding a large partitioned index is not an easy task & can run for multiple days, it was worth the time & resources spent.
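
As a sketch, the rebuild looks like the following; the table & index names are hypothetical stand-ins for the customer's objects, and the rebuild options are illustrative:

-- Rebuilding the partitioned clustered index recreates the hidden PtnId statistics.
ALTER INDEX CIX_FactData ON dbo.FactData
REBUILD WITH (SORT_IN_TEMPDB = ON, MAXDOP = 8);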

After rebuilding the indexes on the partitioned table, the performance of the data load improved as expected & we saw the following execution plan.

GoodPlan

The good execution plan uses a Nested Loops join, which is a non-blocking operator, & performs an Index Seek operation on the partitioned table, which reduces the memory footprint & disk IO & improves the performance of the insert statement drastically.

Hope you enjoyed the mysterious case of data loading in a partitioned table

Hope this helps !!!

Parikshit Savjani
Premier Field Engineer

Sociometer Using Power BI

Recently I presented a demo on Power BI to an internal audience at Microsoft, which was very well received & appreciated by the audience. I thought it would be a good demonstration for an external audience as well.

Hope you all like the demo & Power BI !!!

Parikshit Savjani
Premier Field Engineer

Impact of Pre-grown Data & Log Files on the Restore Time of the Database

Today, I was asked the following question by one of my colleagues who specializes in SharePoint:

If the actual data is 50 GB in size & we create a pre-grown data file of 200 GB to avoid autogrow & fragmentation, won't the restore time for the pre-grown database be much larger than the restore time of the database without pre-growth, i.e. with a data file size of 50 GB?

My initial guess was that the restore time shouldn't vary too much: the data file creation should be fast due to instant file initialization, while the database restore operation only restores the active pages with data, which should be the same in both scenarios. However, this was purely based on my theoretical knowledge, and it is always important to validate & test it.

So I used a sample database with an initial primary data file size of 4 GB & a log file size of 1 GB, with around 3 GB of active data in the database.

To see the impact of pre-grown data files, I kept the initial log file size constant at 1 GB while increasing the initial data file size in increments of 5 GB, keeping the active data in the database constant at 3 GB.
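
Each test iteration looked roughly like the following sketch; the logical file names, paths & sizes are illustrative, not the exact ones I used.

-- Pre-grow the primary data file while keeping the active data constant.
ALTER DATABASE SampleDB
MODIFY FILE (NAME = SampleDB, SIZE = 20GB);

-- Take a compressed backup; its size tracks the active data, not the file size.
BACKUP DATABASE SampleDB
TO DISK = N'C:\Backups\SampleDB.bak'
WITH COMPRESSION, INIT, STATS = 10;

-- Restore to a new database and compare the elapsed time for each data file size.
RESTORE DATABASE SampleDB_Restored
FROM DISK = N'C:\Backups\SampleDB.bak'
WITH MOVE 'SampleDB' TO N'C:\Data\SampleDB_Restored.mdf',
     MOVE 'SampleDB_log' TO N'C:\Data\SampleDB_Restored_log.ldf',
     STATS = 10;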

restorea

restore1

For my testing, I used SQL Server 2012 SP1, & I used backup compression while taking the backup each time I pre-grew the data file. As noted in the table above, irrespective of the data file size the backup size was consistent at 266 MB, since the data in the database was static, which was as expected.

However, the restore time increased proportionally as the initial data file size increased, which didn't make much sense.

So I turned on trace flags 3004 & 3605 to trace the restore events and identify where the time goes during each restore.
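
Trace flag 3004 reports backup/restore internals (including file zeroing) and 3605 redirects that output to the SQL Server errorlog; they can be enabled globally for the duration of the test:

DBCC TRACEON (3004, -1);
DBCC TRACEON (3605, -1);

-- ... run the restores, then switch the flags off ...
DBCC TRACEOFF (3004, -1);
DBCC TRACEOFF (3605, -1);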

After turning on the trace flags, I discovered the following events in the errorlog, which explained the time difference between the restores.

Errorlog for Restore with 5GB data file size

12:57:46 PM spid56 Zeroing completed on C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\SampleDB_log.ldf 0:00:05
12:57:54 PM spid56 Zeroing completed on C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\SampleDB.mdf 0:00:08
12:58:04 PM spid56 Restore: Waiting for log zero on SampleDB 0:00:10

Errorlog for Restore with 20GB data file size

12:51:35 PM spid56 Zeroing completed on C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\SampleDB_log.ldf 0:00:06
12:52:38 PM spid56 Zeroing completed on C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA\SampleDB.mdf 0:01:02
12:52:49 PM spid56 Restore: Waiting for log zero on SampleDB 0:00:11

As seen in the data above, the additional time spent during each restore was due to zeroing of the data file, which means instant file initialization was not working for me. I realized my SQL Server was running under the managed service account (NT Service\MSSQLServer), which does not have the Perform Volume Maintenance Tasks permission by default. I assigned the permission to NT Service\MSSQLServer, restarted SQL Server & performed the above tests again. The following results were found:

restore2

The above results do make sense & meet our expectation, since with instant file initialization the zeroing of the data file is avoided, which keeps the restore time more or less the same.

However, instant file initialization doesn't apply to log files, and we see the following results when we test the restore keeping the initial size of the data file constant while increasing the initial size of the log file.

restore3

restore4

So the restore time does increase as the log file size increases. Again, when I turned on the tracing to identify where the majority of the time was spent, I found that most of it went into zeroing the log files.

However, in the case of log files, the performance advantage of pre-growing is significant, because the number of VLFs created by every autogrow can slow down database recovery time proportionally.
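
As a quick check, DBCC LOGINFO returns one row per VLF, so the effect of autogrow on the VLF count of a log file is easy to see:

-- One row per virtual log file (VLF) in the SampleDB transaction log.
DBCC LOGINFO ('SampleDB');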

Hence, with instant file initialization for the data files & some compromise on restore time for the log files, it is still recommended to pre-grow the data & log files of a database to avoid autogrow & fragmentation on the disk.

Autogrow should be treated as insurance: something you want to have, but don't plan to use unless you are in dire need of it.

Hope this helps !!!

Parikshit Savjani
Premier Field Engineer

Troubleshooting SQL Server Hang using system_health Xevents

Recently my customer reported that they were seeing intermittent server hangs, connection timeouts and slow performance on their SQL Server instance. The first question I had was: do we have PSSDIAG running on the SQL instance so that I can understand the cause of the hang? However, since we don't run PSSDIAG proactively on any of the servers, we didn't have any logs to look at. So we started looking at the system_health Xevents session data from the time of the slow performance.

If you are not familiar with the default system_health Xevent session, I would suggest you read the blog post from Bob Dorr here.

Reading from the system_health Xevents session directly is not the most convenient way of troubleshooting, hence I used a solution developed by my fellow PFE colleague Denzil, which he has blogged about here.
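
For reference, the raw system_health .xel files can be read directly with a query like the minimal sketch below (it assumes the files sit in the default LOG directory); Denzil's solution essentially parses & pivots this output for you:

SELECT CAST(event_data AS XML) AS event_data_xml,
       object_name,
       file_name
FROM sys.fn_xe_file_target_read_file('system_health*.xel', NULL, NULL, NULL);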

We observed that the issue occurs intermittently: the application hangs and SQL Server does not accept any further connections.

Resource Utilization

We did not see CPU or memory pressure on the SQL Server instance. The system CPU utilization was around 40% on average, with some spikes causing it to rise to 100%; however, during these spikes the SQL CPU utilization stayed consistent at 30-35% on average.

The available memory on the server was unchanged & consistent, and no low-memory alerts were observed on the SQL instance.

CPUUtil

MemoryUtil

Query Processing & THREADPOOL Waits

We see very high waits on the THREADPOOL wait type, which suggests we are running out of worker threads, which is why we observed slowness and hangs on the SQL instance. From the following query_processing logs, we see a sudden surge in worker threads from 198 to 806 in the 10-minute interval from 7:10 AM to 7:20 AM (server time would be 2:10 AM to 2:20 AM).

QueryProcessingEvent

When we check the sys.dm_exec_requests data captured during the same time, we see that the number of concurrent requests to the SQL instance increased from 80 to 500 during the same interval, which explains the surge in worker threads created to serve the 500 concurrent requests.
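
For reference, the equivalent live checks look roughly like the sketch below; the system_health captures gave us the same numbers historically:

-- Concurrent user requests at this moment.
SELECT COUNT(*) AS concurrent_requests
FROM sys.dm_exec_requests
WHERE session_id > 50;

-- Worker threads currently in use vs. the configured maximum.
SELECT SUM(current_workers_count) AS current_workers,
       (SELECT max_workers_count FROM sys.dm_os_sys_info) AS max_workers
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';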

However, we need to understand that the THREADPOOL wait type here is not the cause but a victim, since the assigned worker threads are not completing their tasks quickly. The root cause of the issue in this case was the blocking observed on the SQL instance, which prevented the worker threads from finishing their tasks and being released back to the worker pool.

Blocking

In this case, we found that the head blocker was waiting on a non-IO buffer latch, PAGELATCH_XX (PAGELATCH_EX, PAGELATCH_UP, PAGELATCH_SH), caused by sequential inserts into a large table (last-page contention). We suggested partitioning the table and building the indexes on an alternate column to avoid last-page contention (further info on resolving latch contention can be found in this white paper).
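
A minimal sketch of the kind of blocking-chain check that surfaces the head blocker is shown below; the requests captured by system_health showed the same chain:

-- Sessions that are blocked, plus the sessions doing the blocking;
-- a head blocker appears as a blocking_session_id that is not itself blocked.
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_resource,
       r.command
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0
   OR r.session_id IN (SELECT blocking_session_id
                       FROM sys.dm_exec_requests
                       WHERE blocking_session_id <> 0);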

Using the system_health Xevents session and the solution developed by Denzil, we were able to nail down the issue on the SQL instance without capturing any additional data.

Hope this helps !!!

Parikshit Savjani
Premier Field Engineer

Can number of worker threads in SQL Server exceed max worker threads setting?

Recently, while troubleshooting a performance issue on a SQL instance which was experiencing very high THREADPOOL waits, I observed something I was unaware of until I hit this issue.

While I was looking at the system health xel files captured at the time of the issue, specifically at the query_processing event trace, the following highlighted line caught my attention.

The server has 16 cores on x64, and as per the max worker threads calculation, the worker threads on the server should not exceed 512 + (16-4)*16 = 704. In the query_processing event, we observed maxWorkers correctly reported as 704; however, I saw that workersCreated exceeded maxWorkers, which was unusual.
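
For reference, the CPU count & the configured maximum worker threads used in this calculation can be confirmed with a quick query:

SELECT cpu_count,
       max_workers_count
FROM sys.dm_os_sys_info;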

workerthread

I checked with Denzil R & Suresh Kandoth, and they helped me understand that this is expected behavior in the SQL Server code, since the max worker threads setting only controls user tasks (in other words, user batches) and doesn't control system or internal tasks. The same is also documented in the MSDN article below:

http://msdn.microsoft.com/en-us/library/ms190219.aspx

The max worker threads server configuration option does not take into account threads that are required for system tasks such as Availability Groups, Service Broker, the Lock Manager, and others. If the number of threads configured is being exceeded, the following query will provide information about the system tasks that have spawned the additional threads:

SELECT
    s.session_id,
    r.command,
    r.status,
    r.wait_type,
    r.scheduler_id,
    w.worker_address,
    w.is_preemptive,
    w.state,
    t.task_state,
    t.session_id,
    t.exec_context_id,
    t.request_id
FROM sys.dm_exec_sessions AS s
INNER JOIN sys.dm_exec_requests AS r
    ON s.session_id = r.session_id
INNER JOIN sys.dm_os_tasks AS t
    ON r.task_address = t.task_address
INNER JOIN sys.dm_os_workers AS w
    ON t.worker_address = w.worker_address
WHERE s.is_user_process = 0;

Don't be surprised if you see the worker threads in sys.dm_os_workers exceed the max worker threads setting; you can use the above query to identify the system tasks which are using the additional worker threads.

Hope this helps !!!

Parikshit Savjani
Premier Field Engineer

Connection to SQL 2012 Integration Services using SSMS for non-admin gives “Access is denied”

Consider a scenario where you have installed SQL 2012 Integration Services and a non-admin user tries to connect to the Integration Services service using SSMS; they receive an "Access is denied" error as shown in the screen below.

SSISError1

This issue might be caused by behavioral changes to security in the SSIS 2012 setup. In previous versions of SQL Server, by default, when you installed SQL Server all users in the Users group had access to the Integration Services service. When you install the current release of SQL Server, users do not have access to the Integration Services service; the service is secure by default. After SQL Server is installed, the administrator must grant access to the service.

To grant a non-admin user access to the Integration Services service:

  1. Run Dcomcnfg.exe. Dcomcnfg.exe provides a user interface for modifying certain settings in the registry.
  2. In the Component Services dialog, expand the Component Services > Computers > My Computer > DCOM Config node.
  3. Right-click Microsoft SQL Server Integration Services 11.0, and then click Properties.
  4. On the Security tab, click Edit in the Launch and Activation Permissions area.
  5. Add users and assign appropriate permissions, and then click OK.
  6. Repeat steps 4 – 5 for Access Permissions.
  7. Restart SQL Server Management Studio.
  8. Restart the Integration Services service.

If we still continue to see "Access is denied" errors, we should follow the troubleshooting steps mentioned in the following KB article:

http://support.microsoft.com/kb/940232

Hope this helps !!!

Parikshit Savjani
Premier Field Engineer

Oracle on Azure Presentation at the Windows Azure Conference 2014

Recently I presented at the Windows Azure Conference 2014 on Deploying & Configuring Oracle on Windows Azure as Infrastructure as a Service (IaaS).

As organizations look to cut down their IT infrastructure costs, they are evaluating two options: virtualization or private cloud, and public cloud. Microsoft's vision, strategy & commitment to its customers is to provide private & public cloud platforms that enable them to adopt hybrid deployment strategies to deploy & configure their software.

Oracle's vision & strategy, on the other hand, is to give its customers greater choice and flexibility to deploy Oracle software on multiple platforms, thereby increasing the usage of Oracle software.

Both Microsoft & Oracle share a common vision of helping their customers move towards the cloud, thereby helping them reduce their IT costs without any compromise on supportability & features. With this common vision, Microsoft & Oracle entered into a partnership back in June 2013, and the following is the official statement from Microsoft & Oracle on the partnership:

“This partnership will help customers embrace cloud computing by providing greater choice and flexibility in how to deploy Oracle software”

Following is the presentation I delivered at the Windows Azure Conference 2014 held in Bangalore, India on 20-21 March 2014.

Hope this Helps !!!
Parikshit Savjani
Premier Field Engineer

SSRS Report in SharePoint Integrated Mode times out after 110 sec with unexpected error

Recently we faced a weird error while deploying an SSRS report to a document library in SharePoint. The SSRS report was meant to be a drill-through report, used as the detailed report in a PPS dashboard and exposed via an SSAS reporting action.

Being a detailed report, it was expected to fetch a lot of data, hence we paginated the tablix so that rendering in HTML is fast. However, for some parameters the query itself took more than 2 minutes to execute while fetching the data from the cube (we used an SSAS cube as the data source for our report).

We observed that the same report with the same set of parameters worked fine when previewing it in SSDT (BIDS in earlier versions); however, after deploying the report to the SharePoint site, when we tried to browse it, the report tried to load for some time and then the web page showed an unexpected error.

Our observation was that whenever the report fetched a smaller set of data it rendered fast as expected, but whenever the dataset was larger, or in other words whenever the query execution took more than 100 seconds, the report failed with the unexpected error in SharePoint. Further, we timed the occurrence of the error and found that it occurs exactly 110 seconds after the report execution starts. This behavior gave us a clue that some timeout setting was causing the report to time out after 110 seconds, and that this timeout setting is specific to SSRS in SharePoint Integrated mode.

After some research, I discovered that in the SharePoint web.config the httpRuntime executionTimeout setting is not specified, so it defaults to 110 seconds.

The httpruntime executionTimeout setting specifies the maximum number of seconds that a request is allowed to execute before being automatically shut down by ASP.NET.

Reference: http://msdn.microsoft.com/en-us/library/e1f13641(v=vs.85).aspx

To resolve the issue, we need to modify the web.config of the SharePoint site with SSRS integrated (the web.config for the SharePoint site can be located from IIS Manager by exploring the site). In the web.config, we need to search for httpRuntime and add the executionTimeout attribute as shown below:

<httpRuntime maxRequestLength="51200" executionTimeout="1800" />

After making the change to web.config, save it and restart IIS. Once IIS is reset, the changes take effect and we no longer saw the SSRS report time out.

Hope this helps !!!

Parikshit Savjani
Premier Field Engineer