Understanding SQL Server Statistics
Originally posted on Idera – http://sqlserverperformance.idera.com/tsql-optimization/understanding-sql-server-statistics/
“Statistics provides tools that you need in order to react intelligently to information you hear or read” – David Lane, 2003
If there’s an upcoming election and you are running for office, getting ready to go from town to town and city to city with your flyers, you will want to know approximately how many flyers you need to bring.
If you’re the coach of a sports team, you will want to know your players’ stats before you decide who to play, when, and against whom. You will often play a matchup game: even if you have 20 players, you might be allowed to play just 5 at a time, and you will want to know which of your players best match up against the other team’s roster. You don’t want to interview them one by one at game time (table scan); you want to know, based on their statistics, who your best bets are.
Just like the election candidate or the sports coach, SQL Server tries to use statistics to “react intelligently” in its query optimization. Knowing the number of records, the density of values, the histogram, and the available indexes helps the SQL Server optimizer “guess” more accurately how it can best retrieve data. A common misconception is that if you have indexes, SQL Server will always use those indexes to retrieve records in your query. Not necessarily. If you create, let’s say, an index on a column City and more than 90% of the values are ‘Vancouver’, SQL Server will most likely opt for a table scan instead of using the index when you search for ‘Vancouver’, because the stats tell it the index would barely narrow down the rows.
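To get a feel for the kind of skew the optimizer cares about, you can compute the distribution yourself. This is only a sketch: dbo.Customers and its City column are hypothetical names standing in for your own table, and the query shows the raw data distribution, not the optimizer's internal statistics.

```sql
-- Hypothetical table and column: dbo.Customers (City).
-- Shows how many rows each city has and what share of the table that is.
SELECT  City,
        COUNT(*)                                   AS rows_per_city,
        100.0 * COUNT(*) / SUM(COUNT(*)) OVER ()   AS pct_of_table
FROM    dbo.Customers
GROUP BY City
ORDER BY rows_per_city DESC;
```

If one value accounts for the vast majority of rows, an index seek on that value saves very little work compared to scanning the table.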
For the most part, there *may* be very little we need to do to keep our statistics up-to-date (depending on your configuration), but understanding statistics a little better helps us understand SQL Server optimization a little more.
How are statistics created?
Statistics can be created in different ways:
– Statistics are automatically created for each index key you create.
– If the database setting Auto Create Statistics (AUTO_CREATE_STATISTICS) is on, then SQL Server will automatically create statistics for non-indexed columns that are used in query predicates.
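To see which statistics exist on a table and how each one came to be, you can query the sys.stats catalog view. The sketch below assumes a hypothetical table dbo.Customers; substitute one of your own.

```sql
-- List the statistics on one table and how each was created.
-- dbo.Customers is a hypothetical table name used only for illustration.
SELECT  s.name          AS stats_name,
        c.name          AS column_name,
        s.auto_created, -- 1 = created automatically by SQL Server for a column
        s.user_created  -- 1 = created explicitly by a user
FROM    sys.stats AS s
JOIN    sys.stats_columns AS sc
        ON  sc.object_id = s.object_id
        AND sc.stats_id  = s.stats_id
JOIN    sys.columns AS c
        ON  c.object_id = sc.object_id
        AND c.column_id = sc.column_id
WHERE   s.object_id = OBJECT_ID(N'dbo.Customers')
ORDER BY s.stats_id, sc.stats_column_id;
```

Statistics that came with an index carry the index name and show 0 in both flags; automatically created column statistics usually have names starting with _WA_Sys_.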
What do statistics look like?
If you’re curious, there are a couple of ways you can peek at what statistics look like.
Option 1 – you can go to the Statistics node of your table in SSMS, right click > Properties, then go to Details. Below is a sample of the stats and histogram collected for one of the tables in my database.
The histograms are a great way to visualize the data distribution in your table.
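The same information can also be pulled with T-SQL using DBCC SHOW_STATISTICS. A minimal sketch, assuming a hypothetical table dbo.Customers with statistics (or an index) named IX_Customers_City:

```sql
-- Hypothetical names: dbo.Customers and IX_Customers_City.
-- Returns three result sets: header, density vector, and histogram.
DBCC SHOW_STATISTICS ('dbo.Customers', IX_Customers_City);

-- Return only the histogram steps
-- (RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS, AVG_RANGE_ROWS).
DBCC SHOW_STATISTICS ('dbo.Customers', IX_Customers_City) WITH HISTOGRAM;
```

In the histogram, EQ_ROWS is the estimated number of rows equal to each step's upper boundary value (RANGE_HI_KEY), which is where a heavily repeated value such as a dominant city would show up.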
How are statistics updated?
There are two (2) Auto Update Statistics options in the database settings.
– Auto Update Statistics basically means, if there is an incoming query but statistics are stale, SQL Server will update statistics first before it generates an execution plan.
– Auto Update Statistics Asynchronously on the other hand means, if there is an incoming query but statistics are stale, SQL Server uses the stale statistics to generate the execution plan, then updates the statistics afterwards.
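Both options can be set and checked with T-SQL as well as through the database properties dialog. The sketch below uses a hypothetical database name, SalesDb; whether to turn the asynchronous option on depends on your workload, so treat the second statement as optional.

```sql
-- SalesDb is a hypothetical database name.
-- Synchronous auto update: stale statistics are refreshed before the plan is compiled.
ALTER DATABASE [SalesDb] SET AUTO_UPDATE_STATISTICS ON;

-- Optional: compile with the stale statistics and refresh them in the background.
-- This only has an effect while AUTO_UPDATE_STATISTICS is also ON.
ALTER DATABASE [SalesDb] SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- Check the current settings.
SELECT  name,
        is_auto_create_stats_on,
        is_auto_update_stats_on,
        is_auto_update_stats_async_on
FROM    sys.databases
WHERE   name = N'SalesDb';
```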
How do we know statistics are being used?
One good check you can do when you generate execution plans for your queries is to compare the “Actual Number of Rows” and the “Estimated Number of Rows”.
If these numbers are (consistently) fairly close, then most likely your statistics are up-to-date and are being used by the optimizer for the query. If not, it is time to re-check how often your statistics are created and updated.
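When the estimates look off, it can help to check how fresh the statistics are and refresh them by hand. This is a sketch under assumptions: dbo.Customers is a hypothetical table, and sys.dm_db_stats_properties is only available on newer builds (SQL Server 2008 R2 SP2, 2012 SP1, and later).

```sql
-- Hypothetical table: dbo.Customers.
-- Return the actual plan as XML, which carries both estimated and actual row counts.
SET STATISTICS XML ON;
SELECT * FROM dbo.Customers WHERE City = N'Vancouver';
SET STATISTICS XML OFF;

-- When were the statistics last updated, and how many changes have piled up since?
SELECT  s.name AS stats_name,
        sp.last_updated,
        sp.rows,
        sp.rows_sampled,
        sp.modification_counter
FROM    sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE   s.object_id = OBJECT_ID(N'dbo.Customers');

-- Refresh all statistics on the table by reading every row.
UPDATE STATISTICS dbo.Customers WITH FULLSCAN;
```

FULLSCAN gives the most accurate statistics but reads the whole table, so on very large tables a sampled update may be the more practical choice.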
What configuration settings should we set?
There may be cases when you want to disable statistics updates temporarily, for example while you’re doing massive updates on a table and you don’t want them to be slowed down by the auto update.
However, for the most part, you will want to keep the SQL Server settings:
– auto create statistics
– auto update statistics
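For the temporary exception mentioned above (a massive update on one table), one option is to switch auto update off for just that table rather than for the whole database. A minimal sketch using sp_autostats, assuming a hypothetical table dbo.Orders:

```sql
-- Hypothetical table: dbo.Orders.
-- Turn automatic statistics updates off for this table only.
EXEC sp_autostats 'dbo.Orders', 'OFF';

-- ... run the massive updates here ...

-- Turn automatic updates back on and refresh the statistics once, by hand.
EXEC sp_autostats 'dbo.Orders', 'ON';
UPDATE STATISTICS dbo.Orders;

-- Called with only the table name, sp_autostats reports the current setting.
EXEC sp_autostats 'dbo.Orders';
```

The manual UPDATE STATISTICS at the end matters: after a large change the old statistics no longer describe the data, so it is worth refreshing them right away instead of waiting for the next automatic update.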
Elisabeth Redei has an excellent 3-part series on SQL Server Statistics.
Excellent books that touch on statistics:
– Apress. Grant Fritchey & Sajal Dam. SQL Server 2008 Query Performance Tuning Distilled.
– RedGate. Holger Schmeling. SQL Server Statistics.