Article got published for the Canadian MVP Blog Series, my musings on how Business Intelligence is maturing right before our eyes:
Category Archives: Articles
Originally posted on Idera – http://sqlserverperformance.idera.com/tsql-optimization/understanding-sql-server-statistics/
“Statistics provides tools that you need in order to react intelligently to information you hear or read” – David Lane, 2003
If there’s an upcoming election and you are running for office and getting ready to go from town to town city to city with your flyers, you will want to know approximately how many flyers you’re going to bring.
If you’re the coach of a sports team, you will want to know your players’ stats before you decide who to play when, and against who. You will often play a matchup game, even if you have 20 players, you might be allowed to play just 5 at a time, and you will want to know which of your players will best match up to the other team’s roster. And you don’t want to interview them one by one at game time (table scan), you want to know, based on their statistics, who your best bets are.
Just like the election candidate or the sports coach, SQL Server tries to use statistics to “react intelligently” in its query optimization. Knowing number of records, density of pages, histogram, or available indexes help the SQL Server optimizer “guess” more accurately how it can best retrieve data. A common misnomer is that if you have indexes, SQL Server will use those indexes to retrieve records in your query. Not necessarily. If you create, let’s say, an index to a column City and <90% of the values are ‘Vancouver’, SQL Server will most likely opt for a table scan instead of using the index if it knows these stats.
Recap of T-SQL vs SQLCLR (pseudo) Debate
There have been numerous posts about benefits of using T-SQL vs SQLCLR, and vice versa. And we all know the mantra – T-SQL for data access, SQLCLR for anything that is computationally intense tasks.
SQL Server is a relational database, and works best with set based operations and direct data access. If you need to do straightforward INSERTs, UPDATEs, DELETEs, SELECTs, stick with T-SQL unless you want to do lots of overtime trying to figure out why you’re data access suddenly became slow.
So when do we use SQLCLR instead of T-SQL?
There are already lots of discussions – even books – about SQLCLR advantages. I will defer you to them (check out the references section), but I will provide a very brief list of scenarios when you might want to consider SQLCLR:
- Interaction outside SQL Server If you need to work with the OS, files, registry etc.
- Validation If you need to validate phone numbers, email addresses, postal codes, or any patterns
- Complex computations If you need running aggregates, complex math equations (what is the square root of x to the nth power divided by 2 * pi?), financial analytics maybe?
- Custom Data Types If you need to create your own custom business-specific data types. We also have to remember that XML and GEOGRAPHY/GEOMETRY are great additions to SQL Server, and these are technically CLR data types.
The debate between T-SQL vs SQLCLR sometimes can be taken out of context. I call it a pseudodebate, because sometimes it’s made to seem that SQLCLR is meant to replace T-SQL, when it’s not. SQLCLR is *not* evil. It’s just another tool to help you do your job. It *can* become evil though, if you misuse it.
While there are some overlaps in scenarios where you can use both, these two should really be complementary. Where T-SQL is slow or lacking, SQLCLR should at least be considered and tested. Again, consider the right tool for the right job.