Comparing Simple Efficiencies: T-SQL UDF vs SQLCLR UDF for Splitting Strings
Recap of T-SQL vs SQLCLR (pseudo) Debate
There have been numerous posts about benefits of using T-SQL vs SQLCLR, and vice versa. And we all know the mantra – T-SQL for data access, SQLCLR for anything that is computationally intense tasks.
SQL Server is a relational database, and works best with set based operations and direct data access. If you need to do straightforward INSERTs, UPDATEs, DELETEs, SELECTs, stick with T-SQL unless you want to do lots of overtime trying to figure out why you’re data access suddenly became slow.
So when do we use SQLCLR instead of T-SQL?
There are already lots of discussions – even books – about SQLCLR advantages. I will defer you to them (check out the references section), but I will provide a very brief list of scenarios when you might want to consider SQLCLR:
- Interaction outside SQL Server If you need to work with the OS, files, registry etc.
- Validation If you need to validate phone numbers, email addresses, postal codes, or any patterns
- Complex computations If you need running aggregates, complex math equations (what is the square root of x to the nth power divided by 2 * pi?), financial analytics maybe?
- Custom Data Types If you need to create your own custom business-specific data types. We also have to remember that XML and GEOGRAPHY/GEOMETRY are great additions to SQL Server, and these are technically CLR data types.
The debate between T-SQL vs SQLCLR sometimes can be taken out of context. I call it a pseudodebate, because sometimes it’s made to seem that SQLCLR is meant to replace T-SQL, when it’s not. SQLCLR is *not* evil. It’s just another tool to help you do your job. It *can* become evil though, if you misuse it.
While there are some overlaps in scenarios where you can use both, these two should really be complementary. Where T-SQL is slow or lacking, SQLCLR should at least be considered and tested. Again, consider the right tool for the right job.
In cases where some projects will clearly benefit from using SQLCLR, project teams still hesitate using it. Some of the common reasons I’ve heard that causes SQL Server DBAs and Developers to hesitate using SQLCLR are:
- It is unfamiliar ground. Let’s face it. Not every SQL Server professional will know how to program in .NET. It’s becoming increasingly popular, but we still have traditional database professionals
- It compromises security. Well, it’s yes and no. If you code it properly and if you deploy it properly, it should be as secure as your most secure database objects. However if you take the easy route, such as
Examining Simple Efficiencies
Let’s take a deep dive into a specific scenario, and let’s analyze simple performance differences between a T-SQL user defined function (UDF) vs a SQLCLR UDF. I emphasize simple differences for now, because sometimes, that’s all it takes to convince someone
This is one scenario where technically, either the T-SQL way or the SQLCLR way will work.
You have an incoming string that contains character delimited set of files. Let’s assume it’s a pipe delimited set of strings for employees, like this:
This is a legacy system, and you have no control over how the incoming data is formatted. The legacy system will also need to call a stored procedure, which in turn will parse the strings and output a table.
Of course you can argue there are other options to parse the file, such as pushing this functionality to an intermediate app etc, but for now let’s focus on two feasible options to parse the strings and output a table using SQL Server:
- Transact-SQL UDF
- SQLCLR UDF
Using T-SQL UDF – dbo.fn_Split
What we can do is create a Table Valued UDF that parses the incoming string based on a delimiter, and output a table.
Here’s one way we can do it: this one I’ve adapted from an RDACorp post. There’s also a few other variations, have a look at this SQLTeam Forum link.
Using SQLCLR UDF – dbo.fn_CLRSplit
For the SQLCLR UDF version, again there are a few variations. Normally we would use a String.Split method, or even Regex to split strings based on character delimiters (please see links below for additional samples)
In my case I will be borrowing Adam Machanic’s fast and scalable string splitting UDF. If you’re interested in his code, it’s here: Faster, More Scalable SQLCLR String Splitting.
In this article, Adam also discusses differences between different ways of parsing and splitting strings in the SQLCLR world – differences between the String.Split, Regex, and his way which traverses characters one by one. He does a great job explaining why and where one breaks down, and why his particular version is more scalable.
The tests for this experiment are really simple. Take a string that contains n items, then pass it to both the T-SQL and SQLCLR UDFs, measure CPU time and elapsed time, and compare. Ours will be a simple metric comparison – time – which is sometimes the most effective metric to use when battling to use SQLCLR with business users, sometimes even DBAs.
They say pictures paint a thousand words, so I’ll let these graphs do the initial talking.
There is clearly a performance difference between the T-SQL way and the SQLCLR way. It’s not so noticeable on a small scale, and perhaps some of you would bear to have this rather than turning on SQLCLR. However, on a bigger scale, the difference is exponential. My T-SQL UDF seems to have noticeably slowed down at the 1,000 item string, and died at the 100,000 item string (or maybe I was just impatient and cancelled the query altogether).
Right now we’re just considering simple efficiencies – CPU and execution times. We haven’t even considered I/Os, memory clerks etc. And clearly, SQLCLR offers the more efficient way to complete the task at hand.
Adam Machanic. Faster, More Scalable String Splitting.
MSDN. Using CLR Integration in SQL Server 2005
MSDN. Regular Expressions Make Pattern Matching And Data Extraction Easier
Simple-Talk. CLR Assembly RegEx Functions for SQL Server by Example
SQLSkills White Paper. SQL Server DBA Guide to the SQLCLR
StackOverflow Forum Question. Are CLR stored procedures preferred over TSQL stored procedures in SQL 2005+ ?
StackOverflow Forum Question. Microsoft SQL Equivalent of MySQL REGEXP
Ted Kreuger. Split String in SQL Server 2005+ CLR vs T-SQL