TABLESAMPLE Gotcha (ah the stress)
I needed a few random rows from one of our tables with at least a million rows.
I used the following query:
SELECT myID, ROW_NUMBER() OVER (ORDER BY myID) AS RowNumber FROM AD TABLESAMPLE (25 ROWS)
This worked awesome when I was testing it on my local machine. However, as soon as I deployed it to a dev machine, it just decided to stop working. I knew TABLESAMPLE was pretty random – it can return 15 or 30 or 100 even if I request for 25 rows. I just didnt realize it was very unreliable.
I found this post from Erland Sommarskog:
I saw the suggestion to use TABLESAMPLE, and I was not very happy with it, but I did not reply to at the time. TABLESAMPLE is fairly
approxamite in its nature. I ran
SELECT * FROM Orders TABLESAMPLE (1 ROWS)
SELECT * FROM Orders TABLESAMPLE (10 ROWS)
SELECT * FROM Orders TABLESAMPLE (100 ROWS)
in an inflated version of Northwind with 344000 orders. The first two
selects returned no rows at all, the last returned 86 rows. The first
time. The second time it returned no rows, and the last time it return 132 rows. Furthermore, the sample was not entirely random, but I got a couple of sequences of order ids. Presumably because TABLESAMPLE works on page level.
Possibly you could combine newid() and TABLESAMPLE:
SELECT TOP 1 * FROM (
select * from Orders TABLESAMPLE (100 ROWS)) AS d
ORDER BY newid()
You would need to make your sample size big enough so that you are
sure that it retrieves a row each time, but the bigger you make it,
the bigger the cost for the sorting.
If your table is only some few thousand of rows, it’s not likely to
be worth the pain.
Erland Sommarskog, SQL Server MVP, esquel [at] sommarskog.se
Books Online for SQL Server 2005 at
Books Online for SQL Server 2000 at
I will go with his suggestion – NOT to use TABLESAMPLE.