This is an expansion (and slight variation) of Joe’s answer in StackOverflow regarding the thread Find the maximum consecutive years for each ID’s in a table (Oracle SQL) using SQL Server. I’ve tested this in SQL Server 2014, but should work from SQL Server 2005 onwards where the ranking functions are supported.
The problem Joe solved is not uncommon – i.e. how to get the consecutive years of something (consecutive year sales, consecutive years in school, consecutive years volunteering etc), and his solution is pretty clever.
Let’s assume you are working on a sales database. You have multiple clients who could be purchasing from your store several times a year, or a couple of times every five years. You may want to know what’s the maximum consecutive years they’ve purchased from you. Why? Perhaps in a marketing campaign, you may want to give your loyal customers (purchased in 5 or more consecutive years) a special discount.
Your data may look like the following screenshot. Notice that in this example, Client 00001, 00002 and 00003 purchased only once. Client 00004 purchased several times, and it looks like there were purchases in consecutive years.
Download sample T-SQL script to create and populate the SAMPLESALES table
Let’s take it step by step to understand both the problem and solution better. It will be easy to visually identify which clients have purchased consecutive years if we first display all the unique years that client has purchased:
SELECT DISTINCT CLIENTID, CLIENTNAME, YEAR(REVENUEDATE) REVENUEYEAR FROM SAMPLESALES
Here’s the result:
Here we can see Client 00004 purchased in 2004 and 2005 (2 consecutive years), but did not purchase in 2006. Starting 2007, Client 0004 started purchasing again every year until 2012 (6 consecutive years).
First thing we can do to try and understand how we can solve this issue is to do a self join to see what’s the difference between the years in the records. What will also help us is getting how many unique years are in the table between the two years we are looking at, which we accomplish below using a subquery. Note that the CTE (Common Table Expression) that I am using already provides just distinct years per client.
SELECT c1.CLIENTID, c1.CLIENTNAME, c1.REVENUEYEAR YEAR1, c2.REVENUEYEAR YEAR2, -- year spread between the records c2.REVENUEYEAR - c1.REVENUEYEAR YEARDIFF, -- how many records are in our table that is -- within this date range ( SELECT COUNT(*) FROM revenueyears_cte c3 WHERE (c3.CLIENTID = c1.CLIENTID) AND (c3.REVENUEYEAR BETWEEN c1.REVENUEYEAR AND c2.REVENUEYEAR) ) NUMRECORDSINBETWEEN FROM revenueyears_cte c1 LEFT JOIN revenueyears_cte c2 ON c1.CLIENTID = c2.CLIENTID AND c1.REVENUEYEAR < c2.REVENUEYEAR
Using Client 0004, I’ve highlighted a couple of records.
The first one in Row 16 where year1 = 2005 and year 2 = 2012 gives us a YEARDIFF of 7 (because 2012 minus 2005 is 7), and the number of records in between is 7 as well. In reality, if we had records for this client for each year from 2005 to 2012, we should have 8 records not 7.
This means we are looking for the record that has the highest NUMRECORDSINBETWEEN but also where NUMRECORDSINBETWEEN is higher than YEARDIFF. So, although Row 16 gives us a higher NUMRECORDSINBETWEEN, it is in fact Row 21 that gives us the maximum number of consecutive years.
Once we know that, we can simply pull the records where NUMRECORDSINBETWEEN is less than the spread. In my query I also want to retain clients that have purchased only for a single year, which are records where NUMRECORDSINBETWEEN is 0. In the query I want to re-assign these clients a 1 for CONSECUTIVEYEARS:
SELECT CLIENTID, CLIENTNAME, CASE WHEN NUMRECORDSINBETWEEN = 0 THEN 1 ELSE NUMRECORDSINBETWEEN END AS CONSECUTIVEYEARS FROM revenueyears_cte2 WHERE (NUMRECORDSINBETWEEN > YEARDIFF) OR (NUMRECORDSINBETWEEN = 0)
All the prep work is done. Now we can simply use the MAX() aggregate function to get the record with the most consecutive years.
If you want, we can even throw in a subquery with FOR XML PATH to pull all the years, so we can visually check. (FOR XML is great at concatenating all these strings, but it’s not without it’s cons. Remember this is a subquery that will go through all your records, so please test before you use in your environments especially if you have large data sets).
We can check that:
- Client 00004 purchased from 2004-2005 which is 2 consecutive years, missed 2006, and then again from 2007-2012. Consecutive years is correctly counted from 2007-2012 which is 6.
- Client 00007 purchased 2009 and 2010, and consecutive years is correctly counted at 2
- Client 00008 purchased 2002 and 2004, which are not consecutivey, and consecutive years is correctly counted at 1
Here is the full script, which utilizes CTEs (Common Table Expressions) to simplify the query and make it more readable:
; WITH -- clients with distinct years revenueyears_cte AS ( SELECT DISTINCT CLIENTID, CLIENTNAME, YEAR(REVENUEDATE) REVENUEYEAR FROM SAMPLESALES2 ), -- get the year spread and number of records in between revenueyears_cte2 AS ( SELECT c1.CLIENTID, c1.CLIENTNAME, c1.REVENUEYEAR YEAR1, c2.REVENUEYEAR YEAR2, c2.REVENUEYEAR - c1.REVENUEYEAR YEARDIFF, ( SELECT COUNT(*) FROM revenueyears_cte c3 WHERE (c3.CLIENTID = c1.CLIENTID) AND (c3.REVENUEYEAR BETWEEN c1.REVENUEYEAR AND c2.REVENUEYEAR) ) NUMRECORDSINBETWEEN FROM revenueyears_cte c1 LEFT JOIN revenueyears_cte c2 ON c1.CLIENTID = c2.CLIENTID AND c1.REVENUEYEAR < c2.REVENUEYEAR ) , revenueyears_cte3 AS ( SELECT CLIENTID, CLIENTNAME, CASE WHEN NUMRECORDSINBETWEEN = 0 THEN 1 ELSE NUMRECORDSINBETWEEN END AS CONSECUTIVEYEARS FROM revenueyears_cte2 WHERE (NUMRECORDSINBETWEEN > YEARDIFF) OR (NUMRECORDSINBETWEEN = 0) ) SELECT CLIENTID, CLIENTNAME, MAX(CONSECUTIVEYEARS) MAX_CONSECUTIVEYEARS , STUFF(( SELECT DISTINCT ', ' + CAST(REVENUEYEAR AS VARCHAR(4)) FROM revenueyears_cte t WHERE t.CLIENTID = cte3.CLIENTID ORDER BY ', ' + CAST(REVENUEYEAR AS VARCHAR(4)) FOR XML PATH('') ), 1, 2, '') YEARSGIVING FROM revenueyears_cte3 cte3 GROUP BY CLIENTID, CLIENTNAME ORDER BY CLIENTNAME
You can vary this solution a bit more, by utilizing ranking functions similar to Joe’s original solution.
That’s it. Hope this helped explain Joe’s elegant solution in StackOverflow regarding the thread Find the maximum consecutive years for each ID’s in a table (Oracle SQL) .
I believe the query can be written with more performance when executed
WITH cteSource
AS (
SELECT ClientID,
ClientName,
MAX(COUNT(*)) OVER (PARTITION BY ClientName) AS MaxConsecutiveYears
FROM (
SELECT ClientID,
ClientName,
DATEPART(YEAR, RevenueDate) – ROW_NUMBER() OVER (PARTITION BY ClientID ORDER BY DATEPART(YEAR, RevenueDate)) AS theGrp
FROM dbo.SampleSales
GROUP BY ClientID,
ClientName,
DATEPART(YEAR, RevenueDate)
) AS d
GROUP BY ClientID,
ClientName,
theGrp
)
SELECT s.ClientID,
s.ClientName,
s.MaxConsecutiveYears,
MAX(STUFF(f.Data, 1, 2, ”)) AS YearsGiving
FROM cteSource AS s
CROSS APPLY (
SELECT ‘, ‘ + DATENAME(YEAR, x.RevenueDate)
FROM dbo.SampleSales AS x
WHERE x.ClientID = s.ClientID
GROUP BY DATENAME(YEAR, x.RevenueDate)
ORDER BY DATENAME(YEAR, x.RevenueDate)
FOR XML PATH(”)
) AS f(Data)
GROUP BY s.ClientID,
s.ClientName,
s.MaxConsecutiveYears
ORDER BY s.ClientName;
LikeLike
Thanks Peter! It’s great to see other variations and improvements, I will definitely check your solution out.
LikeLike
Stripping it down to the minimal data, how would this work for you?
CREATE TABLE Test
(client_name CHAR(5) NOT NULL,
something_year INTEGER NOT NULL,
PRIMARY KEY(client_name, something_year);
WITH X
AS
(SELECT client_name, something_year,
LAG(something_year) OVER (PARTITION BY client_name
BETWEEN ROWS 5 PRECEDING AND CURRENT ROW)
AS prior_something_year
FROM TEST)
SELECT X.client_name, X.something_year,
FROM X
WHERE X.something_year – 4 = X.prior_something_year;
LikeLike
This is great. Very elegant. Thank you Joe!!
LikeLike