SQL WHERE clauses with date intervals - which way is best?
Context
Due to the Data Protection Act, certain data should be removed from systems at set times, in most instances it is seven years.
When investigation this at a client site, they were getting strange results when we checked the query sent to us.
SQL
SELECT *FROM AccountsWHERE DATEDIFF(YEAR,ClosedDate,GETDATE())>=7
Create a test table
Now this brought up what looked like the correct volume, however upon further inspection there were more accounts than if I did a static date seven years previous, so why did it do it?
I have built the following code to investigate it.
SQL
DECLARE @Records TABLE (RecordID INT,RecordClosed DATETIME)DECLARE @InsRecID INT = 1DECLARE @InsRecDate DATE=DATEADD(YEAR,-8,GETDATE())WHILE @InsRecDate INSERT INTO @Records SELECT @InsRecID,@InsRecDate SET @InsRecID = @InsRecID+1 SET @InsRecDate=DATEADD(DAY,1,@InsRecDate)ENDDECLARE @DateFrom DATE=DATEADD(YEAR,-7,GETDATE())SELECT MAX(RecordClosed) FROM @RecordsSELECT MAX(RecordClosed) FROM @Records WHERE DATEDIFF(YEAR,RecordClosed,getdate())>=7SELECT MAX(RecordClosed) FROM @Records WHERE RecordClosed<@DateFromSELECT MAX(RecordClosed) FROM @Records WHERE RecordClosed<=@DateFromSELECT MAX(RecordClosed) FROM @Records WHERE DATEDIFF(DAY,RecordClosed,getdate())/365.25>7
Testing
Results
2007-03-21 00:00:00.000
2006-12-31 00:00:00.000
2006-03-20 00:00:00.000
2006-03-21 00:00:00.000
2006-03-21 00:00:00.000
What is correct?
The first record is just the biggest date in the temp table, so that is fine.
The second record has picked up all accounts up to the end of 2006, this could have been a potential issue as too much data would have been removed. DATEDIFF on the year is purely checking the year is 7 years ago.
The third and fourth records are the one we want, dependent on whether you want to include the day seven years ago as in or out.The fifth record, although correct when run, is using leap year calculations, and for the sake of accuracy could not be trusted if you had to be specific.
Speed difference
While checking the records, I ran it on the system checking 2,500,000 records for 3 and 5 above.
Statement 5 returned our result in 6 seconds, Statement 3 in only 3 seconds, so not only were we now sure that we were accurate, but we were also processing the records quicker.
Try to use a proper date when querying dates, hope this helps someone scratching their head.