Cleansing of UK Address Postcodes in SQL Server 2008
Create an SQL Server Function to cleanse, replace common mistakes and re-format UK Postcodes with the correct space position
Context
This article has been quite popular, so I have updated it with some more information and cleaned up the formatting, as a couple of spaces were missed out, so it can be copied straight into SSMS now.
It uses the following steps;
It uses the following steps;
- Remove everything from the text string (including spaces) apart from numbers and letters
- Replaces the standard typing mistakes like;
- Portsmouth is PO, Colchester is CO, but sometimes gets entered as P0 or C0 (a zero)
- Ilford and Reading are IG and RG, but sometime get entered with a 6 in place of the G
- Work out format the postcode should be in, and insert a space where it should be.
- Check that the format is correct and return a new value if true, else return the old format.
Please note, this does not check whether the post code actually exists, and further reading can be found on the following link Postcodes in the United Kingdom.
SQL
CREATE FUNCTION CleansePostCode (@PostCode VARCHAR(100)) RETURNS VARCHAR(100) AS BEGINDECLARE @OldPostCode VARCHAR(100)SET @OldPostCode=LTRIM(RTRIM(@PostCode))
--Clean to Numbers and LettersDECLARE @Letter INTSET @Letter = PATINDEX('%[^0-Z]%',@PostCode)BEGINWHILE @Letter>0BEGINSET @PostCode = STUFF(@PostCode,@Letter,1,'')SET @Letter = PATINDEX('%[^0-Z]%',@PostCode)ENDEND
--Replace obvious errorsSET @PostCode =(CASE WHEN LEFT(@PostCode,2)='P0' THEN STUFF(@PostCode,1,2,'PO')WHEN LEFT(@PostCode,2)='C0' THEN STUFF(@PostCode,1,2,'CO')WHEN LEFT(@PostCode,2)='I6' THEN STUFF(@PostCode,1,2,'IG')WHEN LEFT(@PostCode,2)='HO' THEN STUFF(@PostCode,1,2,'HD')WHEN LEFT(@PostCode,2)='C8' THEN STUFF(@PostCode,1,2,'CB')WHEN LEFT(@PostCode,2)='D0' THEN STUFF(@PostCode,1,2,'DO')WHEN LEFT(@PostCode,2)='H5' THEN STUFF(@PostCode,1,2,'HS')WHEN LEFT(@PostCode,2)='0L' THEN STUFF(@PostCode,1,2,'OL')WHEN LEFT(@PostCode,2)='0X ' THEN STUFF(@PostCode,1,2,'OX')WHEN LEFT(@PostCode,2)='P0' THEN STUFF(@PostCode,1,2,'PO')WHEN LEFT(@PostCode,2)='R6' THEN STUFF(@PostCode,1,2,'RG')ELSE @PostCode END)
--Works out correct pattern and insert spaceSET @PostCode = (CASE WHEN @PostCode LIKE '[A-Z][A-Z][0-9][0-9][A-Z][A-Z]' THEN LEFT(@PostCode,3)+' '+RIGHT(@PostCode,3)WHEN @PostCode LIKE '[A-Z][0-9][0-9][A-Z][A-Z]' THEN LEFT(@PostCode,2)+' '+RIGHT(@PostCode,3)WHEN @PostCode LIKE '[A-Z][0-9][0-9][0-9][A-Z][A-Z]' THEN LEFT(@PostCode,3)+' '+RIGHT(@PostCode,3)WHEN @PostCode LIKE '[A-Z][0-9][A-Z][0-9][A-Z][A-Z]' THEN LEFT(@PostCode,3)+' '+RIGHT(@PostCode,3)WHEN @PostCode LIKE '[A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]' THEN LEFT(@PostCode,4)+' '+RIGHT(@PostCode,3)WHEN @PostCode LIKE '[A-Z][A-Z][0-9][A-Z][0-9][A-Z][A-Z]' THEN LEFT(@PostCode,4)+' '+RIGHT(@PostCode,3)ELSE @PostCode END)
--Update @NewPostCode to old value if pattern does not match, or else use new value.DECLARE @NewPostCode VARCHAR(100)IF (CASE WHEN @PostCode LIKE '[A-Z][A-Z][0-9] [0-9][A-Z][A-Z]' THEN 1WHEN @PostCode LIKE '[A-Z][0-9] [0-9][A-Z][A-Z]' THEN 2WHEN @PostCode LIKE '[A-Z][0-9][0-9] [0-9][A-Z][A-Z]' THEN 3WHEN @PostCode LIKE '[A-Z][0-9][A-Z] [0-9][A-Z][A-Z]'THEN 4WHEN @PostCode LIKE '[A-Z][A-Z][0-9][0-9] [0-9][A-Z][A-Z]' THEN 5WHEN @PostCode LIKE '[A-Z][A-Z][0-9][A-Z] [0-9][A-Z][A-Z]' THEN 6ELSE 0 END)=0BEGIN SET @NewPostCode=@OldPostCodeENDELSESET @NewPostCode=@PostCodeRETURN @NewPostCodeEND