Remove invalid data based on a particular pattern in SQL Server

Time: 2016-04-25 08:46:10

Tags: sql sql-server

I have sample data like that shown below.

------------------------------------------------
| ID |          Column 1           | Column 2 |
------------------------------------------------
| 1  | 0229-10010                  |Valid     |
------------------------------------------------
| 2  |                       20483 |InValid   |
------------------------------------------------
| 3  | 319574R06-STAT              |Valid     |
------------------------------------------------
| 4  | ,,,,,,,,,,,,,,1,,,,,,,      |InValid   |
------------------------------------------------
| 5  | "PBOM-SSE, CHAMBER"         |Valid     |
------------------------------------------------
| 6  | ""PBOM-SSE, CHAMBER         |InValid   |
------------------------------------------------
| 7  | "PBOM-SSE CHAMBER",         |InValid   |
------------------------------------------------
| 8  | #DRM-1102.Z                 |InValid   |
------------------------------------------------
| 9  | DRM#1102.Z                  |Valid     |
------------------------------------------------
| 10 |OEM-2-202 4079 KALREZ        |Valid     |
------------------------------------------------
| 11 |-OEM2202 4079 KALREZ#        |InValid   |
------------------------------------------------

What I want to do is build a pattern that fetches only the invalid data. The Valid and InValid labels in Column 2 are for illustration only; my table has no such flag.

The difficulty is that the same special character (what I am calling a wildcard character) means different things depending on where it appears. Consider records 5 and 6: both contain the same special characters, but their position decides whether the value is valid. The position rule itself is not clear-cut either. I think you can work out why each value in Column 1 is valid or invalid. In record 8, a '#' before the item does not make sense, whereas a '#' after a letter does (record 9).

In record 2 there are many blank spaces before the number, which is why it is invalid, but that does not mean a space is itself a special character. I have written the query below.

SELECT [PartNumber]
FROM [IBSSSystems].[dbo].[Part]
WHERE (PartNumber LIKE '%[?;.,$^@&*{}:"<>/|\ %'']%'
       OR PartNumber LIKE '%[%'
       OR PartNumber LIKE '%]%')

The query above fetches every record that contains any of these special characters anywhere in the value. What I need instead is a query that fetches only the invalid data. I expect the result will contain a lot of ANDs and ORs, but I am confused about how to build it. I hope you can help me out. Thanks in advance.
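To make the over-matching concrete, here is a minimal sketch that runs the same bracketed pattern against two values the sample data marks as Valid. The inline table `v(PartNumber)` is a hypothetical stand-in for the real `Part` table, and the multi-row `VALUES` constructor assumes SQL Server 2008 or later.

```sql
-- Two values that the sample data marks as Valid.
-- v(PartNumber) is a hypothetical inline table, not the real [dbo].[Part].
SELECT v.PartNumber
FROM (VALUES ('DRM#1102.Z'),
             ('OEM-2-202 4079 KALREZ')) AS v(PartNumber)
-- Same bracketed character set as the query above (no ESCAPE clause,
-- so the backslash inside the brackets is an ordinary character).
WHERE v.PartNumber LIKE '%[?;.,$^@&*{}:"<>/|\ %'']%';
```

Both values qualify even though they are valid: the first contains a '.', the second contains a space, and both characters are inside the bracketed set.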

1 answer:

Answer 0 (score: 3)

SELECT [PartNumber]
FROM [IBSSSystems].[dbo].[Part]
WHERE (PartNumber LIKE '[^A-Za-z0-9"]%' ESCAPE '\'          -- Invalid when the first character is a special character (" is an exception)
        OR PartNumber LIKE '%[^A-Za-z0-9" ]' ESCAPE '\'     -- Invalid when the last character is a special character (" and trailing spaces are exceptions)
        OR PartNumber LIKE '%[^A-Za-z0-9 ][^A-Za-z0-9 ]%'   -- Invalid when two or more special characters appear consecutively
        OR PartNumber LIKE '%[\^\[\]\\_?;$@&*{}:<>/|''~`]%'  ESCAPE '\' -- List here the characters that must not occur anywhere in the string
       )
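
One way to check these four rules is to run them against the sample rows from the question. The script below is a minimal sketch, not part of the original answer: the table variable `@Part` and its column sizes are assumptions standing in for `[IBSSSystems].[dbo].[Part]`, and the multi-row `VALUES` constructor assumes SQL Server 2008 or later.

```sql
-- @Part is a hypothetical stand-in for [IBSSSystems].[dbo].[Part];
-- the rows are copied from the sample data in the question.
DECLARE @Part TABLE (ID int PRIMARY KEY, PartNumber varchar(50));

INSERT INTO @Part (ID, PartNumber) VALUES
    (1,  '0229-10010'),
    (2,  '                       20483'),   -- leading spaces are part of the value
    (3,  '319574R06-STAT'),
    (4,  ',,,,,,,,,,,,,,1,,,,,,,'),
    (5,  '"PBOM-SSE, CHAMBER"'),
    (6,  '""PBOM-SSE, CHAMBER'),
    (7,  '"PBOM-SSE CHAMBER",'),
    (8,  '#DRM-1102.Z'),
    (9,  'DRM#1102.Z'),
    (10, 'OEM-2-202 4079 KALREZ'),
    (11, '-OEM2202 4079 KALREZ#');

-- Same four conditions as the answer's WHERE clause.
SELECT ID, PartNumber
FROM @Part
WHERE PartNumber LIKE '[^A-Za-z0-9"]%'                              -- special first character
   OR PartNumber LIKE '%[^A-Za-z0-9" ]'                             -- special last character
   OR PartNumber LIKE '%[^A-Za-z0-9 ][^A-Za-z0-9 ]%'                -- two specials in a row
   OR PartNumber LIKE '%[\^\[\]\\_?;$@&*{}:<>/|''~`]%' ESCAPE '\'   -- forbidden anywhere
ORDER BY ID;
```

The rows the question marks as InValid are IDs 2, 4, 6, 7, 8 and 11, so that is the result set to compare against. How the `A-Z`, `a-z` and `0-9` ranges behave depends on the column's collation, so real data with accented or other non-ASCII characters may need a binary collation to get strict ASCII matching.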