Separate words by group for each row in SQL

Time: 2015-06-25 06:11:04

Tags: sql sql-server sql-server-2008

I have a string like this:
No People,Day,side view,looking at camera,snow,mountain,tranquil scene,tranquility,Night,walking,water,Two Person,looking Down

I have a table, Group_words:

Group                                                                                                                                                            Category
---------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------
No People,One Person,Two Person,Three Person,Four Person,five person,medium group of people,large group of people,unrecognizable person,real people              People
Day,dusk,night,dawn,sunset,sunrise                                                                                                                               Weather
looking at camera,looking way,looking sideways,looking down,looking up                                                                                           View Angle

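I want to check each comma-separated word against the table Group_words and find the wrong combinations.

For the string above, the result should be: "No People,Day,side view,looking at camera,snow,mountain,tranquil scene,tranquility,walking,water"

  • Night is removed because Day is in the string
  • Two Person is removed because No People is in the string
  • looking Down is removed because looking at camera is in the string

I know it is complicated, but I just want to remove the non-matching words from the string, based on the groups in the table Group_words.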

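1 Answer:

Answer 0: (score: 2)

Wow, you should redesign your table. Anyway, here is my attempt using Jeff Moden's DelimitedSplit8K.

I believe you already have the function, since I answered one of your previous questions that used it.

First, split the @string input into separate rows. You should also split the Group_Words table the same way.

After that, do a LEFT JOIN to get the matching category for each word. Then eliminate the invalid words: within each category, only the first word to appear in @string is kept.

See it in action here: SQL Fiddle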

DECLARE @string VARCHAR(8000)
SET @string = 'No People,Day,side view,looking at camera,snow,mountain,tranquil scene,tranquility,Night,walking,water,Two Person,looking Down'

-- Split @string variable
DECLARE @tbl_string AS TABLE(ItemNumber INT, Item VARCHAR(8000))
INSERT INTO @tbl_string
SELECT
    ItemNumber, LTRIM(RTRIM(Item))
FROM dbo.DelimitedSplit8K(@string, ',')

-- Normalize Group_Words
DECLARE @tbl_grouping AS TABLE(Category VARCHAR(20), ItemNumber INT, Item VARCHAR(8000))
INSERT INTO @tbl_grouping
SELECT
    w.Category, s.ItemNumber, LTRIM(RTRIM(s.Item))
FROM Group_Words w
CROSS APPLY dbo.DelimitedSplit8K(w.[Group], ',')s

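-- Number the words within each matched category by their position in @string.
-- Note: 'Night' matches 'night' only under a case-insensitive collation.
;WITH Cte AS(
    SELECT
        s.ItemNumber,
        s.Item,
        g.Category,
        -- RN = 1 marks the first word of each category to appear in @string
        RN = ROW_NUMBER() OVER(PARTITION BY g.Category ORDER BY s.ItemNumber)
    FROM @tbl_string s
    LEFT JOIN @tbl_grouping g
        ON g.Item = s.Item
)
-- Keep the first word per category plus the words that belong to no category,
-- then concatenate them back into one comma-separated string
SELECT STUFF((
        SELECT ',' + Item
        FROM Cte
        WHERE
            RN = 1
            OR Category IS NULL
        ORDER BY ItemNumber
        FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'),
    1, 1, '')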

Output

|                                                                                                  |
|--------------------------------------------------------------------------------------------------|
| No People,Day,side view,looking at camera,snow,mountain,tranquil scene,tranquility,walking,water |
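
To see why a word is dropped, it can help to look at the joined rows before they are filtered. The following is only a minimal sketch: the table variables are filled with a few hand-typed rows (a shortened, hypothetical subset of the data above) so that it runs without DelimitedSplit8K or the Group_Words table, and it assumes a case-insensitive collation so that 'Night' matches 'night'.

```sql
-- Hypothetical miniature of the split input: a few words from @string
DECLARE @tbl_string TABLE (ItemNumber INT, Item VARCHAR(100));
INSERT INTO @tbl_string (ItemNumber, Item)
VALUES (1, 'No People'), (2, 'Day'), (3, 'snow'), (4, 'Night'), (5, 'Two Person');

-- Hypothetical miniature of the normalized Group_Words rows
DECLARE @tbl_grouping TABLE (Category VARCHAR(20), Item VARCHAR(100));
INSERT INTO @tbl_grouping (Category, Item)
VALUES ('People', 'No People'), ('People', 'Two Person'),
       ('Weather', 'Day'), ('Weather', 'night');

-- Same join and numbering as the CTE in the answer, shown unfiltered
SELECT
    s.ItemNumber,
    s.Item,
    g.Category,
    RN = ROW_NUMBER() OVER(PARTITION BY g.Category ORDER BY s.ItemNumber)
FROM @tbl_string s
LEFT JOIN @tbl_grouping g
    ON g.Item = s.Item
ORDER BY s.ItemNumber;
```

Under that collation assumption, Night and Two Person get RN = 2 within Weather and People, so the filter RN = 1 OR Category IS NULL removes them, while snow has no category and is kept.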

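If your @string input goes beyond 8000 characters, DelimitedSplit8K is no longer a good fit: it is written for VARCHAR(8000), and changing it to accept VARCHAR(MAX) slows it down. You can use another splitter instead. Here is one taken from Sir Aaron Bertrand's article: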

CREATE FUNCTION dbo.SplitStrings_XML
(
   @List       NVARCHAR(MAX),
   @Delimiter  NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
   RETURN 
   (  
      SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
      FROM 
      ( 
        SELECT x = CONVERT(XML, '<i>' 
          + REPLACE(@List, @Delimiter, '</i><i>') 
          + '</i>').query('.')
      ) AS a CROSS APPLY x.nodes('i') AS y(i)
   );
GO
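
Unlike DelimitedSplit8K, this function returns only Item and no ItemNumber, which the query above relies on for ordering. One possible way to plug it in is sketched below, assuming dbo.SplitStrings_XML has been created as shown. The ROW_NUMBER() numbering is an assumption rather than part of the original answer: it is not documented to follow the position of the items in the string, so the result should be checked on real data. Note also that the input must not contain XML-special characters such as & or <, because the function converts the list to XML.

```sql
-- Sketch: fill @tbl_string from SplitStrings_XML instead of DelimitedSplit8K
DECLARE @string NVARCHAR(MAX);
SET @string = N'No People,Day,side view,looking at camera,Night';

DECLARE @tbl_string AS TABLE(ItemNumber INT, Item NVARCHAR(4000));
INSERT INTO @tbl_string
SELECT
    -- ASSUMPTION: this numbering usually follows document order in practice,
    -- but SQL Server does not guarantee it
    ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),
    LTRIM(RTRIM(Item))
FROM dbo.SplitStrings_XML(@string, N',');

-- Inspect the split rows before running the rest of the query
SELECT ItemNumber, Item
FROM @tbl_string
ORDER BY ItemNumber;
```

The rest of the answer's query (the @tbl_grouping split and the CTE) can then stay as it is, provided the numbering comes out in the expected order.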