确定最短“文本”字段中的文本是否包含在同一“ID”的其他行的所有其他“文本”字段中

时间:2017-11-09 15:28:51

标签: sql sql-server tsql

我喜欢以下数据集:

ID   Text
1    The
1    The Quick
1    The Quick Brown
2    Fox
2    Fox
3    Jumps
3    Over The
3    Over The L
3    Over The Lazy

我需要有代码告诉我最短'Text'字段中的文本是否包含在同一'ID'的所有其他字段中。

输出应如下所示:

ID   Text              Result
1    The               Yes
1    The Quick         Yes
1    The Quick Brown   Yes
2    Fox               Yes
2    Fox               Yes
3    Jumps             No
3    Over The          No
3    Over The L        No
3    Over The Lazy     No

实际上,有些文本字段非常长,而且这些数据将占据数千行的数据。

感谢您的帮助!

3 个答案:

答案 0 :(得分:3)

好的,我们走了。 首先,确定每个值的长度,并确定每个具有最短长度的ID的值(或者只是用qualify语句标识它)。

select ID, text as shortest_text, length(text) as txt_length from table
qualify row_number() over(partition by ID order by txT_length)=1

OR

select ID, text as shortest_text 
from table z
inner join (
select ID, max(txt_length) as max_length from (
select ID, text, length(text) as txt_length from table ) x)y
on z.Id=y.ID and length(z.text)=y.txt_length

然后你必须基本将它连接到原始表,并查看该值是否包含在每个ID的不同行中。

select ID, text, shortest_text, case when text like '%'+shortest_text+'%' then 1 else 0 end as contains_shortest_text
from table a
left join ([previous query, if you use Teradata I recommend the first option]) b
on a.ID=b.ID

最后,您将确定每个ID的所有行是否包含该值。

select ID, text, min(contains_shortest_text) over(partition by ID) as all_contain_shortest_text
from ([previous query]) x

答案 1 :(得分:2)

试试这个:

;with shortestPerID as (
    select A.ID as ID, A.Text as Text from
        (select ID, Text, MAX(Len(Text)) as Length from TABLE_NAME group by ID, Text) as A
    right join
        (select ID, min(len(Text)) as Length from TABLE_NAME group by ID) as B
    on (A.ID = B.ID and A.Length = B.Length)
)    

select A.ID, A.Text, 
       case when B.Text is null then 'No' else 'Yes' end as Result
from TABLE_NAME as A left join shortestPerID as B
ON (A.ID = B.ID and charindex(B.Text, A.Text) > 0)

首先,CTE是使用每个ID最短文本的表。然后简单的连接就可以了:)

试试这个SQL fiddle

这是结果,它是不同的:对于ID = 3和Text ='Jumps',它表示是,因为'Jumps'本身包含:)

ID   Text              Result
1    The               Yes
1    The Quick         Yes
1    The Quick Brown   Yes
2    Fox               Yes
2    Fox               Yes
3    Jumps             Yes
3    Over The          No
3    Over The L        No
3    Over The Lazy     No

修改

正如@JohnRC建议的那样,'No'获得最短的Text,您可以在JOIN声明中使用其他select条件:

on (A.ID = B.ID and A.Length = B.Length and B.Text <> A.Text)

答案 2 :(得分:0)

以下是我的建议,使用临时表(SQL Server 2014):

        -- Get the minimum text length and number of entries per ID 
        SELECT ID
            , Min(LEN(text)) MINTL
            , Count(ID) IDCount 
            INTO #MINTL 
            FROM TABLE_NAME 
            GROUP BY ID;

        -- Get all the minimum text(s) and entry count per ID
        SELECT DISTINCT X.* 
            INTO #MINTEXT 
            FROM (
                SELECT A.ID, A.text MinText, B.IDCount 
                    FROM TABLE_NAME A
                    JOIN #MINTL B
                        ON A.ID = B.ID AND LEN(A.Text) = B.MinTL 
                ) X;

        -- Count the matches for each minimum text per ID
        SELECT A.ID
             , A.MinTEXT
             , A.IDCount
             , COUNT(B.ID) MatchCount
            INTO #MATCHES
            FROM #MINTEXT A
                JOIN TABLE_NAME B
                    ON A.ID = B.ID 
                    AND charindex(A.MinText, B.Text) > 0  
            GROUP BY A.ID
                   , A.MinText
                   , A.IDCount;

        -- Produce the result, only say Yes if the number of matches = the entry count
        SELECT A.ID
             , A.Text
             , CASE 
                   WHEN B.IDCOUNT > B.MatchCount THEN 'No' 
                   ELSE 'Yes' END Result
            FROM TABLE_NAME A
            JOIN #MATCHES B 
                ON A.ID = B.ID
            ORDER BY A.ID, A.Text;

        DROP TABLE #MINTL;
        DROP TABLE #MINTEXT;
        DROP TABLE #MATCHES;

使用与问题相同的输入生成此结果

ID          Text                                               Result
----------- -------------------------------------------------- ------
1           The                                                Yes
1           The Quick                                          Yes
1           The Quick Brown                                    Yes
2           Fox                                                Yes
2           Fox                                                Yes
3           Jumps                                              No
3           Over The                                           No
3           Over The L                                         No
3           Over The Lazy                                      No