我喜欢以下数据集:
ID Text
1 The
1 The Quick
1 The Quick Brown
2 Fox
2 Fox
3 Jumps
3 Over The
3 Over The L
3 Over The Lazy
我需要有代码告诉我最短'Text'字段中的文本是否包含在同一'ID'的所有其他字段中。
输出应如下所示:
ID Text Result
1 The Yes
1 The Quick Yes
1 The Quick Brown Yes
2 Fox Yes
2 Fox Yes
3 Jumps No
3 Over The No
3 Over The L No
3 Over The Lazy No
实际上,有些文本字段非常长,而且这些数据将占据数千行的数据。
感谢您的帮助!
答案 0 :(得分:3)
好的,我们走了。 首先,确定每个值的长度,并确定每个具有最短长度的ID的值(或者只是用qualify语句标识它)。
select ID, text as shortest_text, length(text) as txt_length from table
qualify row_number() over(partition by ID order by txT_length)=1
OR
select ID, text as shortest_text
from table z
inner join (
select ID, max(txt_length) as max_length from (
select ID, text, length(text) as txt_length from table ) x)y
on z.Id=y.ID and length(z.text)=y.txt_length
然后你必须基本将它连接到原始表,并查看该值是否包含在每个ID的不同行中。
select ID, text, shortest_text, case when text like '%'+shortest_text+'%' then 1 else 0 end as contains_shortest_text
from table a
left join ([previous query, if you use Teradata I recommend the first option]) b
on a.ID=b.ID
最后,您将确定每个ID的所有行是否包含该值。
select ID, text, min(contains_shortest_text) over(partition by ID) as all_contain_shortest_text
from ([previous query]) x
答案 1 :(得分:2)
试试这个:
;with shortestPerID as (
select A.ID as ID, A.Text as Text from
(select ID, Text, MAX(Len(Text)) as Length from TABLE_NAME group by ID, Text) as A
right join
(select ID, min(len(Text)) as Length from TABLE_NAME group by ID) as B
on (A.ID = B.ID and A.Length = B.Length)
)
select A.ID, A.Text,
case when B.Text is null then 'No' else 'Yes' end as Result
from TABLE_NAME as A left join shortestPerID as B
ON (A.ID = B.ID and charindex(B.Text, A.Text) > 0)
首先,CTE
是使用每个ID最短文本的表。然后简单的连接就可以了:)
试试这个SQL fiddle。
这是结果,它是不同的:对于ID = 3和Text ='Jumps',它表示是,因为'Jumps'本身包含:)
ID Text Result
1 The Yes
1 The Quick Yes
1 The Quick Brown Yes
2 Fox Yes
2 Fox Yes
3 Jumps Yes
3 Over The No
3 Over The L No
3 Over The Lazy No
修改强>
正如@JohnRC建议的那样,'No'
获得最短的Text
,您可以在JOIN
声明中使用其他select
条件:
on (A.ID = B.ID and A.Length = B.Length and B.Text <> A.Text)
答案 2 :(得分:0)
以下是我的建议,使用临时表(SQL Server 2014):
-- Get the minimum text length and number of entries per ID
SELECT ID
, Min(LEN(text)) MINTL
, Count(ID) IDCount
INTO #MINTL
FROM TABLE_NAME
GROUP BY ID;
-- Get all the minimum text(s) and entry count per ID
SELECT DISTINCT X.*
INTO #MINTEXT
FROM (
SELECT A.ID, A.text MinText, B.IDCount
FROM TABLE_NAME A
JOIN #MINTL B
ON A.ID = B.ID AND LEN(A.Text) = B.MinTL
) X;
-- Count the matches for each minimum text per ID
SELECT A.ID
, A.MinTEXT
, A.IDCount
, COUNT(B.ID) MatchCount
INTO #MATCHES
FROM #MINTEXT A
JOIN TABLE_NAME B
ON A.ID = B.ID
AND charindex(A.MinText, B.Text) > 0
GROUP BY A.ID
, A.MinText
, A.IDCount;
-- Produce the result, only say Yes if the number of matches = the entry count
SELECT A.ID
, A.Text
, CASE
WHEN B.IDCOUNT > B.MatchCount THEN 'No'
ELSE 'Yes' END Result
FROM TABLE_NAME A
JOIN #MATCHES B
ON A.ID = B.ID
ORDER BY A.ID, A.Text;
DROP TABLE #MINTL;
DROP TABLE #MINTEXT;
DROP TABLE #MATCHES;
使用与问题相同的输入生成此结果
ID Text Result
----------- -------------------------------------------------- ------
1 The Yes
1 The Quick Yes
1 The Quick Brown Yes
2 Fox Yes
2 Fox Yes
3 Jumps No
3 Over The No
3 Over The L No
3 Over The Lazy No