I have a Movie table with two columns: ID (int) and MetaData (XML). The MetaData looks like this:
<movie xmlns="urn:schemas-xxx:yyy:catalog" >
<credits>
<credit creditId="15594954" creditType="Actor" >aaa</credit>
<credit creditId="15573106" creditType="Actor" >bbb</credit>
<credit creditId="15781056" creditType="Actor" >bbb</credit>
<credit creditId="15781056" creditType="Actor" >ddd</credit>
<credit creditId="15606109" creditType="Director" >ddd</credit>
<credit creditId="16316911" creditType="Art Director" >adadad</credit>
<credit creditId="18484117" creditType="Choreographer" >ch</credit>
<credit creditId="15707268" creditType="Cinematographer" >cm</credit>
<credit creditId="15907445" creditType="Screenwriter">sss</credit>
<credit creditId="15905546" creditType="Screenwriter" >ggg</credit>
<credit creditId="16493602" creditType="Editor" >eee</credit>
<credit creditId="15825749" creditType="Composer" >ccc</credit>
<credit creditId="18486706" creditType="Composer" >ddd</credit>
</credits>
</movie>
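I want to find the records that contain duplicates within a single credit type. Here the actor "bbb" is a duplicate, but "ddd" is not, because its repeated entries belong to different credit types.
With a query like the one below, I also get records where an actor is also a director, and I don't want those to show up.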
-- Check for Duplicate Cast and Crew
WITH XMLNAMESPACES (DEFAULT 'urn:schemas-xxx:yyy:catalog')
SELECT Count(*)
FROM Movie
WHERE Metadata.value('count(/movie/credits/credit)', 'int') <> Metadata.value('count(distinct-values(/movie/credits/credit))', 'int')
If I modify the query as shown below, it works.
WITH XMLNAMESPACES (DEFAULT 'urn:schemas-xxx:yyy:catalog')
SELECT Count(*)
FROM Movie
WHERE
(
(Metadata.value('count(/movie/credits/credit[@creditType="Actor"])', 'int') <>
Metadata.value('count(distinct-values(/movie/credits/credit[@creditType="Actor"]))', 'int')
)
OR (Metadata.value('count(/movie/credits/credit[@creditType="Director"])', 'int') <>
Metadata.value('count(distinct-values(/movie/credits/credit[@creditType="Director"]))', 'int')
)
OR (Metadata.value('count(/movie/credits/credit[@creditType="Producer"])', 'int') <>
Metadata.value('count(distinct-values(/movie/credits/credit[@creditType="Producer"]))', 'int')
)
)
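But there are many credit types, such as Composer, Editor, and so on, and I don't want to repeat this condition for every credit type. Is there an efficient way to do this?
Update
I found that the earlier query does a case-sensitive comparison. I need a case-insensitive one, so I changed it as follows: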
WITH XMLNAMESPACES (DEFAULT 'urn:schemas-xxx:yyy:catalog')
SELECT Count(*) FROM
(
SELECT ID
FROM Movie
CROSS APPLY
Movie.Metadata.nodes('/movie/credits/credit[@creditType="Actor"]') x(y)
GROUP BY ID
HAVING
COUNT(y.value('.', 'varchar(100)')) <> COUNT(Distinct y.value('.', 'varchar(100)'))
) AS temp;
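But my original problem remains.
Answer 0 (score: 1)
You can use a FLWOR expression and compare the counts for each distinct value of @creditType. Return a dummy node whenever the counts differ, and use exist() to check whether such a node was produced.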
with xmlnamespaces(default 'urn:schemas-xxx:yyy:catalog')
select count(*)
from Movie as M
where M.Metadata.exist('
for $creditType in distinct-values(/movie/credits/credit/@creditType)
where count(distinct-values(/movie/credits/credit[@creditType = $creditType]/text())) != count(/movie/credits/credit[@creditType = $creditType]/text())
return <X/>') = 1;
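The FLWOR query above compares names with XQuery semantics, which are case-sensitive. If you also need the case-insensitive behaviour from the update in the question, one possible alternative is to shred the credits into rows with nodes() and do the grouping in SQL. The following is an untested sketch: the names Credits, CreditType, CreditName and Dup are made up for illustration, the varchar(100) length is taken from the question, and it assumes the database's default collation is case-insensitive.

```sql
-- Sketch only: Credits, CreditType, CreditName and Dup are illustrative names.
WITH XMLNAMESPACES (DEFAULT 'urn:schemas-xxx:yyy:catalog'),
Credits AS
(
    -- One row per <credit> element: movie ID, credit type, credited name.
    SELECT M.ID,
           x.y.value('@creditType', 'varchar(100)') AS CreditType,
           x.y.value('.', 'varchar(100)')           AS CreditName
    FROM Movie AS M
    CROSS APPLY M.Metadata.nodes('/movie/credits/credit') AS x(y)
)
SELECT COUNT(DISTINCT ID)   -- a movie can contain more than one duplicated group
FROM
(
    SELECT ID
    FROM Credits
    GROUP BY ID, CreditType, CreditName   -- string comparison follows the collation in effect
    HAVING COUNT(*) > 1                   -- same name listed twice under the same credit type
) AS Dup;
```

The XML methods are evaluated in the CTE rather than directly in the GROUP BY clause, because SQL Server does not allow XML methods there. If your default collation is case-sensitive, adding an explicit COLLATE clause with a case-insensitive collation to CreditName should give the same effect.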