I have two datasets:
id_description
id  description
1   The cat sat
2   The dog barked
2   The dog barked
3   The parrot
4   The dog barked
4   The dog barked
person_description
person  description
John    The cat sat
Jane    The dog barked
James   The parrot
Mary    The dog barked
I need to build a third dataset that looks like this (either of the two options below is fine):
Option 1:
id  person  description
1   John    The cat sat
2   Jane    The dog barked
3   James   The parrot
4   Mary    The dog barked

Option 2:
id  person  description
1   John    The cat sat
2   Mary    The dog barked
3   James   The parrot
4   Jane    The dog barked
My first attempt:
SELECT distinct a.id, b.person, a.description
FROM id_description a
LEFT OUTER JOIN person_description b ON a.description = b.description
This produces the following dataset:
id  person  description
1   John    The cat sat
2   Jane    The dog barked
2   Mary    The dog barked
3   James   The parrot
4   Jane    The dog barked
4   Mary    The dog barked
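Because of the join on description, a person can be duplicated across two or more id numbers. How do I get to the target dataset, where each person and each id number is represented exactly once? Which person gets attached to which id doesn't matter (i.e. 2 / Jane and 4 / Mary is equivalent to 2 / Mary and 4 / Jane). I tried using row_number() over (partition by id order by person) and then filtering on row_number = 1, but the result is: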
id  person  description
1   John    The cat sat
2   Jane    The dog barked
3   James   The parrot
4   Jane    The dog barked
Mary is not represented, because Jane is the first row for both 2 and 4.
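To reproduce the problem, here is a minimal sketch using Python's built-in sqlite3 module. The choice of database is an assumption: the question doesn't name one, and SQLite needs version 3.25 or later for ROW_NUMBER(). The script loads the sample tables, runs the first attempt, and then runs the row_number-per-id attempt on top of it. Because both id 2 and id 4 see Jane first, the second query keeps Jane twice and drops Mary.

```python
import sqlite3

# In-memory copy of the two sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE id_description (id INTEGER, description TEXT);
INSERT INTO id_description VALUES
  (1, 'The cat sat'), (2, 'The dog barked'), (2, 'The dog barked'),
  (3, 'The parrot'), (4, 'The dog barked'), (4, 'The dog barked');
CREATE TABLE person_description (person TEXT, description TEXT);
INSERT INTO person_description VALUES
  ('John', 'The cat sat'), ('Jane', 'The dog barked'),
  ('James', 'The parrot'), ('Mary', 'The dog barked');
""")

# The first attempt from the question, with explicit column aliases.
JOINED = """
SELECT DISTINCT a.id AS id, b.person AS person, a.description AS description
FROM id_description a
LEFT OUTER JOIN person_description b ON a.description = b.description
"""

# Every "dog" id is paired with every "dog" person.
joined = conn.execute(JOINED + " ORDER BY 1, 2").fetchall()

# Second attempt: keep row_number() = 1 per id, ordered by person.
first_per_id = conn.execute(f"""
SELECT id, person, description FROM (
  SELECT j.*, ROW_NUMBER() OVER (PARTITION BY id ORDER BY person) AS rn
  FROM ({JOINED}) AS j
) AS r
WHERE rn = 1
ORDER BY id
""").fetchall()

print(len(joined))   # 6
print(first_per_id)  # Jane is kept for both id 2 and id 4; Mary never appears
```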
Answer 0 (score: 1)
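If you are getting there via the join on Description, one way to solve this is to compute two row_numbers(), both partitioned on Description: one ordered by ID, the other ordered by Person. Then pair each ID with the Person in the same Description whose row number matches.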
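Sketched in SQL, it looks something like this. The DISTINCT subquery removes the duplicate rows in id_description before numbering.
WITH cte_ID AS (SELECT description, id, ROW_NUMBER() OVER (PARTITION BY description ORDER BY id) AS ID_RN
                FROM (SELECT DISTINCT id, description FROM id_description) AS d)
, cte_Person AS (SELECT description, person, ROW_NUMBER() OVER (PARTITION BY description ORDER BY person) AS Person_RN
                 FROM person_description)
SELECT i.id, p.person, i.description
FROM cte_ID i
JOIN cte_Person p ON p.description = i.description AND p.Person_RN = i.ID_RN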
In other words, instead of your original join, add a row_number (partitioned by Description) to each of the original tables, then join on both Description and the row number.
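Here is a minimal sketch of this row-number pairing, run against the question's sample data in SQLite via Python's sqlite3 module. It assumes SQLite 3.25+ for ROW_NUMBER(), since the question doesn't say which database is in use.

The DISTINCT subquery collapses the duplicated rows in id_description before numbering. Without it, id 2 would take row numbers 1 and 2 and end up paired with both Jane and Mary.

```python
import sqlite3

# In-memory copy of the two sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE id_description (id INTEGER, description TEXT);
INSERT INTO id_description VALUES
  (1, 'The cat sat'), (2, 'The dog barked'), (2, 'The dog barked'),
  (3, 'The parrot'), (4, 'The dog barked'), (4, 'The dog barked');
CREATE TABLE person_description (person TEXT, description TEXT);
INSERT INTO person_description VALUES
  ('John', 'The cat sat'), ('Jane', 'The dog barked'),
  ('James', 'The parrot'), ('Mary', 'The dog barked');
""")

# Number the distinct ids and the persons separately within each description,
# then pair them up on (description, row number).
rows = conn.execute("""
WITH cte_ID AS (
  SELECT description, id,
         ROW_NUMBER() OVER (PARTITION BY description ORDER BY id) AS id_rn
  FROM (SELECT DISTINCT id, description FROM id_description) AS d
),
cte_Person AS (
  SELECT description, person,
         ROW_NUMBER() OVER (PARTITION BY description ORDER BY person) AS person_rn
  FROM person_description
)
SELECT i.id, p.person, i.description
FROM cte_ID AS i
JOIN cte_Person AS p
  ON p.description = i.description AND p.person_rn = i.id_rn
ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```

This pairs the lowest id with the alphabetically first person, which gives the first target layout. Because it uses an inner join, an id that has no person at the same row number (or a person with no matching id) is dropped. A LEFT JOIN from cte_ID would instead keep every id, with a NULL person.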
Answer 1 (score: 0)
Use the row_number window function:
with t as
(
select 1 as id,'John' as person,'The cat sat' as des
union all
select 2 as id,'Jane' as person,'The dog barked'
union all
select 2 as id,'Mary' ,'The dog barked'
union all
select 3 , 'James' ,'The parrot'
union all
select 4 , 'Jane' ,'The dog barked'
union all
select 4 , 'Mary' ,'The dog barked'
) select * from
(
-- lowest (id, person) pair per description
select *,row_number() over(partition by des order by id, person) rn
from t
) as t1 where t1.rn=1
union
select * from
(
-- highest (id, person) pair per description, so a shared description yields a second, different pair
select *,row_number() over(partition by des order by id desc, person desc) rn
from t
) as t2 where t2.rn=1
order by id
id  person  des             rn
1   John    The cat sat     1
2   Jane    The dog barked  1
3   James   The parrot      1
4   Mary    The dog barked  1
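For comparison, here is a minimal sketch of the union idea in SQLite via Python's sqlite3 module (SQLite 3.25+ assumed). The table t stands in for the already-joined rows.

The two halves use opposite orderings, with id and person as tie-breakers. For "The dog barked", one half therefore always picks (2, Jane) and the other picks (4, Mary). UNION then removes the duplicate rows produced for the descriptions that are not shared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# t holds the already-joined rows, as in the answer's CTE.
conn.executescript("""
CREATE TABLE t (id INTEGER, person TEXT, des TEXT);
INSERT INTO t VALUES
  (1, 'John', 'The cat sat'), (2, 'Jane', 'The dog barked'),
  (2, 'Mary', 'The dog barked'), (3, 'James', 'The parrot'),
  (4, 'Jane', 'The dog barked'), (4, 'Mary', 'The dog barked');
""")

rows = conn.execute("""
SELECT id, person, des FROM (
  -- lowest (id, person) per description
  SELECT *, ROW_NUMBER() OVER (PARTITION BY des ORDER BY id, person) AS rn FROM t
) AS t1 WHERE rn = 1
UNION
SELECT id, person, des FROM (
  -- highest (id, person) per description
  SELECT *, ROW_NUMBER() OVER (PARTITION BY des ORDER BY id DESC, person DESC) AS rn FROM t
) AS t2 WHERE rn = 1
ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Each half returns one row per description, so this approach covers at most two ids sharing a description. With three or more, the row-number pairing from the first answer is the more general approach.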