Question

I have a dataset like this:

 Col1    Col2    
 1        ABC 
 2        DEF
 3        ABC 
 1        DEF

Expected output:

Col1     Col2    
 1        ABC 
 1        DEF

I only want to extract those IDs from Col1 that have both ABC and DEF in Col2.

I tried a self-join in SQL, but it did not give the expected result.

SELECT DISTINCT Col1
FROM db A, db B
WHERE A.ID <> B.ID
    AND A.Col2 = 'ABC'
    AND B.Col2 = 'DEF' 
GROUP BY A.Col1

I also tried to do the same in R with the following code:

vc <- c("ABC", "DEF")
data1 <- db[db$Col2 %in% vc,]

Again, I did not get the desired output. Thanks in advance for any pointers.

Answer 1

In R, you can do:

library(dplyr) 
df %>% 
   group_by(Col1) %>% 
   filter(all(vc %in% Col2))

#   Col1 Col2 
#  <int> <fct>
#1     1 ABC  
#2     1 DEF

The Base R equivalent is:

# as.character() keeps ave() from producing NAs when Col2 is a factor
df[as.logical(with(df, ave(as.character(Col2), Col1, FUN = function(x) all(vc %in% x)))), ]

#  Col1 Col2
#1    1  ABC
#4    1  DEF

We keep only the groups (values of Col1) whose Col2 contains every value in vc.
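The same group-wise test can be spelled out step by step. The following is only an illustrative sketch in plain Python, not part of the original answer; the names `rows`, `values_by_id`, and `kept` are made up for the example, and the data is the sample from the question.

```python
from collections import defaultdict

# Sample rows from the question as (Col1, Col2) pairs.
rows = [(1, "ABC"), (2, "DEF"), (3, "ABC"), (1, "DEF")]
vc = {"ABC", "DEF"}

# Collect the set of Col2 values seen for each Col1 (the "group_by" step).
values_by_id = defaultdict(set)
for col1, col2 in rows:
    values_by_id[col1].add(col2)

# Keep a row only if its group contains every required value,
# which mirrors all(vc %in% Col2) in the R code.
kept = [(col1, col2) for col1, col2 in rows if vc <= values_by_id[col1]]
print(kept)
```

Only the two rows with Col1 = 1 survive, because 2 and 3 each have just one of the two values.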

Answer 2

Here is your current query, corrected:

SELECT DISTINCT t1.Col1
FROM yourTable t1
INNER JOIN yourTable t2
    ON t1.Col1 = t2.Col1
WHERE t1.Col2 = 'ABC' AND t2.Col2 = 'DEF';

The join condition is that both Col1 values are the same; the WHERE clause then requires the first Col2 value to be ABC and the second to be DEF.
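The corrected self-join returns only the IDs, while the expected output in the question shows both columns. One possible way to get the full rows is to feed the self-join into an IN filter. This is a sketch under assumptions: it uses an in-memory SQLite database through Python's `sqlite3` module (the thread does not say which database is in use), and the outer query is an addition that is not in the original answer.

```python
import sqlite3

# In-memory SQLite database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Col1 INTEGER, Col2 TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(1, "ABC"), (2, "DEF"), (3, "ABC"), (1, "DEF")],
)

# Inner query: the self-join from the answer, yielding IDs that have both values.
# Outer query (an assumption, not in the answer): fetch all rows for those IDs.
query = """
SELECT t.Col1, t.Col2
FROM yourTable t
WHERE t.Col1 IN (
    SELECT DISTINCT t1.Col1
    FROM yourTable t1
    INNER JOIN yourTable t2
        ON t1.Col1 = t2.Col1
    WHERE t1.Col2 = 'ABC' AND t2.Col2 = 'DEF'
)
ORDER BY t.Col1, t.Col2
"""
matching_rows = conn.execute(query).fetchall()
print(matching_rows)
```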

However, I would probably use the following canonical approach:

SELECT Col1
FROM yourTable
WHERE Col2 IN ('ABC', 'DEF')
GROUP BY Col1
HAVING MIN(Col2) <> MAX(Col2);
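The HAVING clause works because, once the WHERE clause has restricted Col2 to ABC and DEF, the minimum and maximum of a group can differ only when both values are present. A minimal sketch that runs the query on the question's sample data, assuming SQLite through Python's `sqlite3` module (the database engine is not named in the thread):

```python
import sqlite3

# In-memory SQLite database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Col1 INTEGER, Col2 TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(1, "ABC"), (2, "DEF"), (3, "ABC"), (1, "DEF")],
)

# Groups with a single value have MIN(Col2) = MAX(Col2) and are dropped.
query = """
SELECT Col1
FROM yourTable
WHERE Col2 IN ('ABC', 'DEF')
GROUP BY Col1
HAVING MIN(Col2) <> MAX(Col2)
"""
ids_with_both = [row[0] for row in conn.execute(query)]
print(ids_with_both)
```

The MIN/MAX comparison only works for exactly two target values; with three or more, a group could pass while still missing one of them.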

Answer 3

Using a correlated subquery:

select * from tablename t 
where exists (select 1 from tablename t1 where t1.col1=t.col1 and col2 in ('ABC','DEF')
group by col1 having count(distinct col2)=2)
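The count(distinct col2)=2 test extends to any number of required values: compare the distinct count with the length of the list. The sketch below rests on a few assumptions: it uses SQLite through Python's `sqlite3` module, `ids_having_all` is a hypothetical helper that is not part of the original answer, and it returns only the IDs rather than the full rows.

```python
import sqlite3

# In-memory SQLite database loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?)",
    [(1, "ABC"), (2, "DEF"), (3, "ABC"), (1, "DEF")],
)

def ids_having_all(conn, values):
    """Return col1 values whose rows cover every entry in `values` (hypothetical helper)."""
    required = sorted(set(values))  # de-duplicate so the count comparison is valid
    placeholders = ", ".join("?" for _ in required)
    query = f"""
        SELECT col1
        FROM tablename
        WHERE col2 IN ({placeholders})
        GROUP BY col1
        HAVING COUNT(DISTINCT col2) = ?
        ORDER BY col1
    """
    return [row[0] for row in conn.execute(query, (*required, len(required)))]

print(ids_having_all(conn, ["ABC", "DEF"]))
print(ids_having_all(conn, ["ABC"]))
```

Only the placeholders are interpolated into the SQL text; the values themselves are passed as bound parameters.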

Answer 4

Here is one way using group_concat:

select t.Col1,t.col2
from t
join
(
select col1,group_concat(distinct col2 order by col2) gc
from t
group by col1 having gc = 'ABC,DEF'
) s
on s.col1 = t.col1;

+------+------+
| Col1 | col2 |
+------+------+
|    1 | ABC  |
|    1 | DEF  |
+------+------+
2 rows in set (0.16 sec)

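However, you have to know the order of the col2 values in the concatenated string (here it is alphabetical, because of the order by col2).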

Answer 5

In R, we can also use data.table:

library(data.table)
setDT(df)[, .SD[all(vc %in% Col2)], by = Col1]
