I have a data table (DT) with three columns; the first two hold various values that are grouped by the third.
ID ID_2 Group
23201600101793 2016052016051062331 A
23201600101793 2016062016061017838 A
23201600101794 2016052016051062331 A
23201600101794 2016052016051062402 A
23201600103090 2016052016051062325 A
23201600103090 2016052016051062408 A
23201600803366 2016052016051062325 A
23201600803366 2016052016051062408 A
I need to find unique combinations of the two columns, with no value repeated in either column. My desired output for group A is:
ID ID_2 Group
23201600101793 2016052016051062331 A
23201600101794 2016052016051062402 A
23201600103090 2016052016051062325 A
23201600803366 2016052016051062408 A
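Rows 3 and 7 are removed because they repeat, in column ID_2, values from rows 1 and 5 respectively. Rows 2, 4, 6, and 8 are removed because they repeat the ID values of rows 1, 3, 5, and 7.
There is no pattern within the groups; a group can have many rows with the same ID or ID_2.
For example, from group B I only need 2 rows, because ID has two unique values. The selected rows can be the first ones (I mean, every ID_2 row except the first is discarded, since the first row already has two unique values).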
ID ID_2 Group
23201600009182 2016042016041000942 B
23201600009182 2016042016041000943 B
23201600009182 2016042016041000946 B
23201600009182 2016042016041000949 B
23201600009182 2016042016041000950 B
23201600009182 2016042016041000951 B
23201600009182 2016042016041000953 B
23201600009182 2016042016041000954 B
23201600009182 2016042016041000956 B
23201600009182 2016042016041000957 B
23201600009182 2016042016041000958 B
23201600669635 2016052016051003624 B
23201600669635 2016052016051003626 B
23201600669635 2016052016051003628 B
23201600669753 2016012016011000791 B
23201600669753 2016012016011000797 B
Desired output for group B:
23201600009182 2016042016041000942 B
23201600669635 2016052016051003624 B
I appreciate any help.
Answer 0 (score: 0)
As I understand it, you need each combination of Group and ID to be unique.
You can use distinct from dplyr:
library(dplyr)
# sample data (seeded so the draw is reproducible)
set.seed(123)
sample_data <- tibble(
  ID    = sample(1:4, size = 10, replace = TRUE),
  ID2   = sample(1:4, size = 10, replace = TRUE),
  group = sample(c("A", "B"), size = 10, replace = TRUE)
)
Sample data:
> sample_data
# A tibble: 10 x 3
      ID   ID2 group
   <int> <int> <chr>
 1     2     4 B
 2     4     2 B
 3     2     3 B
 4     4     3 B
 5     4     1 B
 6     1     4 B
 7     3     1 B
 8     4     1 B
 9     3     2 A
10     2     4 A
# keep the first row for each (ID, group) pair, retaining all other columns
distinct(sample_data, ID, group, .keep_all = TRUE)
Sample result:
# A tibble: 6 x 3
     ID   ID2 group
  <int> <int> <chr>
1     2     4 B
2     4     2 B
3     1     4 B
4     3     1 B
5     3     2 A
6     2     4 A
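Note that distinct on ID and group only makes ID unique within a group; ID2 can still repeat (rows 1 and 3 of the result above both have ID2 = 4 in group B). If no value may repeat in either column, one possible approach is a greedy first-come pass within each group. The following is only a sketch in base R under that assumption: keep_unique_pairs is a hypothetical helper written for this illustration, not a dplyr or data.table function, and the IDs are stored as character because the ID_2 values are too long for R to hold exactly as numbers.

```r
# Group A from the question, with IDs kept as character strings
# (the 19-digit ID_2 values cannot be represented exactly as doubles).
dt <- data.frame(
  ID = c("23201600101793", "23201600101793", "23201600101794", "23201600101794",
         "23201600103090", "23201600103090", "23201600803366", "23201600803366"),
  ID_2 = c("2016052016051062331", "2016062016061017838", "2016052016051062331",
           "2016052016051062402", "2016052016051062325", "2016052016051062408",
           "2016052016051062325", "2016052016051062408"),
  Group = "A",
  stringsAsFactors = FALSE
)

# Hypothetical helper: walk the rows in order and keep a row only if
# neither its ID nor its ID_2 has already been kept.
keep_unique_pairs <- function(df) {
  seen_id  <- character(0)
  seen_id2 <- character(0)
  keep <- logical(nrow(df))
  for (i in seq_len(nrow(df))) {
    if (!(df$ID[i] %in% seen_id) && !(df$ID_2[i] %in% seen_id2)) {
      keep[i]  <- TRUE
      seen_id  <- c(seen_id, df$ID[i])
      seen_id2 <- c(seen_id2, df$ID_2[i])
    }
  }
  df[keep, , drop = FALSE]
}

# Apply the helper to each group separately and stack the pieces.
result <- do.call(rbind, lapply(split(dt, dt$Group), keep_unique_pairs))
rownames(result) <- NULL
print(result)
```

On the group A rows this keeps rows 1, 4, 5, and 8, which matches the desired output in the question. Because the pass is greedy, the result depends on row order and is not guaranteed to keep the largest possible number of rows.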