I have a data frame with duplicated IDs that looks like this:
+-----+------+------------------+
| ID  | Name | other columns....|
+-----+------+------------------+
| 1   | AAA  |                  |
| 1   | BBB  |                  |
| 2   | ABA  |                  |
| 2   | ACA  |                  |
| 2   | CCC  |                  |
| 3   | DDD  |                  |
| 4   | EEE  |                  |
| 4   | EEE  |                  |
| 4   | FFF  |                  |
| .   |      |                  |
+-----+------+------------------+
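I want to find the duplicated IDs that have different values in the Name column. I can already find the duplicated IDs, but I want to compare the "Name" column within the same data frame for rows that share an ID.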
Answer 0 (score: 1)
Here is a solution using dplyr.
library(dplyr)
df %>%
  group_by(ID) %>%
  filter(n() > 1) %>% # keep only IDs that occur more than once
  mutate(Unique_Name = n_distinct(Name)) %>% # number of distinct Name values per ID
  filter(Unique_Name != 1) # keep IDs that have more than one distinct Name
# or, more compactly
df %>%
  group_by(ID) %>%
  filter(n() > 1) %>% # keep only IDs that occur more than once
  filter(n_distinct(Name) != 1) # keep IDs that have more than one distinct Name
# Data (define df before running the pipelines above)
df <- structure(list(ID = c(1L, 1L, 2L, 2L, 2L, 3L, 4L, 4L), Name = structure(c(1L,
4L, 2L, 3L, 5L, 6L, 7L, 7L), .Label = c("AAA", "ABA", "ACA",
"BBB", "CCC", "DDD", "EEE"), class = "factor")), .Names = c("ID",
"Name"), class = "data.frame", row.names = c(NA, -8L))
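For readers without dplyr, the same filter can be sketched in base R. This is only an illustration: it assumes the 9-row data from the question's table, and the names n_names and result are made up for this example.

```r
# Sample data matching the table in the question (assumed, 9 rows)
df <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 4, 4, 4),
  Name = c("AAA", "BBB", "ABA", "ACA", "CCC", "DDD", "EEE", "EEE", "FFF"),
  stringsAsFactors = FALSE
)

# For every row, count the distinct Name values found under that row's ID
n_names <- ave(
  seq_along(df$Name), df$ID,
  FUN = function(i) length(unique(df$Name[i]))
)

# Keep the rows whose ID carries more than one distinct Name
result <- df[n_names > 1, ]
unique(result$ID)
```

An ID with more than one distinct Name necessarily occurs more than once, so this sketch does not need a separate "occurs more than once" step.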
Answer 1 (score: 0)
We can try
names(which(rowSums(table(df1[1:2]) != 0) == 1))
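To see what that one-liner computes, it may help to unpack it step by step. The sketch below assumes df1 holds the 9-row data from the question's table; the intermediate names tab and n_distinct_names are hypothetical.

```r
# Assumed sample data: the table from the question
df1 <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 4, 4, 4),
  Name = c("AAA", "BBB", "ABA", "ACA", "CCC", "DDD", "EEE", "EEE", "FFF"),
  stringsAsFactors = FALSE
)

# Contingency table: one row per ID, one column per Name, cells hold counts
tab <- table(df1[1:2])

# Number of distinct Name values per ID (non-zero cells in each row)
n_distinct_names <- rowSums(tab != 0)

# IDs with exactly one distinct Name (what the one-liner returns)
single_name_ids <- names(which(n_distinct_names == 1))

# IDs with more than one distinct Name (comparison flipped to > 1)
multi_name_ids <- names(which(n_distinct_names > 1))
```

On this data single_name_ids is "3" and multi_name_ids is "1", "2", "4". If the goal is the IDs whose names differ, the `> 1` variant is probably the one to use.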
It is not clear whether the intent is to find the IDs for which all 'Name' values are unique. If that is the case
library(dplyr)
df1 %>%
  group_by(ID) %>%
  filter(n_distinct(Name) == n()) %>%
  pull(ID) %>%
  unique
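The condition n_distinct(Name) == n() is stricter than "more than one distinct Name", and the two readings give different IDs. A rough base R comparison, assuming df1 is the question's 9-row table and using made-up variable names:

```r
# Assumed sample data: the table from the question
df1 <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 4, 4, 4),
  Name = c("AAA", "BBB", "ABA", "ACA", "CCC", "DDD", "EEE", "EEE", "FFF"),
  stringsAsFactors = FALSE
)

# Per-ID row count and per-ID count of distinct Name values
n_rows  <- tapply(df1$Name, df1$ID, length)
n_names <- tapply(df1$Name, df1$ID, function(x) length(unique(x)))

# Reading 1: every Name under the ID is different (no repeats at all)
all_unique_ids <- names(n_rows)[as.vector(n_names == n_rows)]

# Reading 2: the ID has at least two different Name values
differing_ids <- names(n_rows)[as.vector(n_names > 1)]
```

Here all_unique_ids is "1", "2", "3" (ID 3 qualifies trivially with a single row, ID 4 is dropped because EEE repeats), while differing_ids is "1", "2", "4".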
Answer 2 (score: 0)
This gives you a new column in which TRUE marks rows that repeat both the ID and the Name of an earlier row:
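library(dplyr)
df <- tibble(ID = c(1, 1, 2, 2, 2, 3, 4, 4, 4), Name = c("AAA", "BBB", "ABA", "ACA", "CCC", "DDD", "EEE", "EEE", "FFF"))
df0 <- df %>% group_by(ID) %>% mutate(x = duplicated(Name)) # x is TRUE when Name already appeared within the same ID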
With your current df, only row 8 is TRUE (ID == 4 & Name == EEE):
ID Name x
<dbl> <chr> <lgl>
1 1.00 AAA F
2 1.00 BBB F
3 2.00 ABA F
4 2.00 ACA F
5 2.00 CCC F
6 3.00 DDD F
7 4.00 EEE F
8 4.00 EEE T
9 4.00 FFF F
If you change df so that another Name repeats within the same ID ('ABA'):
df=tibble(ID=c(1,1,2,2,2,3,4,4,4),Name=c("AAA","BBB","ABA","ABA","CCC","DDD","EEE","EEE","FFF"))
you get more TRUE values:
ID Name x
<dbl> <chr> <lgl>
1 1.00 AAA F
2 1.00 BBB F
3 2.00 ABA F
4 2.00 ABA T
5 2.00 CCC F
6 3.00 DDD F
7 4.00 EEE F
8 4.00 EEE T
9 4.00 FFF F
However, if the same Name appears under different IDs:
df=tibble(ID=c(1,1,2,2,2,3,4,4,4),Name=c("AAA","BBB","ABA","ACA","CCC","ABA","EEE","EEE","FFF"))
there are no new matches:
ID Name x
<dbl> <chr> <lgl>
1 1.00 AAA F
2 1.00 BBB F
3 2.00 ABA F
4 2.00 ACA F
5 2.00 CCC F
6 3.00 ABA F
7 4.00 EEE F
8 4.00 EEE T
9 4.00 FFF F
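The difference between checking Name within each ID and checking it across the whole column can also be sketched in base R. This is a minimal illustration using the last data set above (with 'ABA' under both ID 2 and ID 3); x_within_id and x_any_id are hypothetical names.

```r
# Last data set from this answer: 'ABA' appears under ID 2 and ID 3
df <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 4, 4, 4),
  Name = c("AAA", "BBB", "ABA", "ACA", "CCC", "ABA", "EEE", "EEE", "FFF"),
  stringsAsFactors = FALSE
)

# TRUE when the same (ID, Name) pair already appeared in an earlier row;
# this matches the grouped duplicated(Name) column x
x_within_id <- duplicated(df[c("ID", "Name")])

# TRUE when the Name already appeared anywhere above, regardless of ID
x_any_id <- duplicated(df$Name)

which(x_within_id)  # 8
which(x_any_id)     # 6 8
```

Row 6 is flagged only by the ungrouped check, which is why grouping by ID (or checking the ID and Name pair) matters for this question.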