I am trying to compare two columns, ID and add. Using ID as the key, if the corresponding add values differ, diff should show "Yes".
df <- data.frame(ID = c("1234", "1234", "7491", "7319", "321", "321"), add = c("ABC", "DEF", "HIJ", "KLM", "WXY", "WXY"))
Expected output
ID add diff
1 1234 ABC Yes
2 1234 DEF Yes
3 7491 HIJ No
4 7319 KLM No
5 321 WXY No
6 321 WXY No
Answer 0 (score: 3)
Using data.table:
library(data.table)
setDT(df)  # convert df to a data.table by reference
df[, diff := if (uniqueN(add) > 1) "Yes" else "No", by = ID]
df
ID add diff
1: 1234 ABC Yes
2: 1234 DEF Yes
3: 7491 HIJ No
4: 7319 KLM No
5: 321 WXY No
6: 321 WXY No
Answer 1 (score: 1)
A base R approach is:
df$diff <- sapply(df$ID, function(x) {
s <- df$add[df$ID == x]
length(s) != 1 & length(unique(s)) != 1
})
> df
ID add diff
1 1234 ABC TRUE
2 1234 DEF TRUE
3 7491 HIJ FALSE
4 7319 KLM FALSE
5 321 WXY FALSE
6 321 WXY FALSE
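If you want "Yes"/"No" instead, use ifelse(df$diff, "Yes", "No").
Or, as suggested by @sindri_baldur, do this, which is faster. Note that it returns the values in the order of unique(df$ID), so it lines up with the rows only when rows sharing an ID are adjacent, as they are here: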
unlist(sapply(unique(df$ID), function(x) {
rows <- df$ID == x
s <- df$add[rows]
rep(length(s) != 1 & length(unique(s)) != 1, sum(rows))
}))
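If rows with the same ID might not be adjacent, a base R sketch using ave() may be safer, since ave() returns its result in the original row order. This is an alternative to the code above, not part of the original suggestion, and the name n_unique is only illustrative.

```r
# Same example data as in the question
df <- data.frame(ID  = c("1234", "1234", "7491", "7319", "321", "321"),
                 add = c("ABC", "DEF", "HIJ", "KLM", "WXY", "WXY"),
                 stringsAsFactors = FALSE)

# ave() splits the row indices by ID, applies FUN to each group, and
# puts the results back in the original row order.
# FUN counts the distinct add values among the rows of one ID.
n_unique <- ave(seq_along(df$add), df$ID,
                FUN = function(i) length(unique(df$add[i])))

# An ID with more than one distinct add value is flagged "Yes"
df$diff <- ifelse(n_unique > 1, "Yes", "No")
df
```

Working on row indices rather than on add itself keeps the per-group count numeric, so it can be compared with 1 directly.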
Answer 2 (score: 1)
You can also use a dplyr solution:
library(dplyr)
df %>%
group_by(ID) %>%
mutate(diff = ifelse(length(unique(add))>1, "YES", "NO")) # n_distinct(add)>1 will also work
#mutate(diff = ifelse(n_distinct(add)>1, "YES", "NO"))
# # A tibble: 6 x 3
# # Groups: ID [4]
# ID add diff
# <fct> <fct> <chr>
# 1 1234 ABC YES
# 2 1234 DEF YES
# 3 7491 HIJ NO
# 4 7319 KLM NO
# 5 321 WXY NO
# 6 321 WXY NO
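To match the "Yes"/"No" labels from the question and return an ungrouped result, a variant along these lines should work. It is a sketch that assumes dplyr is installed; the name result is only illustrative.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(ID  = c("1234", "1234", "7491", "7319", "321", "321"),
                 add = c("ABC", "DEF", "HIJ", "KLM", "WXY", "WXY"),
                 stringsAsFactors = FALSE)

result <- df %>%
  group_by(ID) %>%
  # n_distinct(add) counts the distinct add values within each ID
  mutate(diff = if_else(n_distinct(add) > 1, "Yes", "No")) %>%
  # drop the grouping so later operations act on the whole table
  ungroup()

result
```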