Compare the difference between two columns

Time: 2019-02-12 08:27:17

Tags: r dataframe

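I am trying to compare two columns, ID and add. Using ID as the key, diff should show "Yes" if the corresponding add values differ, and "No" otherwise.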

df <- data.frame(ID = c("1234", "1234", "7491", "7319", "321", "321"), add = c("ABC", "DEF", "HIJ", "KLM", "WXY", "WXY"))

Expected output

    ID add diff
1 1234 ABC  Yes
2 1234 DEF  Yes
3 7491 HIJ   No
4 7319 KLM   No
5  321 WXY   No
6  321 WXY   No

3 answers:

Answer 0: (score: 3)

Using data.table:

library(data.table)
setDT(df)  # convert df to a data.table by reference
df[, diff := if (uniqueN(add) > 1) "Yes" else "No", by = ID]
df

     ID add diff
1: 1234 ABC  Yes
2: 1234 DEF  Yes
3: 7491 HIJ   No
4: 7319 KLM   No
5:  321 WXY   No
6:  321 WXY   No
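
To see what the grouped condition evaluates to for each ID, the distinct counts can be inspected directly. This is a minimal sketch in base R that uses tapply() with length(unique()) as a stand-in for uniqueN(); the name counts is chosen here for illustration and is not part of the answer.

```r
# Same data as in the question, with plain character columns.
df <- data.frame(
  ID  = c("1234", "1234", "7491", "7319", "321", "321"),
  add = c("ABC", "DEF", "HIJ", "KLM", "WXY", "WXY"),
  stringsAsFactors = FALSE
)

# Number of distinct add values per ID (one entry per ID, named by ID).
counts <- tapply(df$add, df$ID, function(s) length(unique(s)))

# Only IDs with more than one distinct add value should be flagged "Yes".
flagged <- names(counts)[counts > 1]
print(flagged)
```

Only ID 1234 has more than one distinct add value, which is why it is the only group flagged in the output above.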

Answer 1: (score: 1)

A base R approach is:

df$diff <- sapply(df$ID, function(x) {
  s <- df$add[df$ID == x]
  length(s) != 1 & length(unique(s)) != 1
})

> df
    ID add  diff
1 1234 ABC  TRUE
2 1234 DEF  TRUE
3 7491 HIJ FALSE
4 7319 KLM FALSE
5  321 WXY FALSE
6  321 WXY FALSE

If you want "Yes"/"No" instead of TRUE/FALSE, use ifelse(df$diff, "Yes", "No").

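Alternatively, as suggested by @sindri_baldur, do it this way, which is faster because the function runs once per unique ID rather than once per row: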

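# Note: the result follows the order of unique(df$ID), so it lines up with
# the rows of df only when rows sharing an ID are adjacent, as they are here.
unlist(sapply(unique(df$ID), function(x) {
  rows <- df$ID == x
  s <- df$add[rows]
  rep(length(s) != 1 & length(unique(s)) != 1, sum(rows))
}))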
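
If rows sharing an ID are not adjacent, a variant built on ave() keeps the result aligned with the original row order. The following is a minimal sketch under that assumption, not part of the original answer; the shuffled data and the name n_distinct_add are made up for illustration.

```r
# Hypothetical data: same values as the question, but IDs are interleaved.
df <- data.frame(
  ID  = c("1234", "7491", "1234", "321", "7319", "321"),
  add = c("ABC", "HIJ", "DEF", "WXY", "KLM", "WXY"),
  stringsAsFactors = FALSE
)

# ave() applies FUN to the row indices of each ID group and writes the
# (recycled) result back at those same positions, preserving row order.
n_distinct_add <- ave(
  seq_len(nrow(df)), df$ID,
  FUN = function(i) length(unique(df$add[i]))
)

# More than one distinct add value within an ID means the values differ.
df$diff <- ifelse(n_distinct_add > 1, "Yes", "No")
print(df)
```

Passing row indices rather than add itself to ave() keeps the result numeric; ave() returns a vector of the same type as its first argument, so passing a character column would coerce the counts to strings.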

Answer 2: (score: 1)

You can also use a dplyr solution:

library(dplyr)

df %>% 
  group_by(ID) %>% 
  mutate(diff = ifelse(length(unique(add))>1, "YES", "NO")) # n_distinct(add)>1 will also work
  #mutate(diff = ifelse(n_distinct(add)>1, "YES", "NO"))
# # A tibble: 6 x 3
# # Groups:   ID [4]
# ID    add   diff 
# <fct> <fct> <chr>
# 1 1234  ABC   YES  
# 2 1234  DEF   YES  
# 3 7491  HIJ   NO   
# 4 7319  KLM   NO   
# 5 321   WXY   NO   
# 6 321   WXY   NO