Replace multiple columns with NA if a value is in another data frame

Time: 2017-12-07 16:44:03

Tags: r

I have two data.frames like this:

#df1
ID     a1      a2     a3      b1      b2      b3     Date
3xy    Evan    Greg   Ryan   Ben      Bob     Alex   12/3
4lm    John    Bill   Sue    Randy    Mark    Seth   12/5

#df2
Name
Evan
Mark

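If any name in a row's "a" columns appears in df2$Name, I want to replace all of that row's "a" columns with NA. The same goes for the "b" columns. My desired output looks like this: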

ID     a1      a2     a3      b1      b2      b3     Date
3xy    NA      NA     NA     Ben      Bob     Alex   12/3
4lm    John    Bill   Sue    NA       NA      NA     12/5
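To make the answers below runnable, the two tables can be recreated in R roughly like this. This is a sketch: the original column types weren't shown, so storing the names as character (`stringsAsFactors = FALSE`) is an assumption.

```r
# Example data from the question; names are kept as character, not factor (assumption)
df1 <- data.frame(
  ID   = c("3xy", "4lm"),
  a1   = c("Evan", "John"),
  a2   = c("Greg", "Bill"),
  a3   = c("Ryan", "Sue"),
  b1   = c("Ben", "Randy"),
  b2   = c("Bob", "Mark"),
  b3   = c("Alex", "Seth"),
  Date = c("12/3", "12/5"),
  stringsAsFactors = FALSE
)

# Lookup table of names that should blank out their whole group
df2 <- data.frame(Name = c("Evan", "Mark"), stringsAsFactors = FALSE)
```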

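I've found some other posts that seem to cover a similar topic, but I haven't found a way to make them work. With the code below I've been able to replace the names in df1 that appear in df2, but I haven't figured out how to also replace the other columns that start with the same letter: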

df1[apply(df1, 2, function(df1) df1 %in% df2$Name)] <- NA

This gives me output like this:

ID     a1      a2     a3      b1      b2      b3     Date
3xy    NA      Greg   Ryan   Ben      Bob     Alex   12/3
4lm    John    Bill   Sue    Randy    NA      Seth   12/5

I've also been trying various ifelse statements, without success.
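For reference, the logical matrix built in the attempt above only needs to be collapsed per column prefix: a row's whole group is set to NA if any cell in that group matched. A minimal base R sketch, assuming the name columns are exactly those matching `^[ab][0-9]+$` and that the letter before the digits defines the group (`name_cols`, `hit`, and `prefix` are illustrative names, not from the question):

```r
# Example data from the question (names stored as character)
df1 <- data.frame(
  ID = c("3xy", "4lm"),
  a1 = c("Evan", "John"), a2 = c("Greg", "Bill"), a3 = c("Ryan", "Sue"),
  b1 = c("Ben", "Randy"), b2 = c("Bob", "Mark"), b3 = c("Alex", "Seth"),
  Date = c("12/3", "12/5"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(Name = c("Evan", "Mark"), stringsAsFactors = FALSE)

# Name columns to check: a letter followed by digits (assumed naming scheme)
name_cols <- grep("^[ab][0-9]+$", names(df1), value = TRUE)

# Logical matrix, same idea as the attempt above: TRUE where a cell's name is in df2$Name
hit <- sapply(df1[name_cols], function(col) col %in% df2$Name)

# Group key for each name column: "a1" -> "a", "b3" -> "b"
prefix <- sub("[0-9]+$", "", name_cols)

# Within each group, blank every column of a row if any cell of that row matched
for (p in unique(prefix)) {
  cols <- name_cols[prefix == p]
  row_hit <- rowSums(hit[, cols, drop = FALSE]) > 0
  df1[row_hit, cols] <- NA
}
df1
```

Because only `name_cols` is touched, `ID` and `Date` are left unchanged; supporting more prefixes would only mean widening the regex.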

3 Answers:

Answer 0 (score: 2)

We can split the columns into the 'a' and 'b' groups, then loop over the rows of each group and set a row to NA if any of its values match the 'Name' column of 'df2':
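# Name columns: everything except ID (column 1) and Date (column 8)
nm1 <- names(df1)[c(-1, -8)]
# Split the columns into an "a" and a "b" group by stripping the digits, then
# set a group's whole row to NA if any of its names is in df2$Name
lst <- lapply(split.default(df1[nm1], sub("\\d+", "", nm1)), function(x) {
  x[apply(x, 1, function(y) any(y %in% df2$Name)), ] <- NA
  x
})
# unname() keeps cbind() from prefixing the column names with "a." / "b."
df1[nm1] <- do.call(cbind, unname(lst))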
df1
#   ID   a1   a2   a3   b1   b2   b3 Date
#1 3xy <NA> <NA> <NA>  Ben  Bob Alex 12/3
#2 4lm John Bill  Sue <NA> <NA> <NA> 12/5
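The core of this answer is `split.default()`, which, applied to a data frame, splits its columns (not its rows) into a list. A small sketch of just that step on the question's data, showing that stripping the digits from the column names yields one group per letter:

```r
# Example data from the question (names stored as character)
df1 <- data.frame(
  ID = c("3xy", "4lm"),
  a1 = c("Evan", "John"), a2 = c("Greg", "Bill"), a3 = c("Ryan", "Sue"),
  b1 = c("Ben", "Randy"), b2 = c("Bob", "Mark"), b3 = c("Alex", "Seth"),
  Date = c("12/3", "12/5"),
  stringsAsFactors = FALSE
)

# Name columns: everything except ID (column 1) and Date (column 8)
nm1 <- names(df1)[c(-1, -8)]

# Grouping key per column: "a1" -> "a", "b2" -> "b"
key <- sub("\\d+", "", nm1)

# split.default() groups the columns of df1[nm1] by key into a named list of data frames
groups <- split.default(df1[nm1], key)
lapply(groups, names)
# $a
# [1] "a1" "a2" "a3"
#
# $b
# [1] "b1" "b2" "b3"
```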

Another option is melt/dcast from data.table:

library(data.table)
dcast(melt(setDT(df1), measure = patterns("^a\\d+", "^b\\d+"),
    value.name = c('a', 'b'))[, c('a', 'b') := lapply(.SD, function(x) 
  replace(x, any(x %in% df2$Name), NA)), ID, .SDcols = a:b][],
        ID + Date ~ variable, value.var = c('a', 'b'), sep='')
#    ID Date   a1   a2  a3  b1  b2   b3
#1: 3xy 12/3   NA   NA  NA Ben Bob Alex
#2: 4lm 12/5 John Bill Sue  NA  NA   NA

Answer 1 (score: 1)

library(tidyverse)
df3 <- df1 %>%
  gather(key, value, -ID, -Date) %>%
  mutate(group = substr(key, 1, 1)) %>%
  select(group, ID, value) %>%
  inner_join(df2, by = c("value" = "Name")) %>%
  select(group, ID)

df1 %>%
  gather(key, value, -ID, -Date) %>%
  mutate(group = substr(key, 1, 1)) %>%
  anti_join(df3) %>%
  select(-group) %>%
  spread(key, value) %>%
  select(ID, matches("^a"), matches("^b"), Date)

Output:

# A tibble: 2 x 8
     ID    a1    a2    a3    b1    b2    b3  Date
* <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1   3xy  <NA>  <NA>  <NA>   Ben   Bob  Alex  12/3
2   4lm  John  Bill   Sue  <NA>  <NA>  <NA>  12/5

Answer 2 (score: 0)

Here is a dplyr/tidyr approach:

library(dplyr)
library(tidyr)

df1 <- df1 %>% gather(Type, Names, -c(ID, Date)) %>%
  mutate(type2 = gsub("\\d", "", Type)) %>%
  group_by(type2, ID) %>%
  mutate(names2 = ifelse(any(Names %in% df2$Name), "", Names),
         Names = ifelse(names2 == "", NA, Names)) %>%
  ungroup() %>%
  select(-type2, -names2) 

which results in (long format):

       ID   Date  Type Names
   <fctr> <fctr> <chr> <chr>
 1  3xy     12/3    a1  <NA>
 2  4lm     12/5    a1  John
 3  3xy     12/3    a2  <NA>
 4  4lm     12/5    a2  Bill
 5  3xy     12/3    a3  <NA>
 6  4lm     12/5    a3   Sue
 7  3xy     12/3    b1   Ben
 8  4lm     12/5    b1  <NA>
 9  3xy     12/3    b2   Bob
10  4lm     12/5    b2  <NA>
11  3xy     12/3    b3  Alex
12  4lm     12/5    b3  <NA>
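This answer stops at long format. To get back to the wide layout from the question, the long result can be pivoted again; in the tidyr pipeline above that would be a `spread(Type, Names)` step. The sketch below instead uses base R's `reshape()` so it runs without extra packages, and rebuilds the long table by hand from the printed output. Both choices are assumptions, not part of the original answer.

```r
# Long-format result as printed above (ID, Date, Type, Names), stored as character
long <- data.frame(
  ID    = rep(c("3xy", "4lm"), times = 6),
  Date  = rep(c("12/3", "12/5"), times = 6),
  Type  = rep(c("a1", "a2", "a3", "b1", "b2", "b3"), each = 2),
  Names = c(NA, "John", NA, "Bill", NA, "Sue",
            "Ben", NA, "Bob", NA, "Alex", NA),
  stringsAsFactors = FALSE
)

# Pivot to one row per ID/Date; reshape() names the new columns "Names.a1", "Names.a2", ...
wide <- reshape(long, idvar = c("ID", "Date"), timevar = "Type",
                v.names = "Names", direction = "wide")

# Drop the "Names." prefix so the columns match the question's layout
names(wide) <- sub("^Names\\.", "", names(wide))
rownames(wide) <- NULL
wide
```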