I have two data.frames like these:
#df1
ID a1 a2 a3 b1 b2 b3 Date
3xy Evan Greg Ryan Ben Bob Alex 12/3
4lm John Bill Sue Randy Mark Seth 12/5
#df2
Name
Evan
Mark
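If a name in any of the "a" columns appears in df2$Name, I want to replace all of the "a" columns in that row with NA. The same goes for the "b" columns. My desired output looks like this: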
ID a1 a2 a3 b1 b2 b3 Date
3xy NA NA NA Ben Bob Alex 12/3
4lm John Bill Sue NA NA NA 12/5
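I have found a few other posts that seem to cover similar topics, but I haven't been able to get any of them to work. With the code below I can replace the names in df1 that appear in df2, but I haven't figured out how to also replace the other columns that start with the same letter: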
df1[apply(df1, 2, function(df1) df1 %in% df2$Name)] <- NA
which gives me output like this:
ID a1 a2 a3 b1 b2 b3 Date
3xy NA Greg Ryan Ben Bob Alex 12/3
4lm John Bill Sue Randy NA Seth 12/5
I have also been trying various ifelse statements, but without success.
Answer 0 (score: 2)
We can split the data into the 'a' and 'b' column groups, then loop over the rows of each group and, if any value matches a 'Name' in 'df2', set the whole row of that group to NA.
nm1 <- names(df1)[c(-1, -8)]
lst <- lapply(split.default(df1[nm1], sub("\\d+", "", nm1)), function(x) {
  x[apply(x, 1, function(y) any(y %in% df2$Name)), ] <- NA
  x
})
df1[nm1] <- do.call(cbind, unname(lst))
df1
# ID a1 a2 a3 b1 b2 b3 Date
#1 3xy <NA> <NA> <NA> Ben Bob Alex 12/3
#2 4lm John Bill Sue <NA> <NA> <NA> 12/5
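For comparison, the same row-wise rule can be sketched with a plain loop over the column prefixes. This is only an illustration: the data below is rebuilt from the question, and storing the columns as character (stringsAsFactors = FALSE) and the names res, cols and hit are my assumptions, not part of the original post.

```r
# Example data rebuilt from the question (character columns are an assumption)
df1 <- data.frame(
  ID = c("3xy", "4lm"),
  a1 = c("Evan", "John"), a2 = c("Greg", "Bill"), a3 = c("Ryan", "Sue"),
  b1 = c("Ben", "Randy"), b2 = c("Bob", "Mark"), b3 = c("Alex", "Seth"),
  Date = c("12/3", "12/5"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(Name = c("Evan", "Mark"), stringsAsFactors = FALSE)

res <- df1
for (prefix in c("a", "b")) {
  # Columns of this group, e.g. a1, a2, a3
  cols <- grep(paste0("^", prefix, "\\d+$"), names(res), value = TRUE)
  # One logical per row: TRUE if any column of the group holds a listed name
  hit <- Reduce(`|`, lapply(res[cols], `%in%`, df2$Name))
  # Blank the whole group for the flagged rows only
  res[hit, cols] <- NA
}
res
```

Working on a copy (res) leaves df1 untouched, which makes it easier to compare the result with the original.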
Or another option is melt/dcast from data.table:
library(data.table)
dcast(melt(setDT(df1), measure = patterns("^a\\d+", "^b\\d+"),
           value.name = c('a', 'b'))[, c('a', 'b') := lapply(.SD, function(x)
             replace(x, any(x %in% df2$Name), NA)), ID, .SDcols = a:b][],
      ID + Date ~ variable, value.var = c('a', 'b'), sep='')
# ID Date a1 a2 a3 b1 b2 b3
#1: 3xy 12/3 NA NA NA Ben Bob Alex
#2: 4lm 12/5 John Bill Sue NA NA NA
Answer 1 (score: 1)
library(tidyverse)
df3 <- df1 %>%
  gather(key, value, -ID, -Date) %>%
  mutate(group = substr(key, 1, 1)) %>%
  select(group, ID, value) %>%
  inner_join(df2, by = c("value" = "Name")) %>%
  select(group, ID)
df1 %>%
  gather(key, value, -ID, -Date) %>%
  mutate(group = substr(key, 1, 1)) %>%
  anti_join(df3) %>%
  select(-group) %>%
  spread(key, value) %>%
  select(ID, matches("^a"), matches("^b"), Date)
Output:
# A tibble: 2 x 8
ID a1 a2 a3 b1 b2 b3 Date
* <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 3xy <NA> <NA> <NA> Ben Bob Alex 12/3
2 4lm John Bill Sue <NA> <NA> <NA> 12/5
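In current tidyr releases gather() and spread() are superseded by pivot_longer() and pivot_wider(). A rough sketch of the same idea with those functions might look like the following. It swaps the inner_join/anti_join pair for a grouped mutate(), so it is a variation rather than a literal port, and the sample data (with character columns) is rebuilt from the question as an assumption.

```r
library(dplyr)
library(tidyr)

# Example data rebuilt from the question (character columns are an assumption)
df1 <- data.frame(
  ID = c("3xy", "4lm"),
  a1 = c("Evan", "John"), a2 = c("Greg", "Bill"), a3 = c("Ryan", "Sue"),
  b1 = c("Ben", "Randy"), b2 = c("Bob", "Mark"), b3 = c("Alex", "Seth"),
  Date = c("12/3", "12/5"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(Name = c("Evan", "Mark"), stringsAsFactors = FALSE)

res <- df1 %>%
  # Long format: one row per ID / column name / value
  pivot_longer(cols = -c(ID, Date), names_to = "key", values_to = "value") %>%
  # The first letter of the column name identifies the group ("a" or "b")
  mutate(group = substr(key, 1, 1)) %>%
  group_by(ID, group) %>%
  # If any value in the group is listed in df2, blank the whole group
  mutate(value = if (any(value %in% df2$Name)) NA_character_ else value) %>%
  ungroup() %>%
  select(-group) %>%
  # Back to the original wide layout
  pivot_wider(names_from = key, values_from = value) %>%
  select(ID, matches("^a"), matches("^b"), Date)
res
```

Because the rows are kept and only their values are set to NA, every ID survives the reshape even if both of its groups are blanked.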
Answer 2 (score: 0)
Here is a dplyr / tidyr approach:
library(dplyr)
library(tidyr)
df1 <- df1 %>%
  gather(Type, Names, -c(ID, Date)) %>%
  mutate(type2 = gsub("\\d", "", Type)) %>%
  group_by(type2, ID) %>%
  mutate(names2 = ifelse(any(Names %in% df2$Name), "", Names),
         Names = ifelse(names2 == "", NA, Names)) %>%
  ungroup() %>%
  select(-type2, -names2)
which results in (long format):
ID Date Type Names
<fctr> <fctr> <chr> <chr>
1 3xy 12/3 a1 <NA>
2 4lm 12/5 a1 John
3 3xy 12/3 a2 <NA>
4 4lm 12/5 a2 Bill
5 3xy 12/3 a3 <NA>
6 4lm 12/5 a3 Sue
7 3xy 12/3 b1 Ben
8 4lm 12/5 b1 <NA>
9 3xy 12/3 b2 Bob
10 4lm 12/5 b2 <NA>
11 3xy 12/3 b3 Alex
12 4lm 12/5 b3 <NA>
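The result above is still in long format, while the question asks for the wide layout. One possible way to get back, sketched here with tidyr::spread() to match the gather() call, is shown below. The data frame long is typed in by hand from the printed output (as character columns, which is an assumption), and the names long and wide are only for illustration.

```r
library(tidyr)

# Long-format result, re-entered from the printed output above
long <- data.frame(
  ID    = rep(c("3xy", "4lm"), times = 6),
  Date  = rep(c("12/3", "12/5"), times = 6),
  Type  = rep(c("a1", "a2", "a3", "b1", "b2", "b3"), each = 2),
  Names = c(NA, "John", NA, "Bill", NA, "Sue",
            "Ben", NA, "Bob", NA, "Alex", NA),
  stringsAsFactors = FALSE
)

# One column per Type, one row per ID/Date pair
wide <- spread(long, Type, Names)
# Restore the column order used in the question
wide <- wide[c("ID", "a1", "a2", "a3", "b1", "b2", "b3", "Date")]
wide
```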