I have two data frames that I want to join.
The first one is:
V1 <- c("AB1", "AB2", "AB3" ,"AB4" ,"AB5" ,"AB6" ,"AB7","AB6","AB9" ,"AB10")
df1 <- data.frame(V1)
The second one is:
V5 <- c("AB1","","","", "AB3", "AB4", "AB5", "AB6")
V6 <- c("AB","AB2","","AB", "", "AB", "", "AB")
V7 <- c("AB","AB","AB","", "AB", "", "AB", "AB")
V8 <- c(1,2,2,2,3,4,5,6)
df2 <- data.frame(V5,V6, V7, V8)
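I am trying to look up each value of df1$V1 in columns V5, V6 and V7 of df2, return the corresponding V8 from df2, and add a yes/no flag (1 when df1$V1 is found in df2, 0 otherwise).
The desired result is: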
df1$V1   res (df2$V8)   Yes/No
AB1      1              1
AB2      2              1
AB3      3              1
AB4      4              1
AB5      5              1
AB6      6              1
AB7                     0
AB6                     0
AB9                     0
AB10                    0
I have the following code, but I cannot get it to work on all three columns of df2 at the same time:
df1$res[match(df2$V5,df1$V1, nomatch=0)] <- df2$V6[match(df2$V5,df1$V1, nomatch = 0)]
Answer 0 (score: 1)
V1 <- c("AB1", "AB2", "AB3" ,"AB4" ,"AB5" ,"AB6" ,"AB7","AB6","AB9" ,"AB10")
df1 <- data.frame(V1, stringsAsFactors = FALSE)
V5 <- c("AB1","","","", "AB3", "AB4", "AB5", "AB6")
V6 <- c("AB","AB2","","AB", "", "AB", "", "AB")
V7 <- c("AB","AB","AB","", "AB", "", "AB", "AB")
V8 <- c(1,2,2,2,3,4,5,6)
df2 <- data.frame(V5, V6, V7, V8, stringsAsFactors = FALSE)
library(tidyverse)
df2 %>%
  gather(v, V1, -V8) %>%                        # reshape: stack V5, V6, V7 into one column V1
  select(-v) %>%                                # remove unnecessary variable
  right_join(df1, by = "V1") %>%                # join df1, keeping every V1 from df1
  mutate(YesNo = ifelse(is.na(V8), 0, 1)) %>%   # create Yes/No variable
  distinct() %>%                                # select distinct rows
  select(V1, V8, YesNo)                         # select and order columns
# V1 V8 YesNo
# 1 AB1 1 1
# 2 AB2 2 1
# 3 AB3 3 1
# 4 AB4 4 1
# 5 AB5 5 1
# 6 AB6 6 1
# 7 AB7 NA 0
# 8 AB9 NA 0
# 9 AB10 NA 0
If you remove distinct() from the code, you get all rows of df1 (including the duplicated AB6) rather than only the distinct rows.
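For comparison, here is a minimal base R sketch that stays closer to the match() attempt in the question. It assumes that the first hit across V5, V6 and V7 is the one you want. The helper names keys, vals and idx are illustrative only and do not come from the original post.

```r
V1 <- c("AB1", "AB2", "AB3", "AB4", "AB5", "AB6", "AB7", "AB6", "AB9", "AB10")
df1 <- data.frame(V1, stringsAsFactors = FALSE)
V5 <- c("AB1", "", "", "", "AB3", "AB4", "AB5", "AB6")
V6 <- c("AB", "AB2", "", "AB", "", "AB", "", "AB")
V7 <- c("AB", "AB", "AB", "", "AB", "", "AB", "AB")
V8 <- c(1, 2, 2, 2, 3, 4, 5, 6)
df2 <- data.frame(V5, V6, V7, V8, stringsAsFactors = FALSE)

# Stack the three lookup columns into one key vector and repeat V8
# alongside it, so that position i in keys corresponds to vals[i].
keys <- c(df2$V5, df2$V6, df2$V7)
vals <- rep(df2$V8, times = 3)

# match() gives the first position of each df1$V1 in keys, or NA if absent.
idx <- match(df1$V1, keys)

df1$res   <- vals[idx]                 # V8 of the first hit, NA when not found
df1$YesNo <- as.integer(!is.na(idx))   # 1 when found in any of V5, V6, V7
df1
```

Unlike the pipeline with distinct(), this keeps all ten rows of df1 in their original order. The second AB6 is also matched, so it gets res 6 and YesNo 1 rather than the 0 shown in the question's desired table.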