Adding matched values as new columns

Time: 2018-06-01 03:56:42

Tags: r dataframe match

I have two data frames. They have different numbers of rows but share a common column, lepsp.

set.seed(571) 
year = as.factor(c(rep("1998", 20), rep("1999", 16)))
lepsp = c(letters[1:20], c('a','b','c'),letters[8:20]) 
freq = rpois(36, lambda=12)
df1 <- data.frame(year, lepsp, freq)

lepsp = c(letters[1:26],c('a','b','c'),letters[1:20],c('e','f',"h")) 
plntsp = c(paste("plnt", sep= "_", letters[1:26]), 
      paste("plnt",sep="_",letters[1:20]),
      paste("plnt",sep="_",letters[18:23])) 
df2 <- data.frame(lepsp, plntsp)

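I want to match lepsp between the two data frames and add columns to df1 that list every plntsp associated with each lepsp. Each unique plntsp should go into its own new column. If a lepsp has no associated plant, those entries can be left blank. The new data frame should look something like this: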

df <- data.frame(lepsp=unique(c(letters[1:5],letters[14:18])),  
          plntsp1=c("","","plnt_a","plnt_b","plnt_c","","","","",""),
          plntsp2=c("","","", "plnt_c","plnt_d","","","","",""))

I have used the following for matching in the past, but I am not sure how to adapt it so that each level of plntsp is added as a new column.

 df1$plntsp<-df2$plntsp[match(df1$lepsp, df2$lepsp)]

1 answer:

Answer 0: (score: 0)

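You can join df1 with df2 using dplyr::left_join, then summarise the data by lepsp so that each species ends up on a single row. Finally, use splitstackshape::cSplit to separate plntsp into multiple columns.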

library(tidyverse)
library(splitstackshape)

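left_join(df1, df2, by = "lepsp") %>%       # attach every matching plntsp to each df1 row
  select(lepsp, plntsp) %>%
  distinct() %>%                            # drop repeated lepsp/plntsp pairs
  group_by(lepsp) %>%
  summarise(plntsp = toString(plntsp)) %>%  # collapse plants into one comma-separated string
  ungroup() %>%
  cSplit("plntsp")                          # split on "," into plntsp_1, plntsp_2, ...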

#     lepsp plntsp_1 plntsp_2 plntsp_3
#  1:     a   plnt_a   plnt_d       NA
#  2:     b   plnt_b   plnt_e       NA
#  3:     c   plnt_c   plnt_f       NA
#  4:     d   plnt_d   plnt_g       NA
#  5:     e   plnt_e   plnt_h   plnt_u
#  6:     f   plnt_f   plnt_i   plnt_v
#  7:     g   plnt_g   plnt_j       NA
#  8:     h   plnt_h   plnt_k   plnt_w
#  9:     i   plnt_i   plnt_l       NA
# 10:     j   plnt_j   plnt_m       NA
# 11:     k   plnt_k   plnt_n       NA
# 12:     l   plnt_l   plnt_o       NA
# 13:     m   plnt_m   plnt_p       NA
# 14:     n   plnt_n   plnt_q       NA
# 15:     o   plnt_o   plnt_r       NA
# 16:     p   plnt_p   plnt_s       NA
# 17:     q   plnt_q   plnt_t       NA
# 18:     r   plnt_r       NA       NA
# 19:     s   plnt_s       NA       NA
# 20:     t   plnt_t       NA       NA
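
If installing extra packages is not an option, roughly the same table can be built in base R. This is only a sketch under a few assumptions: the names pairs, plants, n_max, wide_mat and wide are illustrative and not part of the answer above, and species in df1 that have no match in df2 would be dropped rather than kept with NA (in the example data every species has a match).

```r
# Rebuild the example data (only the columns needed here), keeping strings as characters
df1 <- data.frame(
  lepsp = c(letters[1:20], c("a", "b", "c"), letters[8:20]),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  lepsp  = c(letters[1:26], c("a", "b", "c"), letters[1:20], c("e", "f", "h")),
  plntsp = c(paste("plnt", letters[1:26], sep = "_"),
             paste("plnt", letters[1:20], sep = "_"),
             paste("plnt", letters[18:23], sep = "_")),
  stringsAsFactors = FALSE
)

# Keep only species that occur in df1 and drop repeated lepsp/plntsp pairs
pairs <- unique(df2[df2$lepsp %in% df1$lepsp, c("lepsp", "plntsp")])

# One character vector of plants per species
plants <- split(pairs$plntsp, pairs$lepsp)

# Pad each vector with NA to a common length, then bind into a matrix
# (one row per species after transposing)
n_max <- max(lengths(plants))
wide_mat <- t(vapply(
  plants,
  function(p) c(p, rep(NA_character_, n_max - length(p))),
  character(n_max)
))
colnames(wide_mat) <- paste0("plntsp_", seq_len(n_max))

wide <- data.frame(lepsp = rownames(wide_mat), wide_mat,
                   row.names = NULL, stringsAsFactors = FALSE)
head(wide)
```

The padding step is what lets species with different numbers of plants share one rectangular table; cSplit does the equivalent work in the pipeline above.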

Note: Pass the stringsAsFactors = FALSE argument when creating the data frames to avoid unnecessary warnings.
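
To illustrate the note, here is how df2 from the question could be created with the argument set explicitly. This is a minimal sketch; since R 4.0.0 the default is already stringsAsFactors = FALSE, so spelling it out mainly matters on older versions.

```r
# df2 from the question, with text columns kept as character instead of factor
lepsp  <- c(letters[1:26], c("a", "b", "c"), letters[1:20], c("e", "f", "h"))
plntsp <- c(paste("plnt", letters[1:26], sep = "_"),
            paste("plnt", letters[1:20], sep = "_"),
            paste("plnt", letters[18:23], sep = "_"))
df2 <- data.frame(lepsp, plntsp, stringsAsFactors = FALSE)

# Inspect the column types; both should be character
sapply(df2, class)
```

With character columns on both sides, the join key no longer depends on the two data frames having identical factor levels.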