I have two data frames, each containing an ID column.
df1 <- data.frame(ID = c(20001, 20001, 20003, 20003, 20003, 20003))
df2 <- data.frame(ID = c(20001, 20001, 20003, 20003, 20003, 20005),
                  Type = c('N1', 'N2', 'N3', 'N4', 'N5', 'N6'))
I want to create a second column in df1 whose values come from df2$Type, matched on ID. This is my usual way of looking up values:
df1$Add <- df2$Type[match(df1$ID, df2$ID)]
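However, match() only returns the position of the first matching ID, so every repeated ID gets the same value, giving me something like this: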
ID Add
20001 N1
20001 N1
20003 N3
20003 N3
20003 N3
20003 N3
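Instead, for each repeated ID I want to bring in the 'next' Type value, cycling back to the first one when the values for that ID run out. Ideally, I would like the following output: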
ID Add
20001 N1
20001 N2
20003 N3
20003 N4
20003 N5
20003 N3
I think it might need lapply with some user-defined function.
Answer 0 (score: 2)
Is this what you are looking for?
library(dplyr)
df1 %>%
  group_by(ID) %>%
  mutate(c = rep(df2$Type[df2$ID == unique(ID)], length.out = n()))
# ID c
#1 20001 N1
#2 20001 N2
#3 20003 N3
#4 20003 N4
#5 20003 N5
#6 20003 N3
# if efficiency matters, the same idea in data.table:
library(data.table)
setDT(df2)
setDT(df1)[, x := rep(df2$Type[df2$ID == ID], length.out = .N),by = .(ID)]
# I was also looking for a base R solution that does not involve merge;
# for now my best attempt uses sapply(), though probably not in the most efficient way
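# note: the result is ordered by unique ID, so it lines up with df1
# only when the rows of df1 are already grouped by ID
unlist(lapply(unique(df1$ID), function(x) rep(df2$Type[df2$ID == x],
                                              length.out = sum(x == df1$ID))))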
# [1] N1 N2 N3 N4 N5 N3
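The sapply() attempt above returns its values grouped by ID, so it only lines up with df1 when the rows of df1 are already sorted by ID. As a rough base R sketch of the same recycling idea that writes each value back to its original row, something like the following might work. Note that cycle_lookup is a hypothetical helper name made up for this example, not an existing function, and returning NA for IDs that have no match in df2 is an assumption about the desired behaviour.

```r
df1 <- data.frame(ID = c(20001, 20001, 20003, 20003, 20003, 20003))
df2 <- data.frame(ID = c(20001, 20001, 20003, 20003, 20003, 20005),
                  Type = c('N1', 'N2', 'N3', 'N4', 'N5', 'N6'))

# Hypothetical helper (not from any package): for each ID, recycle the
# matching values and place them at the rows where that ID occurs.
cycle_lookup <- function(ids, key, value) {
  value <- as.character(value)          # avoid factor codes leaking through
  out <- rep(NA_character_, length(ids))
  for (id in unique(ids)) {
    rows <- which(ids == id)            # positions of this ID in the lookup vector
    vals <- value[key == id]            # candidate values for this ID
    if (length(vals) > 0) {
      out[rows] <- rep(vals, length.out = length(rows))
    }                                   # assumption: unmatched IDs stay NA
  }
  out
}

df1$Add <- cycle_lookup(df1$ID, df2$ID, df2$Type)
df1
```

Because the values are assigned by row position, this should also behave sensibly when the IDs in df1 are interleaved rather than grouped.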