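Assigning values to a subset of a data frame in a list based on matching values in two lists

Asked: 2019-07-24 09:38:43

Tags: r dataframe

I have two lists (the data frames in them contain more than two columns, but that does not matter for my question):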

KPI_new <- list(June=data.frame(ID=(rep("",17)), eRec= c("107349", "110878", "110024", "112188", "6187", "100420", "94436", "110165", "108508", "108773", "111859", "111907", "110704", "100413", "88995", "91644","111298") ))


KPI_old <- list(May=data.frame(ID=c(27, 30,  4,  6,  7, 20, 31,  8, 28, 25, 29, 16, 17, 18), eRec = c( "107349", "110024", "6187"  , "100420", "94436",  "88995" , "110165" ,"91644",  "108508", "105213", "108773", "102636" ,"102339" ,"100413")),
            April = data.frame(ID=c(26, 27,  2,  4,  5,  6,  7, 20, 21, 22,  8, 23, 28, 25, 29,  9, 24, 16, 17, 18), eRec=c("37866",  "107349", "93051",  "6187",   "98274",  "100420", "94436",  "88995"  ,"105107", "105109", "91644",  "105103" ,"108508" ,"105213", "108773", "85409"  ,"104145","102636" ,"102339" ,"100413")),
            March = data.frame(ID= c(2, 19,  4,  5,  6,  7, 20, 21, 22,  8, 23, 25,  9, 24, 15, 16, 17, 18), eRec=c("93051" , "104499" ,"6187",   "98274",  "100420" ,"94436",  "88995"  ,"105107" ,"105109", "91644"  ,"105103", "105213" ,"85409" , "104145", "100989", "102636" ,"102339", "100413")),
            February = data.frame(ID= c(1 , 2, 19,  4,  5,  6,  7 ,20, 21, 22,  8, 23,  9 ,10, 24, 12, 13, 14, 15, 16, 17, 18), eRec=c("94266" , "93051",  "104499" ,"6187" ,  "98274",  "100420", "94436"  ,"88995",  "105107", "105109", "91644"  ,"105103", "85409"  ,"102252", "104145", "94559",  "101426", "100992" ,"100989" ,"102636" ,"102339" ,"100413")),
            January = data.frame(ID = c(1:18), eRec=c("94266" , "93051",  "99836",  "6187" ,  "98274",  "100420", "94436",  "91644",  "85409",  "102252", "94412",  "94559",  "101426", "100992", "100989", "102636", "102339", "100413")))

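The list KPI_old contains several data frames. The ID column is assigned based on the eRec column, so if an eRec value appears in both January and February, its ID is the same in both. Now I want to fill the ID column of the data frame in KPI_new (currently empty) based on the ID columns of the data frames in KPI_old.

I tried the following: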

KPI_old_df <- do.call("rbind", KPI_old)
KPI_new[[1]]$ID[(KPI_new[[1]][,2]) %in% KPI_old_df[,2]] <- unique(KPI_old_df$ID[(KPI_old_df[,2]) %in% KPI_new[[1]][,2]])

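This assigns the right values, that is, the IDs from KPI_old for those eRec values in KPI_new that already occur in KPI_old, but it assigns some of them to the wrong rows. The order is not correct. It seems I am missing something very basic.

Thanks.

1 Answer:

Answer 0: (score: 0)

Try using match as follows: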

KPI_new[[1]]$ID <- KPI_old_df$ID[match(KPI_new[[1]]$eRec, KPI_old_df$eRec)]

KPI_new
#$June
#   ID   eRec
#1  27 107349
#2  NA 110878
#3  30 110024
#4  NA 112188
#5   4   6187
#6   6 100420
#7   7  94436
#8  31 110165
#9  28 108508
#10 29 108773
#11 NA 111859
#12 NA 111907
#13 NA 110704
#14 18 100413
#15 20  88995
#16  8  91644
#17 NA 111298

Not all eRec values in KPI_new are present in KPI_old_df, so match returns NA for those rows and their ID is NA.
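
To see why match keeps the rows aligned while the %in% / unique attempt from the question does not, here is a minimal sketch on toy data. The data frames old and new and their values are made up for illustration and are not the KPI data above.

```r
# Hypothetical toy data: `old` plays the role of KPI_old_df, `new` of KPI_new[[1]].
old <- data.frame(ID = c(10, 20, 30, 20),
                  eRec = c("a", "b", "c", "b"),
                  stringsAsFactors = FALSE)
new <- data.frame(eRec = c("c", "x", "a", "b"),
                  stringsAsFactors = FALSE)

# match() returns, for each element of new$eRec, the position of its
# first occurrence in old$eRec, or NA if it does not occur there.
pos <- match(new$eRec, old$eRec)

# Indexing old$ID with those positions gives one ID per row of `new`,
# in the row order of `new`.
new$ID <- old$ID[pos]

# The approach from the question collects the matching IDs in the row
# order of `old`, which need not agree with the row order of `new`.
in_old_order <- unique(old$ID[old$eRec %in% new$eRec])

print(new)
print(in_old_order)
```

In this sketch in_old_order lists the IDs in the order they appear in old, so assigning it to the matching rows of new would give the row with eRec "c" the ID that belongs to "a". match avoids this because it is evaluated once per row of new.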