Matching two data frames with different numbers of rows inside a loop

Date: 2018-12-29 20:12:39

Tags: r

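I think my problem is not hard to solve even without the data. I want to do this with a loop in R rather than by hand, because the number of iterations is not always 34; it can be more or fewer.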

The data frame `price` is huge (about 4 million rows by 20 columns), but it does not change.

price

The data frame `ihbm_hf` is much smaller, 34 rows by 20 columns, but it changes depending on other calculations.

ihbm_hf

Both data frames use GVKEY as their ID, but I only want to keep the prices for the GVKEYs contained in `ihbm_hf` and store them in `phbm_hf`.

This is the task I want to automate:

    phbm_hf1 <- price[which(ihbm_hf$gvkey[1] == price$gvkey),]
    phbm_hf2 <- price[which(ihbm_hf$gvkey[2] == price$gvkey),]
    phbm_hf3 <- price[which(ihbm_hf$gvkey[3] == price$gvkey),]
    phbm_hf4 <- price[which(ihbm_hf$gvkey[4] == price$gvkey),]
    phbm_hf5 <- price[which(ihbm_hf$gvkey[5] == price$gvkey),]
    phbm_hf6 <- price[which(ihbm_hf$gvkey[6] == price$gvkey),]
    phbm_hf7 <- price[which(ihbm_hf$gvkey[7] == price$gvkey),]
    phbm_hf8 <- price[which(ihbm_hf$gvkey[8] == price$gvkey),]
    phbm_hf9 <- price[which(ihbm_hf$gvkey[9] == price$gvkey),]
    phbm_hf10 <- price[which(ihbm_hf$gvkey[10] == price$gvkey),]
    phbm_hf11 <- price[which(ihbm_hf$gvkey[11] == price$gvkey),]
    phbm_hf12 <- price[which(ihbm_hf$gvkey[12] == price$gvkey),]
    phbm_hf13 <- price[which(ihbm_hf$gvkey[13] == price$gvkey),]
    phbm_hf14 <- price[which(ihbm_hf$gvkey[14] == price$gvkey),]
    phbm_hf15 <- price[which(ihbm_hf$gvkey[15] == price$gvkey),]
    phbm_hf16 <- price[which(ihbm_hf$gvkey[16] == price$gvkey),]
    phbm_hf17 <- price[which(ihbm_hf$gvkey[17] == price$gvkey),]
    phbm_hf18 <- price[which(ihbm_hf$gvkey[18] == price$gvkey),]
    phbm_hf19 <- price[which(ihbm_hf$gvkey[19] == price$gvkey),]
    phbm_hf20 <- price[which(ihbm_hf$gvkey[20] == price$gvkey),]
    phbm_hf21 <- price[which(ihbm_hf$gvkey[21] == price$gvkey),]
    phbm_hf22 <- price[which(ihbm_hf$gvkey[22] == price$gvkey),]
    phbm_hf23 <- price[which(ihbm_hf$gvkey[23] == price$gvkey),]
    phbm_hf24 <- price[which(ihbm_hf$gvkey[24] == price$gvkey),]
    phbm_hf25 <- price[which(ihbm_hf$gvkey[25] == price$gvkey),]
    phbm_hf26 <- price[which(ihbm_hf$gvkey[26] == price$gvkey),]
    phbm_hf27 <- price[which(ihbm_hf$gvkey[27] == price$gvkey),]
    phbm_hf28 <- price[which(ihbm_hf$gvkey[28] == price$gvkey),]
    phbm_hf29 <- price[which(ihbm_hf$gvkey[29] == price$gvkey),]
    phbm_hf30 <- price[which(ihbm_hf$gvkey[30] == price$gvkey),]
    phbm_hf31 <- price[which(ihbm_hf$gvkey[31] == price$gvkey),]
    phbm_hf32 <- price[which(ihbm_hf$gvkey[32] == price$gvkey),]
    phbm_hf33 <- price[which(ihbm_hf$gvkey[33] == price$gvkey),]
    phbm_hf34 <- price[which(ihbm_hf$gvkey[34] == price$gvkey),]

    phbm_hf <- rbind(phbm_hf1, phbm_hf2, phbm_hf3, phbm_hf4, 
               phbm_hf5, phbm_hf6, phbm_hf7, phbm_hf8, 
               phbm_hf9, phbm_hf10, phbm_hf11, phbm_hf12, 
               phbm_hf13, phbm_hf14, phbm_hf15, phbm_hf16, 
               phbm_hf17, phbm_hf18, phbm_hf19, phbm_hf20, 
               phbm_hf21, phbm_hf22, phbm_hf23, phbm_hf24, 
               phbm_hf25, phbm_hf26, phbm_hf27, phbm_hf28,
               phbm_hf29, phbm_hf30, phbm_hf31, phbm_hf32,
               phbm_hf33, phbm_hf34)

The resulting output has 98,369 rows and 20 columns.

phbm_hf

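This is exactly what I want. But because the number of rows in `ihbm_hf` is not always 34, I cannot rely on this code; it only works because I know that `ihbm_hf` has 34 rows at this particular moment.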

####since the length can be longer or shorter than 34    
l <- length(ihbm_hf$gvkey)

for(i in 1:l){

  phbm_hf  <- price[which(ihbm_hf$gvkey[i] == price$gvkey),]

  }

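The problem with this code is that it only keeps the result of the last iteration (the 34th) in `phbm_hf`, and I don't know how to store each result inside the loop the way I did manually in the long code above.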

2 answers:

Answer 0: (score: 0)

You can do it like this:

library(data.table)

# convert to data table format for speed
setDT(price)
setDT(ihbm_hf)

# filter price data
# (the column is named gvkey, lower case, in the question's code)
phbm_hf <- price[gvkey %in% ihbm_hf$gvkey]
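
The same filter also works without data.table. Here is a minimal base R sketch, assuming the ID column is named `gvkey` as in the question's code; the toy data frames and the `prc` column are made up for illustration only:

```r
# Toy stand-ins for the real data (hypothetical values)
price <- data.frame(
  gvkey = c(1001, 1001, 1002, 1003, 1004, 1004),
  prc   = c(10.5, 10.7, 20.1, 30.3, 40.2, 40.9)
)
ihbm_hf <- data.frame(gvkey = c(1004, 1001))

# Keep every price row whose gvkey appears in ihbm_hf;
# this works for any number of rows in ihbm_hf
phbm_hf <- price[price$gvkey %in% ihbm_hf$gvkey, ]

nrow(phbm_hf)
```

Note that the rows come back in the order they have in `price`, whereas the manual `rbind` version groups them in the order the keys appear in `ihbm_hf`. The set of rows is the same as long as `ihbm_hf$gvkey` has no duplicates.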

Answer 1: (score: 0)

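There are many different ways to do this, but I'll point out a quick fix for your own solution. The problem with your loop is that every iteration overwrites the same variable (`phbm_hf`). You need to store each result separately. One way is a container with one slot per iteration. Because each result is a data frame, that container has to be a list rather than an atomic vector, for example:

l <- length(ihbm_hf$gvkey)
phbm_hf <- vector("list", l) # creates an empty list of length l

for(i in seq_len(l)){

   # each list element holds the price rows for one gvkey
   phbm_hf[[i]] <- price[which(ihbm_hf$gvkey[i] == price$gvkey),]

}

# stack the per-gvkey pieces into a single data frame
phbm_hf <- do.call(rbind, phbm_hf)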
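
A variant of the per-key approach without an explicit counter is `lapply`, which returns the list of pieces directly. This is only a sketch: the toy data frames and the `prc` column are hypothetical, and the ID column is assumed to be named `gvkey` as in the question's code.

```r
# Toy stand-ins for the real data (hypothetical values)
price <- data.frame(
  gvkey = c(1001, 1001, 1002, 1003, 1004, 1004),
  prc   = c(10.5, 10.7, 20.1, 30.3, 40.2, 40.9)
)
ihbm_hf <- data.frame(gvkey = c(1004, 1001))

# One data frame per gvkey, collected in a list
pieces <- lapply(ihbm_hf$gvkey, function(k) {
  price[which(price$gvkey == k), ]
})

# Bind the pieces in the order the keys appear in ihbm_hf
phbm_hf <- do.call(rbind, pieces)
```

With about 4 million rows in `price`, each key triggers a full scan of the column, so a single `%in%` filter is likely to be faster; the per-key version is mainly useful when the grouping by key order matters.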