Fastest way to transfer values between data.frames using a common ID column?

Time: 2014-12-01 18:04:09

Tags: r dataframe

I have two data.frames. I want to set "custMeanPrice" in the "d" data.frame to the corresponding "meanPrice" value from the "att.customer" data.frame, matching rows by "customerID".

> head(d)
  customerID custMeanPrice
1        794             0
2        794             0
3        794             0
4        808             0
5        825             0
6        825             0
> dim(d)
[1] 428165      2

> head(att.customer)
  customerID meanPrice
1        794  68.91000
2        808  39.90000
3        825  79.34444
4        850  76.18571
5        860  93.72353
6        873  69.90000
> dim(att.customer)
[1] 49870     2

I tried the following loop, but because the data are so large it is very slow and shows no sign of finishing.

for(i in 1:nrow(att.customer)){
  k = which(d$customerID == att.customer$customerID[i])
  d$custMeanPrice[k] = att.customer$meanPrice[i]
}

What is the best way to do this quickly and cleanly?

1 Answer:

Answer 0: (score: 3)

You could try data.table

library(data.table)
setDT(d)
setkey(setDT(att.customer), customerID)
att.customer[d][,custMeanPrice:=NULL][]
#   customerID meanPrice
#1:        794  68.91000
#2:        794  68.91000
#3:        794  68.91000
#4:        808  39.90000
#5:        825  79.34444
#6:        825  79.34444

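Following a comment from @David Arenburg, the above can also be done without converting att.customer to a data.table. In the example data custMeanPrice is stored as integer, so it is converted to numeric first.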

setkey(setDT(d), customerID)[, custMeanPrice := as.numeric(custMeanPrice)]
d[att.customer, custMeanPrice := meanPrice][]
#    customerID custMeanPrice
#1:        794      68.91000
#2:        794      68.91000
#3:        794      68.91000
#4:        808      39.90000
#5:        825      79.34444
#6:        825      79.34444

Or, if custMeanPrice is already numeric

setkey(setDT(d), customerID)
d[att.customer, custMeanPrice := meanPrice][]
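
As a side note, the same update join can be written without setting a key, assuming a data.table release recent enough to support the `on` argument. The sketch below uses small toy tables rather than the real data; custMeanPrice is created as a double so that assigning meanPrice does not truncate, and the `i.` prefix just makes explicit that meanPrice comes from att.customer.

```r
library(data.table)

# Toy data mirroring the question (values are illustrative only);
# custMeanPrice is a double column, initialised to 0
d <- data.table(customerID = c(794L, 794L, 794L, 808L, 825L, 825L),
                custMeanPrice = 0)
att.customer <- data.table(customerID = c(794L, 808L, 825L, 850L),
                           meanPrice  = c(68.91, 39.9, 79.34444, 76.18571))

# Update join: for each row of att.customer, locate the matching rows of d
# and overwrite custMeanPrice by reference. IDs in att.customer that do not
# occur in d (850 here) are simply skipped.
d[att.customer, on = "customerID", custMeanPrice := i.meanPrice]

print(d)
```

Because `:=` modifies d in place, no copy of the large table is made, which is the main reason this tends to be much faster than the loop in the question.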

Or you can use match from base R

d$custMeanPrice <- att.customer$meanPrice[match(d$customerID,
             att.customer$customerID)]

d
#  customerID custMeanPrice
#1        794      68.91000
#2        794      68.91000
#3        794      68.91000
#4        808      39.90000
#5        825      79.34444
#6        825      79.34444
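
One thing to be aware of with the match one-liner: any customerID in d that is missing from att.customer ends up as NA, whereas the loop in the question would leave the existing value alone. A minimal sketch of a variant that keeps the old value for unmatched IDs, using made-up toy data (the ID 999 is hypothetical and only there to show the unmatched case):

```r
# Toy data; customerID 999 has no entry in att.customer
d <- data.frame(customerID = c(794L, 794L, 808L, 999L),
                custMeanPrice = 0)
att.customer <- data.frame(customerID = c(794L, 808L, 825L),
                           meanPrice  = c(68.91, 39.9, 79.34444))

# match() returns, for each d$customerID, the row position of its first
# occurrence in att.customer$customerID, or NA when there is none
idx <- match(d$customerID, att.customer$customerID)

# Only overwrite rows that actually found a match
found <- !is.na(idx)
d$custMeanPrice[found] <- att.customer$meanPrice[idx[found]]

print(d)
```

The lookup is vectorised, so it runs once over the whole column instead of scanning d once per customer as the loop does.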

Data

d <-  structure(list(customerID = c(794L, 794L, 794L, 808L, 825L, 825L
), custMeanPrice = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("customerID", 
"custMeanPrice"), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6"))

att.customer <-   structure(list(customerID = c(794L, 808L, 825L, 850L, 860L, 873L
), meanPrice = c(68.91, 39.9, 79.34444, 76.18571, 93.72353, 69.9
 )), .Names = c("customerID", "meanPrice"), class = "data.frame", row.names = 
c("1", "2", "3", "4", "5", "6"))