跨数据框乘以列值

时间:2018-07-05 19:39:26

标签: r dataframe dplyr substring mapply

我希望将数据跨两个不同的数据帧进行乘积。输入是动态的,因此每个团队的ID数量不是恒定的。我在下面提供了一组示例数据。

Branch  ID-1  Time-1  ID-2  Time-2  ID-3  Time-3
Texas   BKP   5.5     LMG   2.8     DDP   8.9
Maine   BQQ   11      BKP   8.1     OLW   3.0
NYork   DDP   2.0     ADD   6.5     BQQ   0.4

有数百个分支可以利用数千个ID,因此下面只是我创建的一小部分。我们还会收到所有ID的费用数据框,如下所示:

ID     Cost
ADD    4.50
BKP    11.99
BQQ    1.50
DDP    8.99
LMG    24.99
OLW    29.99

我试图按每个分支的ID来查找成本,并且能够按子字符串进行查询,但是加入第二个数据帧是我遇到麻烦的地方。我需要的输出如下所示:

Branch  ID-1  Time-1  ID-2  Time-2  ID-3  Time-3  Cost-1  Cost-2  Cost-3
Texas   BKP   5.5     LMG   2.8     DDP   8.9     65.945  69.972  80.011
Maine   BQQ   11      BKP   8.1     OLW   3.0     16.50   97.119  89.97
NYork   DDP   2.0     ADD   6.5     BQQ   0.4     17.98   29.25   0.6

我知道这不是最漂亮的输出,但是不幸的是这是需要的输出。非常感谢您能为此提供的帮助。

2 个答案:

答案 0 :(得分:2)

使用的可能解决方案:

library(data.table)

melt(setDT(df1), id = 1,
     measure.vars = patterns(c("^ID","^Time")),
     value.name = c("ID","Time")
     )[df2, on = .(ID), Cost := Time * i.Cost
       ][, dcast(.SD, Branch ~ variable, value.var = c("ID","Time","Cost"))]

给出:

   Branch ID_1 ID_2 ID_3 Time_1 Time_2 Time_3 Cost_1 Cost_2 Cost_3
1:  Maine  BQQ  BKP  OLW   11.0    8.1    3.0 16.500 97.119 89.970
2:  NYork  DDP  ADD  BQQ    2.0    6.5    0.4 17.980 29.250  0.600
3:  Texas  BKP  LMG  DDP    5.5    2.8    8.9 65.945 69.972 80.011

使用的数据:

df1 <- structure(list(Branch = c("Texas", "Maine", "NYork"), ID.1 = c("BKP", "BQQ", "DDP"), Time.1 = c(5.5, 11, 2), 
                      ID.2 = c("LMG", "BKP", "ADD"), Time.2 = c(2.8, 8.1, 6.5), ID.3 = c("DDP", "OLW", "BQQ"), Time.3 = c(8.9, 3, 0.4)), 
                 .Names = c("Branch", "ID.1", "Time.1", "ID.2", "Time.2", "ID.3", "Time.3"), class = "data.frame", row.names = c(NA, -3L))
df2 <- structure(list(ID = c("ADD", "BKP", "BQQ", "DDP", "LMG", "OLW"), Cost = c(4.5, 11.99, 1.5, 8.99, 24.99, 29.99)),
                 .Names = c("ID", "Cost"), class = "data.frame", row.names = c(NA, -6L))

答案 1 :(得分:0)

使用基本R:

dat3 = merge(reshape(dat1,matrix(2:ncol(dat1),2),idv=1,dir="long",ids=dat1$Branch),dat2,by.x="ID.1",by.y="ID")

 reshape(transform(dat3,Cost=Cost * Time.1)[order(dat3$time),],idvar = "Branch",dir="wide")

  Branch ID.1.1 Time.1.1 Cost.1 ID.1.2 Time.1.2 Cost.2 ID.1.3 Time.1.3 Cost.3
2  Texas    BKP      5.5 65.945    LMG      2.8 69.972    DDP      8.9 80.011
4  Maine    BQQ     11.0 16.500    BKP      8.1 97.119    OLW      3.0 89.970
6  NYork    DDP      2.0 17.980    ADD      6.5 29.250    BQQ      0.4  0.600