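In R data.table, multiply a column by the column named in another column's value

Time: 2014-03-29 14:54:43

Tags: r data.table

I want to convert prices quoted in different currencies into one specific currency. Suppose I have this: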

library(data.table)
set.seed(100)
DT <- data.table(day=1:10, price=runif(10), currency=c("aud","eur"), 
                 aud=runif(10) + 1, eur=runif(10) + 1.5)
DT
    day        price currency      aud      eur
 1:   1   0.30776611      aud 1.624996 2.035811
 2:   2   0.25767250      eur 1.882166 2.210804
 3:   3   0.55232243      aud 1.280354 2.038349
 4:   4   0.05638315      eur 1.398488 2.248972
 5:   5   0.46854928      aud 1.762551 1.920101
 6:   6   0.48377074      eur 1.669022 1.671420
 7:   7   0.81240262      aud 1.204612 2.270302
 8:   8   0.37032054      eur 1.357525 2.381954
 9:   9   0.54655860      aud 1.359475 2.049097
10:  10   0.17026205      eur 1.690291 1.777724

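Each day's price is expressed in the currency shown in the currency column. So 0.30776611 on day 1 is in Australian dollars (aud), and 0.25767250 on day 2 is in euros (eur). The aud and eur columns give the exchange rate of each currency to US dollars. How can I create a new column with the price expressed in US dollars, the data.table way?

I need to multiply price by the column whose name matches the value of currency, to get this: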

DT
    day        price currency      aud      eur price.in.usd
 1:   1   0.30776611      aud 1.624996 2.035811    0.5001187
 2:   2   0.25767250      eur 1.882166 2.210804    0.5696634
 3:   3   0.55232243      aud 1.280354 2.038349    0.7071682
 4:   4   0.05638315      eur 1.398488 2.248972    0.1268041
 5:   5   0.46854928      aud 1.762551 1.920101    0.825842
 6:   6   0.48377074      eur 1.669022 1.671420    0.8085841
 7:   7   0.81240262      aud 1.204612 2.270302    0.9786299
 8:   8   0.37032054      eur 1.357525 2.381954    0.8820865
 9:   9   0.54655860      aud 1.359475 2.049097    0.7430328
10:  10   0.17026205      eur 1.690291 1.777724    0.3026789

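So on the first day I multiply price * aud = 0.30776611 * 1.624996, because that row's currency is aud, and on the second day price * eur = 0.25767250 * 2.210804 for the same reason.

The real data has about 40 currencies, so nesting many ifelse() calls into an arrow anti-pattern is not very convenient.

At the moment, on a subsample of my data, I have this: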

DT.all[, price := ifelse(curcdd=="AUD", adj.price * AUD, 
                       ifelse(curcdd=="BEF", adj.price * BEF, 
                              ifelse(curcdd=="BGN", adj.price * BGN, 
                                     ifelse(curcdd=="CHF", adj.price * CHF, 
                                            ifelse(curcdd=="CZK", adj.price * CZK, 
                                                   ifelse(curcdd=="DEM", adj.price * DEM, 
                                                          ifelse(curcdd=="EUR", adj.price * EUR, 
                                                                 ifelse(curcdd=="FRF", adj.price * FRF, 
                                                                        ifelse(curcdd=="GBP", adj.price * GBP, 
                                                                               ifelse(curcdd=="ILS", adj.price * ILS, 
                                                                                      ifelse(curcdd=="JPY", adj.price * JPY, 
                                                                                             ifelse(curcdd=="NLG", adj.price * NLG, 
                                                                                                    ifelse(curcdd=="NOK", adj.price * NOK, 
                                                                                                           ifelse(curcdd=="PLN", adj.price * PLN, 
                                                                                                                  ifelse(curcdd=="SEK", adj.price * SEK,
                                                                                                                         ifelse(curcdd=="SGD", adj.price * SGD,
                                                                                                                                ifelse(curcdd=="USD", adj.price, NA)))))))))))))))))]

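This works, but it covers only about 20 currencies, and with all of them (about 40) it is definitely not elegant...

Thanks a lot!

3 Answers:

Answer 0: (score: 4)

[Edit] The idea of using get to extract the value referenced by a column name, which I saw in one of Matthew Dowle's answers, seems to work: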

 setkey(DT, currency)
 DT[ , cvt :=  .SD[, get(currency)]*price, by=currency]
 DT

    day      price currency      aud      eur       cvt
 1:   1 0.30776611      aud 1.624996 2.035811 0.5001188
 2:   3 0.55232243      aud 1.280354 2.038349 0.7071681
 3:   5 0.46854928      aud 1.762551 1.920101 0.8258420
 4:   7 0.81240262      aud 1.204612 2.270302 0.9786301
 5:   9 0.54655860      aud 1.359475 2.049097 0.7430328
 6:   2 0.25767250      eur 1.882166 2.210804 0.5696634
 7:   4 0.05638315      eur 1.398488 2.248972 0.1268041
 8:   6 0.48377074      eur 1.669022 1.671420 0.8085842
 9:   8 0.37032054      eur 1.357525 2.381954 0.8820863
10:  10 0.17026205      eur 1.690291 1.777724 0.3026789
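
The same group-by-currency idea can also be written without setkey, so the original row order is kept. This is only a sketch: it rebuilds the example table, and rate_cols is a name introduced here for illustration (it must list every rate column). Inside each group, .BY$currency is the group's currency code and .SD is restricted to the rate columns through .SDcols.

```r
library(data.table)

# Rebuild the example table from the question.
set.seed(100)
DT <- data.table(day = 1:10, price = runif(10),
                 currency = rep(c("aud", "eur"), 5),
                 aud = runif(10) + 1, eur = runif(10) + 1.5)

# Hypothetical helper vector: names of all exchange-rate columns.
rate_cols <- c("aud", "eur")

# For each currency group, .BY$currency is a length-1 string such as "aud",
# so .SD[[.BY$currency]] is that group's rows of the matching rate column.
DT[, price.in.usd := price * .SD[[.BY$currency]],
   by = currency, .SDcols = rate_cols]
```

With around 40 currencies only rate_cols needs to grow; the expression itself stays the same.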

Here is one way, although it does not generalize well to more currencies:

DT[ , cvt := ifelse (currency == 'aud', price*aud, price*eur) ]
> DT
    day      price currency      aud      eur       cvt
 1:   1 0.30776611      aud 1.624996 2.035811 0.5001188
 2:   2 0.25767250      eur 1.882166 2.210804 0.5696634
 3:   3 0.55232243      aud 1.280354 2.038349 0.7071681
 4:   4 0.05638315      eur 1.398488 2.248972 0.1268041
 5:   5 0.46854928      aud 1.762551 1.920101 0.8258420
 6:   6 0.48377074      eur 1.669022 1.671420 0.8085842
 7:   7 0.81240262      aud 1.204612 2.270302 0.9786301
 8:   8 0.37032054      eur 1.357525 2.381954 0.8820863
 9:   9 0.54655860      aud 1.359475 2.049097 0.7430328
10:  10 0.17026205      eur 1.690291 1.777724 0.3026789

You get a warning (and a different result) if you try to use if(.){.}else{.}:

DT[ , cvt := if (currency == 'aud'){price*aud}else{price*eur}]

This is entirely analogous to what happens with data.frames. But... using ifelse inside a data.table is already slow.
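
The difference between the two forms can be seen on plain vectors. The snippet below is a small sketch with made-up values (not the question's data): ifelse() evaluates the test element by element, while if () expects a single TRUE or FALSE. Older R versions warn and use only the first element of a longer condition; from R 4.2.0 onwards this is an error, so the call is wrapped in tryCatch here.

```r
# Made-up illustration values, not taken from the question.
currency <- c("aud", "eur", "aud")
price    <- c(1, 2, 3)
aud      <- c(1.5, 1.6, 1.7)
eur      <- c(2.0, 2.1, 2.2)

# ifelse() tests every element, so each row is multiplied by its own rate.
vectorised <- ifelse(currency == "aud", price * aud, price * eur)

# if () wants a length-1 condition. With a longer one, R either warns
# (using only the first element) or, in R >= 4.2.0, stops with an error.
scalar_outcome <- tryCatch(
  {
    if (currency == "aud") price * aud else price * eur
    "ran silently"
  },
  warning = function(w) "warning",
  error   = function(e) "error"
)
```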

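Answer 1: (score: 1)

In this solution you need to specify the number of distinct currencies (2 in this case) and the number of observations (10 in this case). It also assumes that the exchange-rate columns ('aud', 'eur', etc.) are the last columns.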

> B_msk <- matrix(rep(DT$currency,2), ncol=2)==matrix(rep(colnames(DT)[-(1:3)], 10), ncol=2, byrow=TRUE)
> DF <- data.frame(DT)
> DF$in_USD <- rowSums(DF[colnames(DT)[-(1:3)]]*B_msk*DF$price)
> DF #or data.table(DF)
   day      price currency      aud      eur    in_USD
1    1 0.30776611      aud 1.624996 2.035811 0.5001188
2    2 0.25767250      eur 1.882166 2.210804 0.5696634
3    3 0.55232243      aud 1.280354 2.038349 0.7071681
4    4 0.05638315      eur 1.398488 2.248972 0.1268041
5    5 0.46854928      aud 1.762551 1.920101 0.8258420
6    6 0.48377074      eur 1.669022 1.671420 0.8085842
7    7 0.81240262      aud 1.204612 2.270302 0.9786301
8    8 0.37032054      eur 1.357525 2.381954 0.8820863
9    9 0.54655860      aud 1.359475 2.049097 0.7430328
10  10 0.17026205      eur 1.690291 1.777724 0.3026789
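
The mask can also be built without hard-coding the number of currencies or rows. The following is a sketch in base R under the same assumptions as above; rate_cols and mask are names introduced here for illustration. outer() compares every row's currency with every rate-column name, giving a logical matrix with one TRUE per row.

```r
# Rebuild the example data as a plain data.frame.
set.seed(100)
DF <- data.frame(day = 1:10, price = runif(10),
                 currency = rep(c("aud", "eur"), 5),
                 aud = runif(10) + 1, eur = runif(10) + 1.5,
                 stringsAsFactors = FALSE)

# Hypothetical helper vector: names of all exchange-rate columns.
rate_cols <- c("aud", "eur")

# mask[i, j] is TRUE when row i is quoted in the currency of rate column j.
mask <- outer(DF$currency, rate_cols, "==")

# Zero out the non-matching rates, keep the matching one, then convert.
DF$in_USD <- DF$price * rowSums(as.matrix(DF[rate_cols]) * mask)
```

Note that an NA rate in any column of a row makes that row's result NA, because NA * FALSE is still NA.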

Edit:

Hopefully this solution addresses the memory issue (although it still requires the data to be in a data.frame):

> Idx=cbind(1:10,match(DT[,currency], colnames(DT))) #replace 10 with the actual no. of obs.
> DF=data.frame(DT)
> DF
   day      price currency      aud      eur
1    1 0.30776611      aud 1.624996 2.035811
2    2 0.25767250      eur 1.882166 2.210804
3    3 0.55232243      aud 1.280354 2.038349
4    4 0.05638315      eur 1.398488 2.248972
5    5 0.46854928      aud 1.762551 1.920101
6    6 0.48377074      eur 1.669022 1.671420
7    7 0.81240262      aud 1.204612 2.270302
8    8 0.37032054      eur 1.357525 2.381954
9    9 0.54655860      aud 1.359475 2.049097
10  10 0.17026205      eur 1.690291 1.777724
> DF$price*as.numeric(DF[Idx]) #assign it as 'DF$P_in_USD'
 [1] 0.5001187 0.5696634 0.7071682 0.1268041 0.8258420 0.8085841 0.9786299 0.8820865 0.7430327 0.3026789
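
A variation on the matrix-index idea is to index a numeric matrix holding only the rate columns. Indexing the whole data.frame with a matrix goes through a character matrix because of the currency column, which is why as.numeric is needed above and may explain the small differences in the last digit. This is a sketch, with rate_cols, rates and idx as illustrative names.

```r
# Rebuild the example data as a plain data.frame.
set.seed(100)
DF <- data.frame(day = 1:10, price = runif(10),
                 currency = rep(c("aud", "eur"), 5),
                 aud = runif(10) + 1, eur = runif(10) + 1.5,
                 stringsAsFactors = FALSE)

# Hypothetical helper vector: names of all exchange-rate columns.
rate_cols <- c("aud", "eur")

# Numeric matrix of rates only, so no character conversion takes place.
rates <- as.matrix(DF[rate_cols])

# Two-column index: (row number, position of that row's currency in rate_cols).
idx <- cbind(seq_len(nrow(DF)), match(DF$currency, rate_cols))

# Pick one rate per row and convert.
DF$P_in_USD <- DF$price * rates[idx]
```

A currency with no matching rate column gives NA in match(), and therefore NA in the result, rather than a silent wrong value.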

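Answer 2: (score: 1)

Have you considered simply looping over the currencies, filtering the main data frame to keep only the prices in a given currency, performing the conversion on that subset, and finally stacking all the per-currency data frames (or progressively filling a column in the main data frame)?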
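
The "progressively fill a column" variant of this idea might look like the sketch below. It assumes every value in currency has a rate column of the same name, and it uses data.table's set() to write into the selected rows by reference; the column name price.in.usd follows the question.

```r
library(data.table)

# Rebuild the example table from the question.
set.seed(100)
DT <- data.table(day = 1:10, price = runif(10),
                 currency = rep(c("aud", "eur"), 5),
                 aud = runif(10) + 1, eur = runif(10) + 1.5)

# Start with an empty result column, then fill it one currency at a time.
DT[, price.in.usd := NA_real_]
for (cur in unique(DT$currency)) {
  rows <- which(DT$currency == cur)            # rows quoted in this currency
  set(DT, i = rows, j = "price.in.usd",
      value = DT$price[rows] * DT[[cur]][rows])  # price times matching rate
}
```

The loop runs once per currency (about 40 times for the real data), not once per row, so the cost of looping stays small.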