Interrelated columns in an R data.table

Time: 2015-12-06 21:08:49

Tags: r data.table

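To track cash flow, I have several interrelated columns in a data.table:

  • "Amount_spent" is always 5% of "Balance".
  • "Revenue" is "Amount_spent" * "Price".
  • "Balance" is the cumulative sum of "Revenue" (starting from 100.00).
  • Transactions only happen on "Day" "a".

I am struggling to calculate these interrelated columns simultaneously. An example of what I want: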

library(data.table)
Day <- c( "a", "c", "b", "a", "b", "a", "b", "c", "a", "a" )
Price <- c( 0.6, 0.4, 0.9, -0.3, 0.8, 0.2, 0.3, 0.9, 0.9, -0.7 )
Balance <- c( 100.00, 103.00, 103.00, 103.00, 101.46, 101.46, 102.47, 102.47, 102.47, 107.08 )
Amount_spent <- c( 5.00, 0.00, 0.00, 5.15, 0.00, 5.07, 0.00, 0.00, 5.12, 5.35 )
Revenue <- c( 3.00, 0.00, 0.00, -1.55, 0.00, 1.01, 0.00, 0.00, 4.61, -3.75 )

DT <- data.table( Day, Price, Balance, Amount_spent, Revenue )
DT

Here is my attempt so far:

# set initial balance
Balance2 <- 100.00
Day2 <- c( "a", "c", "b", "a", "b", "a", "b", "c", "a", "a" )
Price2 <- c( 0.6, 0.4, 0.9, -0.3, 0.8, 0.2, 0.3, 0.9, 0.9, -0.7 )
my.try <- data.table( Day2, Price2 )
my.try[, Balance2 := cumsum( Revenue2 )]
my.try[ Day2 == "a", Amount_spent2 := Balance2 * 0.05 ]
my.try[is.na(Amount_spent2), Amount_spent2 := 0]
my.try[, Revenue2 := Price2 * Amount_spent2 ]
my.try

As you will see, it fails with the error message Error in eval(expr, envir, enclos) : object 'Revenue2' not found, because the column "Revenue2" has not been created yet.

Thanks

1 answer:

Answer 0: (score: 0)

The line my.try[, Balance2 := cumsum( Revenue2 )] tries to use the column Revenue2, which does not exist in my.try at that point in your code:

library(data.table)
Day <- c( "a", "c", "b", "a", "b", "a", "b", "c", "a", "a" )
Price <- c( 0.6, 0.4, 0.9, -0.3, 0.8, 0.2, 0.3, 0.9, 0.9, -0.7 )
Balance <- c( 100.00, 103.00, 103.00, 103.00, 101.46, 101.46, 102.47, 102.47, 102.47, 107.08 )
Amount_spent <- c( 5.00, 0.00, 0.00, 5.15, 0.00, 5.07, 0.00, 0.00, 5.12, 5.35 )
Revenue <- c( 3.00, 0.00, 0.00, -1.55, 0.00, 1.01, 0.00, 0.00, 4.61, -3.75 )
DT <- data.table( Day, Price, Balance, Amount_spent, Revenue )

Balance2 <- 100.00
Day2 <- c( "a", "c", "b", "a", "b", "a", "b", "c", "a", "a" )
Price2 <- c( 0.6, 0.4, 0.9, -0.3, 0.8, 0.2, 0.3, 0.9, 0.9, -0.7 )
my.try <- data.table( Day2, Price2 )
my.try[, Balance2 := cumsum( Revenue2 )]
#Error in eval(expr, envir, enclos) : object 'Revenue2' not found
"Revenue2" %in% names(my.try)
#[1] FALSE

That is why you get the error you mention.

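Your code does not produce the expected results. I'm not sure what you mean by calculating the columns simultaneously. If you want to assign/update multiple columns by reference in one step, you can use the `:=`() function, the same way you would use list() or .() in the data.table j argument. For example: `:=`(col1=1+2, col2=2+3)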
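To make the functional form of `:=` concrete, here is a minimal sketch on a toy table. The table `demo` and the column names `double_x` and `square_x` are hypothetical and not part of the question.

```r
library(data.table)

# Hypothetical toy table, used only to illustrate the syntax
demo <- data.table(x = 1:3)

# Add two columns by reference in a single call
demo[, `:=`(double_x = x * 2, square_x = x^2)]
demo
```

Note that a column created inside one `:=`() call is not visible to the other expressions of that same call, so this form alone does not resolve columns that depend on each other.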
You can read more about updating by reference in the Reference semantics vignette.
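As a possible way to get the numbers from the question, here is a sketch that is not part of the original answer. It assumes that "Balance" on each row is the balance before that row's transaction, which is consistent with the example data. Under that assumption, every "a" day multiplies the balance by 1 + 0.05 * Price, so the balance can be written as a lagged cumulative product instead of a circular cumulative sum. The table name `res` and the helper column `growth` are made up for illustration.

```r
library(data.table)

Day   <- c("a", "c", "b", "a", "b", "a", "b", "c", "a", "a")
Price <- c(0.6, 0.4, 0.9, -0.3, 0.8, 0.2, 0.3, 0.9, 0.9, -0.7)
res <- data.table(Day, Price)

# Growth factor per row: 1 + 5% * Price on "a" days, 1 on all other days
res[, growth := 1 + 0.05 * Price * (Day == "a")]

# Balance before each row's transaction: the opening balance (100)
# times the product of the growth factors of all earlier rows
res[, Balance := 100 * shift(cumprod(growth), fill = 1)]

# The remaining columns follow directly from the rules in the question
res[, Amount_spent := 0.05 * Balance * (Day == "a")]
res[, Revenue := Amount_spent * Price]
res[, growth := NULL]  # drop the helper column
res
```

The values agree with the expected Balance, Amount_spent and Revenue columns in the question up to rounding to two decimals. If the rules were not multiplicative like this, a row-by-row loop would be the more general fallback.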