Question

I have a grocery shopping dataset with 817,741 transactions and 12 variables, which looks like this:

                  Date Customer_ID Age_Group Address Product_Subclass   Product_ID Quantity Asset Price Price_Per_Unit Profit_Per_Item Budget_Item
1: 2000-11-01 00:00:00       46855         D       E           110411 4.710085e+12        3    51    57             19               6       FALSE
2: 2000-11-01 00:00:00      539166         E       E           130315 4.714981e+12        2    56    48             24              -8        TRUE
3: 2000-11-01 00:00:00      663373         F       E           110217 4.710266e+12        1   180   135            135             -45        TRUE

I initialized the variable "Budget_Item" with:

Total_Input[,"Budget_Item"] <- FALSE

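Now I want Budget_Item to be TRUE whenever (Price - Asset < 0). I did this with a for loop, but it takes a very long time to run. Any suggestions for making it more time- and memory-efficient?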

Current code for the for loop:

for(i in 1:nrow(Total_Input)){
  if(Total_Input$Price[i] - Total_Input$Asset[i] <0){Total_Input$Budget_Item[i] = TRUE}
}

Answer 1

Since this is a data.table, we can do

library(data.table)
Total_Input[, Budget_Item := (Price - Asset) < 0]

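If there are missing values in 'Price' or 'Asset', we can also add a condition so that those rows come out as FALSE instead of NA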

Total_Input[, Budget_Item := ((Price - Asset) < 0 ) & !is.na(Price - Asset)]
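To illustrate what the extra condition changes, here is a minimal sketch on a toy table. The table `dt`, its values, and the column names `Flag_Plain` and `Flag_Safe` are made up for this illustration and are not part of the original data.

```r
library(data.table)

# Toy data (made-up values); the second row has a missing Price
dt <- data.table(Price = c(57, NA, 135), Asset = c(51, 56, 180))

# Plain comparison: the NA in Price propagates into the result
dt[, Flag_Plain := (Price - Asset) < 0]

# With the extra condition: the row with the missing value becomes FALSE
dt[, Flag_Safe := ((Price - Asset) < 0) & !is.na(Price - Asset)]

print(dt)
```

In this sketch `Flag_Plain` keeps an NA for the row with the missing price, while `Flag_Safe` is FALSE there, because `NA & FALSE` evaluates to FALSE in R.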

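Also, we don't need to initialize 'Budget_Item' to FALSE. The column can be created directly by taking the difference of the columns ('Price - Asset'), converting it to a logical vector (`< 0`), and assigning it (`:=`).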
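The three steps can be spelled out on a small example. This is only a sketch: the table `toy` and the helper vectors `diff_vec` and `flag_vec` are hypothetical names, and the values are taken from the three rows shown in the question.

```r
library(data.table)

# Toy table with the Asset and Price values of the three rows in the question;
# note that Budget_Item is not created beforehand
toy <- data.table(Asset = c(51, 56, 180), Price = c(57, 48, 135))

diff_vec <- toy$Price - toy$Asset   # step 1: difference of the columns
flag_vec <- diff_vec < 0            # step 2: logical vector

# Step 3: assign by reference; this one line covers all three steps
toy[, Budget_Item := (Price - Asset) < 0]

print(toy)
```

Because the comparison is vectorized over the whole column and `:=` adds the column by reference, no row-by-row loop is needed.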

Speeding up a for loop with data.table
