Efficient way to loop over rows in a data.table and pick a column per row with get()?

Posted: 2018-11-16 05:57:26

Tags: r data.table

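I'm trying to loop over every row and, for each one, take the value from whichever of columns A to E is named in WhichCol. It works, but this step takes a very long time on 50,000 rows of data. Is there an efficient way to do this?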

library(data.table)
df<-structure(list(Id = 1:10, A = c(73L, 61L, 46L, 26L, 18L, 29L, 
              88L, 18L, 56L, 81L), B = c(68L, 49L, 27L, 10L, 37L, 72L, 71L, 
              60L, 52L, 62L), C = c(98L, 59L, 76L, 46L, 46L, 31L, 77L, 83L, 
              51L, 6L), D = c(40L, 18L, 27L, 18L, 72L, 95L, 87L, 29L, 35L, 
              80L), E = c(74L, 87L, 27L, 98L, 54L, 91L, 100L, 71L, 13L, 15L
              ), WhichCol = c("A", "C", "E", "B", "A", "D", "A", "C", "E", 
              "B"), Value = c(73L, 59L, 27L, 10L, 18L, 95L, 88L, 83L, 13L, 
              62L)), .Names = c("Id", "A", "B", "C", "D", "E", "WhichCol", 
              "Value"), class = "data.frame")

setDT(df)
# Works, but runs one data.table subset (df[x, ...]) per row, which is slow
df[["Value"]]<-sapply(1:nrow(df), function(x){ df[x, get(WhichCol)] })
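A minimal sketch, not from the original post: if you want to keep a row-by-row loop, much of the cost is likely the `[.data.table` call made once per row. Plain list indexing (`[[` to pull a column, then `[i]` for the row) avoids that overhead. The sketch rebuilds the sample data as a plain data.frame without the Value column so it runs with base R alone; `$` and `[[` behave the same on a data.table, so the loop should also work after `setDT()`. The name `row_lookup` is illustrative.

```r
# The question's sample data as a plain data.frame (Value column omitted).
df <- data.frame(
  Id = 1:10,
  A = c(73L, 61L, 46L, 26L, 18L, 29L, 88L, 18L, 56L, 81L),
  B = c(68L, 49L, 27L, 10L, 37L, 72L, 71L, 60L, 52L, 62L),
  C = c(98L, 59L, 76L, 46L, 46L, 31L, 77L, 83L, 51L, 6L),
  D = c(40L, 18L, 27L, 18L, 72L, 95L, 87L, 29L, 35L, 80L),
  E = c(74L, 87L, 27L, 98L, 54L, 91L, 100L, 71L, 13L, 15L),
  WhichCol = c("A", "C", "E", "B", "A", "D", "A", "C", "E", "B"),
  stringsAsFactors = FALSE
)

# Still one iteration per row, but each step is cheap list indexing:
# df$WhichCol[i] is the column name for row i, df[[name]] extracts that
# column as a vector, and [i] takes its i-th element.
row_lookup <- vapply(
  seq_len(nrow(df)),
  function(i) df[[df$WhichCol[i]]][i],
  integer(1)
)
print(row_lookup)
# [1] 73 59 27 10 18 95 88 83 13 62
```

This is still O(rows) R-level calls, so the grouped or vectorised approaches will scale better; it only removes the per-row data.table overhead.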

The Value column is already included in the sample data above, but it shows the result I want to get.

1 answer:

Answer 0 (score: 1)

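Instead of looping over every row, you can use the fact that for each value of WhichCol you already know which column you want (for example, column A for every row where WhichCol == "A"). Grouping by WhichCol therefore needs one lookup per distinct value instead of one per row.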

df[, ValueNew := get(unique(WhichCol)), by = WhichCol]
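A self-contained sketch of the grouped approach above, run on the question's sample data. It assumes the data.table package is installed; the data is rebuilt with `data.table()` instead of `structure()` for brevity, and the name `dt` is hypothetical.

```r
library(data.table)

# The question's sample data, rebuilt without the Value column.
dt <- data.table(
  Id = 1:10,
  A = c(73L, 61L, 46L, 26L, 18L, 29L, 88L, 18L, 56L, 81L),
  B = c(68L, 49L, 27L, 10L, 37L, 72L, 71L, 60L, 52L, 62L),
  C = c(98L, 59L, 76L, 46L, 46L, 31L, 77L, 83L, 51L, 6L),
  D = c(40L, 18L, 27L, 18L, 72L, 95L, 87L, 29L, 35L, 80L),
  E = c(74L, 87L, 27L, 98L, 54L, 91L, 100L, 71L, 13L, 15L),
  WhichCol = c("A", "C", "E", "B", "A", "D", "A", "C", "E", "B")
)

# One group per distinct WhichCol value. Within a group, unique(WhichCol)
# is a single name such as "A", so get() returns that column's values for
# the group's rows; := writes them back into exactly those rows.
dt[, ValueNew := get(unique(WhichCol)), by = WhichCol]

print(dt[, .(Id, WhichCol, ValueNew)])
```

Because `:=` modifies the table in place group by group, rows are not reordered, so `ValueNew` stays aligned with `Id` and matches the question's Value column.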

I ran a few speed tests on a larger table (the 10-row sample repeated 1,000 times):

 n <- 1000
 df <- rbindlist(rep(list(df), n))   # 10 rows x 1000 = 10,000 rows

 # grouped by WhichCol
 system.time(df[, ValueNew := get(unique(WhichCol)), by = WhichCol])
    user  system elapsed 
   0.002   0.000   0.001 

 # row-by-row loop from the question
 system.time(df[["Value2"]] <- sapply(1:nrow(df), function(x){ df[x, get(WhichCol)] }))
    user  system elapsed 
   5.445   0.021   5.472 
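A different, fully vectorised technique that is not part of the answer: matrix indexing. Stack the value columns into a matrix, then index it with a two-column matrix of (row, column) pairs, which extracts one element per row in a single call. This sketch uses randomly generated data at the question's size of 50,000 rows (the data and the names `value_cols`, `value_matrix`, `value_loop` are hypothetical), checks the result against the plain-`[[` row loop, and prints both timings. Timings vary by machine, so no numbers are claimed here.

```r
set.seed(42)
n <- 50000L
value_cols <- c("A", "B", "C", "D", "E")

# Hypothetical data shaped like the question's: Id, A-E, WhichCol.
vals <- matrix(sample.int(100L, n * length(value_cols), replace = TRUE),
               ncol = length(value_cols),
               dimnames = list(NULL, value_cols))
df <- data.frame(Id = seq_len(n), vals,
                 WhichCol = sample(value_cols, n, replace = TRUE),
                 stringsAsFactors = FALSE)

# Matrix indexing: build an n x 5 integer matrix of the value columns,
# then look up all (row, column) pairs in one subsetting call.
t_matrix <- system.time({
  m <- do.call(cbind, lapply(value_cols, function(cl) df[[cl]]))
  idx <- cbind(seq_len(n), match(df$WhichCol, value_cols))
  value_matrix <- m[idx]
})

# Baseline: row-by-row loop using plain [[ indexing.
t_loop <- system.time(
  value_loop <- vapply(seq_len(n),
                       function(i) df[[df$WhichCol[i]]][i],
                       integer(1))
)

stopifnot(identical(value_matrix, value_loop))
cat("matrix indexing:\n"); print(t_matrix)
cat("row loop with [[:\n"); print(t_loop)
```

The sketch only uses `[[` and `$` to read columns, so it should work unchanged on a data.table; `match()` assumes every WhichCol entry names one of `value_cols` (otherwise it yields `NA` for that row).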

I hope this helps.