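I am trying to go through every row and, from columns A to E, pick the value of the column whose name is stored in WhichCol. It works, but this step takes a very long time on 50,000 rows of data. Is there a more efficient way to do this?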
library(data.table)
df<-structure(list(Id = 1:10, A = c(73L, 61L, 46L, 26L, 18L, 29L,
88L, 18L, 56L, 81L), B = c(68L, 49L, 27L, 10L, 37L, 72L, 71L,
60L, 52L, 62L), C = c(98L, 59L, 76L, 46L, 46L, 31L, 77L, 83L,
51L, 6L), D = c(40L, 18L, 27L, 18L, 72L, 95L, 87L, 29L, 35L,
80L), E = c(74L, 87L, 27L, 98L, 54L, 91L, 100L, 71L, 13L, 15L
), WhichCol = c("A", "C", "E", "B", "A", "D", "A", "C", "E",
"B"), Value = c(73L, 59L, 27L, 10L, 18L, 95L, 88L, 83L, 13L,
62L)), .Names = c("Id", "A", "B", "C", "D", "E", "WhichCol",
"Value"), class = "data.frame")
setDT(df)
df[["Value"]]<-sapply(1:nrow(df), function(x){ df[x, get(WhichCol)] })
The Value column is already included in the sample data above, but it is the result I want to compute.
Answer 0: (score: 1)
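Instead of looping over every row, you can use the fact that for each value of WhichCol you already know which column you want (for example, column A for every row where WhichCol == "A"). Grouping by WhichCol therefore needs only one column lookup per distinct value, rather than one per row.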
df[, ValueNew := get(unique(WhichCol)), by = WhichCol]
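To make the grouping step easier to follow, here is a minimal sketch on a tiny made-up table. The names dt and Picked and the values in it are assumptions for illustration only and are not part of the question's data. Within each group WhichCol holds a single value, so get() is called once per group and returns that column for the group's rows.

```r
library(data.table)

# Hypothetical three-row table, for illustration only
dt <- data.table(A = c(1L, 2L, 3L),
                 B = c(10L, 20L, 30L),
                 WhichCol = c("A", "B", "A"))

# One lookup per distinct WhichCol value:
# rows 1 and 3 take their value from A, row 2 takes it from B
dt[, Picked := get(unique(WhichCol)), by = WhichCol]

print(dt)
```

This assumes the candidate columns all have the same type (integer here), since the results from every group end up in one new column.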
I ran a quick speed test:
n <- 1000
df <- rbindlist(rep(list(df), n))
# over unique WhichCol
system.time(df[, ValueNew := get(unique(WhichCol)), by = WhichCol])
user system elapsed
0.002 0.000 0.001
system.time(df[["Value2"]]<-sapply(1:nrow(df), function(x){ df[x, get(WhichCol)] }))
user system elapsed
5.445 0.021 5.472
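As a possible alternative that is not part of the answer above, base R matrix indexing can also pick one cell per row. The sketch below uses a small made-up table; dt, value_cols, m and Picked are hypothetical names. It assumes every entry in WhichCol names one of the candidate columns and that those columns share a type. I have not timed it against the grouped approach, so it may be worth benchmarking on your own data.

```r
library(data.table)

# Hypothetical three-row table, for illustration only
dt <- data.table(A = c(1L, 2L, 3L),
                 B = c(10L, 20L, 30L),
                 WhichCol = c("A", "B", "A"))

value_cols <- c("A", "B")                       # candidate columns
m <- as.matrix(dt[, value_cols, with = FALSE])  # integer matrix of candidates

# A two-column index matrix (row, column) selects exactly one cell per row
idx <- cbind(seq_len(nrow(m)), match(dt$WhichCol, value_cols))
dt[, Picked := m[idx]]

print(dt)
```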
I hope this helps.