Here is the code that generates a sample dataset:
require(data.table)
testdata <- data.table(
X = rep(sample(1:3),5),
Y = rep(sample(1:3),5),
X1 = rnorm(15),
X2 = rnorm(15),
X3 = rnorm(15),
Y1 = NA_character_,
Y2 = NA_character_,
Y3 = NA_character_
)
Initial data table:
X Y X1 X2 X3 Y1 Y2 Y3
1: 3 3 -0.7098927 0.63342935 0.94470612 NA NA NA
2: 1 2 0.3008547 -1.40043977 1.53781754 NA NA NA
3: 2 1 0.3423140 0.34897695 -0.38402565 NA NA NA
4: 3 3 -0.5726456 -2.24526957 -1.10947867 NA NA NA
5: 1 2 -1.3239474 -0.53924617 -0.04103982 NA NA NA
6: 2 1 0.2493801 0.85806647 0.96488021 NA NA NA
7: 3 3 -2.0653505 0.05481703 1.75161043 NA NA NA
8: 1 2 -1.3919774 0.34282832 0.50834289 NA NA NA
9: 2 1 0.5928025 -1.11899399 0.35967102 NA NA NA
10: 3 3 -0.4704720 0.64004313 -0.17343794 NA NA NA
11: 1 2 0.3056093 2.14544631 0.43740447 NA NA NA
12: 2 1 -0.1568971 1.05091249 1.18884487 NA NA NA
13: 3 3 -1.3078670 1.07482123 -0.65367957 NA NA NA
14: 1 2 0.4622123 -0.60308532 -1.11104235 NA NA NA
15: 2 1 -0.7894978 0.33018926 -0.04700393 NA NA NA
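Here is what I want to do: in each row, copy the value of the X column selected by X into the Y column selected by Y. For example,
if X = 2 and Y = 3 then Y3 <- X2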
Expected output:
X Y X1 X2 X3 Y1 Y2 Y3
1: 3 3 -0.7098927 0.63342935 0.94470612 NA NA 0.94470612
2: 1 2 0.3008547 -1.40043977 1.53781754 NA 0.3008547 NA
3: 2 1 0.3423140 0.34897695 -0.38402565 0.34897695 NA NA
4: 3 3 -0.5726456 -2.24526957 -1.10947867 NA NA -1.10947867
5: 1 2 -1.3239474 -0.53924617 -0.04103982 NA -1.3239474 NA
6: 2 1 0.2493801 0.85806647 0.96488021 0.85806647 NA NA
7: 3 3 -2.0653505 0.05481703 1.75161043 NA NA 1.75161043
8: 1 2 -1.3919774 0.34282832 0.50834289 NA -1.3919774 NA
9: 2 1 0.5928025 -1.11899399 0.35967102 -1.11899399 NA NA
10: 3 3 -0.4704720 0.64004313 -0.17343794 NA NA -0.17343794
11: 1 2 0.3056093 2.14544631 0.43740447 NA 0.3056093 NA
12: 2 1 -0.1568971 1.05091249 1.18884487 1.05091249 NA NA
13: 3 3 -1.3078670 1.07482123 -0.65367957 NA NA -0.65367957
14: 1 2 0.4622123 -0.60308532 -1.11104235 NA 0.4622123 NA
15: 2 1 -0.7894978 0.33018926 -0.04700393 0.33018926 NA NA
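How can I achieve this with simple data.table syntax? I have tried get, eval(parse()) and so on, but I run into trouble every time.
Note that my actual dataset is very large (100+ columns), so I need a solution that does not depend on column numbers. I could also write a lot of if statements, but that seems like a poor approach for the 30-odd columns that need to be assigned in a similar way.
The data.table version is 1.10.4 and the R version is 3.3.2.
Edit: I solved it with a function. I am not sure this is the best way, because it is very slow.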
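populateY <- function(input_table) {
  for (i in seq_len(nrow(input_table))) {
    # Read the selector values for the current row
    k <- input_table$X[i]
    j <- input_table$Y[i]
    # Build e.g. "input_table$Y3[i] <- input_table$X2[i]" as text and evaluate it
    tempX <- paste0("input_table$X", k, "[i]")
    tempY <- paste0("input_table$Y", j, "[i]")
    eval(parse(text = paste0(tempY, " <- ", tempX)))
  }
  return(input_table)
}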
Answer 0: (score: 0)
If you are willing to use the tidyverse and tibble data frames, this is how I would do it.
library(tibble)
library(dplyr)  # provides %>% and mutate()
testdata <- as_tibble(testdata)
testdata <- testdata %>%
  mutate(Y3 = ifelse(X == 2 & Y == 3, X2, NA))
You can then easily and clearly add all the lines you need inside the mutate call.
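As a rough illustration of what several lines inside one mutate call might look like, the sketch below covers the general rule from the question (the X column picked by X goes into the Y column picked by Y). The helper column `src`, the use of `case_when`, and the `set.seed` call are assumptions added here for illustration, not part of the original answer. It still needs one line per column, so it may become tedious for around 30 columns.

```r
library(dplyr)
library(tibble)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
testdata <- tibble(
  X  = rep(sample(1:3), 5),
  Y  = rep(sample(1:3), 5),
  X1 = rnorm(15),
  X2 = rnorm(15),
  X3 = rnorm(15)
)

result <- testdata %>%
  mutate(
    # src (hypothetical helper column): value of the X column selected by X
    src = case_when(X == 1 ~ X1, X == 2 ~ X2, X == 3 ~ X3),
    # each Y column receives src only in rows where Y points at it
    Y1 = ifelse(Y == 1, src, NA_real_),
    Y2 = ifelse(Y == 2, src, NA_real_),
    Y3 = ifelse(Y == 3, src, NA_real_)
  ) %>%
  select(-src)

print(result)
```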
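Otherwise, if you definitely want to stick with data.table, I would look at akrun's suggestion, although you will need to change the data type of the Y3 column to double, or simply not create that column before running that code.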
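Since akrun's suggestion is not reproduced in this thread, the following is only one possible data.table sketch and should not be read as that suggestion. It assumes the Y columns are created as double (`NA_real_`) instead of character, and it loops over the distinct (X, Y) pairs rather than over rows, addressing columns by name through `paste0`. The variable names `combos`, `xv`, `yv`, and `rows`, and the `set.seed` call, are illustrative assumptions.

```r
library(data.table)

set.seed(42)  # assumption: seed added only so the sketch is reproducible
testdata <- data.table(
  X  = rep(sample(1:3), 5),
  Y  = rep(sample(1:3), 5),
  X1 = rnorm(15),
  X2 = rnorm(15),
  X3 = rnorm(15),
  # target columns created as double rather than character
  Y1 = NA_real_,
  Y2 = NA_real_,
  Y3 = NA_real_
)

# One pass per distinct (X, Y) pair instead of one pass per row
combos <- unique(testdata[, .(X, Y)])
for (r in seq_len(nrow(combos))) {
  xv <- combos$X[r]
  yv <- combos$Y[r]
  rows <- which(testdata$X == xv & testdata$Y == yv)
  # set() assigns by reference; columns are addressed by name, not by position
  set(testdata, i = rows, j = paste0("Y", yv),
      value = testdata[[paste0("X", xv)]][rows])
}

print(testdata)
```

The number of loop iterations depends on the number of distinct (X, Y) pairs, not on the number of rows, which is likely to matter on a large table.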