I have a data table, A, that looks like this:
year location sigma_NN_1 sigma_NN_2 sigma_NN_3
2076 43.59375_-116.78125 1.4681173 1.664289 1.735974
2077 43.59375_-116.78125 1.3798515 1.550524 1.551269
2078 43.59375_-116.78125 0.7934367 1.064248 1.177981
2079 43.59375_-116.78125 1.8235574 1.991018 2.288402
2080 43.59375_-116.78125 2.5560329 2.578093 2.589334
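I want to use it to mask another data table, keeping only the entries whose sigma value is below a threshold, say 2. Suppose my second data table is B: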
year location location_NN_1 location_NN_2 location_NN_3
2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 41.71875_-74.15625
2080 43.59375_-116.78125 41.34375_-90.78125 41.96875_-86.21875 41.21875_-90.65625
So I would like to use something like B[A<2], but that obviously doesn't work, otherwise I wouldn't be here.
Any suggestions?
Expected output:
year location location_NN_1 location_NN_2 location_NN_3
2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 NA
2080 43.59375_-116.78125 NA NA NA
The goal is to find the locations whose corresponding sigma in data table A is less than 2.
Answer 0 (score: 4)
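We can use base R subsetting to identify the appropriate cells of B and replace them with NA. This approach requires that the columns of A and B are in the same order.
We can use a simple condition on dfa (the data frame holding A) to find the cells where sigma is not less than 2. Since we don't want to apply the condition to the year and location columns, we drop them before applying it: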
!(dfa[-c(1,2)] < 2)
sigma_NN_1 sigma_NN_2 sigma_NN_3
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
[4,] FALSE FALSE TRUE
[5,] TRUE TRUE TRUE
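This returns a logical matrix that can be used to subset dfb (the data frame holding B) and replace values. What happens here is that we subset dfb twice: first we drop the year and location columns to keep only the location_NN columns, then we use the condition above to select the cells whose matching sigma value is not less than 2 and assign NA to them: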
dfb[-c(1,2)][!(dfa[-c(1,2)] < 2)] <- NA
dfb
year location location_NN_1 location_NN_2 location_NN_3
1 2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
2 2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
3 2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
4 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 <NA>
5 2080 43.59375_-116.78125 <NA> <NA> <NA>
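Because the approach above depends on A and B sharing the same column order, a variant that pairs columns by name may be more robust. The following is only a sketch: it uses the last two rows of the sample data, deliberately shuffles the columns of B, and assumes that every sigma_NN_k column in A has a location_NN_k counterpart in B. The names sigma_cols and loc_cols are illustrative and not part of the original answer.

```r
# Last two rows of the sample data (2079 and 2080).
A <- data.frame(
  year = 2079:2080,
  location = "43.59375_-116.78125",
  sigma_NN_1 = c(1.8235574, 2.5560329),
  sigma_NN_2 = c(1.991018, 2.578093),
  sigma_NN_3 = c(2.288402, 2.589334),
  stringsAsFactors = FALSE
)

# Same rows of B, with the columns deliberately out of order.
B <- data.frame(
  location_NN_3 = c("41.71875_-74.15625", "41.21875_-90.65625"),
  year = 2079:2080,
  location = "43.59375_-116.78125",
  location_NN_1 = c("43.53125_-116.78125", "41.34375_-90.78125"),
  location_NN_2 = c("41.34375_-90.78125", "41.96875_-86.21875"),
  stringsAsFactors = FALSE
)

# Pair each sigma column with its location column by name
# (assumption: the two tables use matching _NN_k suffixes).
sigma_cols <- grep("^sigma_NN_", names(A), value = TRUE)
loc_cols <- sub("^sigma", "location", sigma_cols)

# The rows must line up one-to-one for cell-wise masking to make sense.
stopifnot(identical(A$year, B$year), all(loc_cols %in% names(B)))

# Blank out every cell of B whose matching sigma is not below 2.
B[loc_cols][A[sigma_cols] >= 2] <- NA
B
```

For 2079 only location_NN_3 is set to NA, and for 2080 all three location_NN columns are, regardless of where those columns sit in B.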
Answer 1 (score: 3)
Assuming these are data.table objects, and that every "sigma" column in a row of "A" should be below the threshold of 2:
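library(data.table)
# names of the sigma columns in A
nm1 <- grep("sigma", names(A), value = TRUE)
# TRUE for rows where every sigma column is below 2
i1 <- setDT(A)[, Reduce(`&`, lapply(.SD, `<`, 2)), .SDcols = nm1]
# keep only those rows of B
setDT(B)[i1]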
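Note that this row filter drops whole rows instead of masking individual cells. As a rough illustration of the same idea in base R (a swapped-in technique using rowSums, not the data.table code of this answer), the sketch below uses rows 2078 to 2080 of the sample data with the constant location column left out for brevity. The names sigma_cols and keep are illustrative.

```r
# Rows 2078-2080 of the sample data, without the constant location column.
A <- data.frame(
  year = 2078:2080,
  sigma_NN_1 = c(0.7934367, 1.8235574, 2.5560329),
  sigma_NN_2 = c(1.064248, 1.991018, 2.578093),
  sigma_NN_3 = c(1.177981, 2.288402, 2.589334)
)
B <- data.frame(
  year = 2078:2080,
  location_NN_1 = c("41.34375_-90.78125", "43.53125_-116.78125", "41.34375_-90.78125"),
  location_NN_2 = c("41.21875_-90.65625", "41.34375_-90.78125", "41.96875_-86.21875"),
  location_NN_3 = c("41.53125_-73.96875", "41.71875_-74.15625", "41.21875_-90.65625"),
  stringsAsFactors = FALSE
)

sigma_cols <- grep("sigma", names(A), value = TRUE)

# A row is kept only if all of its sigma values are below 2.
keep <- rowSums(A[sigma_cols] < 2) == length(sigma_cols)

# Only the 2078 row survives: 2079 has sigma_NN_3 above 2,
# and 2080 has all three sigma values above 2.
filtered <- B[keep, ]
filtered
```

The 2079 row disappears entirely here even though two of its three sigma values are below 2, which is why a cell-wise replacement is needed to reproduce the expected output.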
Based on the expected output:
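# positions of the sigma columns in A; B is assumed to have its location_NN columns at the same positions
nm2 <- grep("sigma", names(A))
# set a cell of B to NA wherever the matching sigma in A is >= 2
B[, (nm2) := Map(function(x, y) replace(x, y >= 2, NA_character_),
.SD, A[, nm2, with = FALSE]), .SDcols = nm2][]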
# year location location_NN_1 location_NN_2 location_NN_3
#1: 2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
#2: 2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
#3: 2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
#4: 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 <NA>
#5: 2080 43.59375_-116.78125 <NA> <NA> <NA>
Data:
A <- structure(list(year = 2076:2080, location = c("43.59375_-116.78125",
"43.59375_-116.78125", "43.59375_-116.78125", "43.59375_-116.78125",
"43.59375_-116.78125"), sigma_NN_1 = c(1.4681173, 1.3798515,
0.7934367, 1.8235574, 2.5560329), sigma_NN_2 = c(1.664289, 1.550524,
1.064248, 1.991018, 2.578093), sigma_NN_3 = c(1.735974, 1.551269,
1.177981, 2.288402, 2.589334)), class = "data.frame", row.names = c(NA,
-5L))
B <- structure(list(year = 2076:2080, location = c("43.59375_-116.78125",
"43.59375_-116.78125", "43.59375_-116.78125", "43.59375_-116.78125",
"43.59375_-116.78125"), location_NN_1 = c("41.15625_-90.65625",
"43.34375_-78.15625", "41.34375_-90.78125", "43.53125_-116.78125",
"41.34375_-90.78125"), location_NN_2 = c("41.21875_-90.65625",
"43.34375_-78.21875", "41.21875_-90.65625", "41.34375_-90.78125",
"41.96875_-86.21875"), location_NN_3 = c("41.15625_-90.65625",
"43.28125_-78.15625", "41.53125_-73.96875", "41.71875_-74.15625",
"41.21875_-90.65625")), class = "data.frame", row.names = c(NA,
-5L))
Answer 2 (score: 3)
A simple base R solution:
B[-(1:2)][A[-(1:2)]>=2] <- NA
B[-(1:2)] selects all columns except the first and second.
The vectorized logical expression A[-(1:2)]>=2 then picks out the elements to set to NA.
Result:
year location location_NN_1 location_NN_2 location_NN_3
1 2076 43.59375_-116.78125 41.15625_-90.65625 41.21875_-90.65625 41.15625_-90.65625
2 2077 43.59375_-116.78125 43.34375_-78.15625 43.34375_-78.21875 43.28125_-78.15625
3 2078 43.59375_-116.78125 41.34375_-90.78125 41.21875_-90.65625 41.53125_-73.96875
4 2079 43.59375_-116.78125 43.53125_-116.78125 41.34375_-90.78125 <NA>
5 2080 43.59375_-116.78125 <NA> <NA> <NA>
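If the same masking is needed for several thresholds, the one-liner can be wrapped in a small function. This is only a sketch: mask_by_threshold and its arguments are hypothetical names, the data are rows 2078 and 2079 of the sample, and the function assumes both tables share the same shape and column order, with the identifier columns first.

```r
# Hypothetical helper: returns a copy of `values` with NA wherever the
# cell at the same position in `scores` is >= `threshold`.
# `id_cols` are the leading identifier columns to leave untouched.
mask_by_threshold <- function(values, scores, threshold = 2, id_cols = 1:2) {
  stopifnot(nrow(values) == nrow(scores), ncol(values) == ncol(scores))
  too_high <- scores[-id_cols] >= threshold  # logical matrix
  values[-id_cols][too_high] <- NA
  values
}

A <- data.frame(
  year = 2078:2079,
  location = "43.59375_-116.78125",
  sigma_NN_1 = c(0.7934367, 1.8235574),
  sigma_NN_2 = c(1.064248, 1.991018),
  sigma_NN_3 = c(1.177981, 2.288402),
  stringsAsFactors = FALSE
)
B <- data.frame(
  year = 2078:2079,
  location = "43.59375_-116.78125",
  location_NN_1 = c("41.34375_-90.78125", "43.53125_-116.78125"),
  location_NN_2 = c("41.21875_-90.65625", "41.34375_-90.78125"),
  location_NN_3 = c("41.53125_-73.96875", "41.71875_-74.15625"),
  stringsAsFactors = FALSE
)

# Threshold 2: only location_NN_3 for 2079 is masked.
res_2 <- mask_by_threshold(B, A, threshold = 2)

# Threshold 1.5: every location_NN column for 2079 is masked.
res_1_5 <- mask_by_threshold(B, A, threshold = 1.5)
```

Because the function works on a copy, the original B is left unchanged and can be reused with another threshold.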