Suppose we have the following data frame:
df <- data.frame(X1 = 1:5, X2 = 6:10, X3 = c(6, 2, 3, 0, 2))
X1 X2 X3
1 1 6 6
2 2 7 2
3 3 8 3
4 4 9 0
5 5 10 2
I want to add a new column (X4) of logical values. For each row, X4 should be TRUE if X3 equals X1 or X2, and FALSE otherwise.
I tried:
mutate(df, X4 = X3 %in% c(X2, X1))
X1 X2 X3 X4
1 1 6 6 TRUE # OK
2 2 7 2 TRUE # OK
3 3 8 3 TRUE # OK
4 4 9 0 FALSE # OK
5 5 10 2 TRUE # expected to be FALSE
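Row 5 comes out TRUE because `%in%` does not work row by row. `c(X2, X1)` concatenates both full columns into one ten-element vector, and each value of X3 is looked up in all of it. Row 5's X3 is 2, which appears in X1 (in row 2), so the result is TRUE. Here is a minimal base-R sketch of the difference, using only the example data above:

```r
df <- data.frame(X1 = 1:5, X2 = 6:10, X3 = c(6, 2, 3, 0, 2))

# %in% checks each X3 against the whole pooled vector c(X2, X1),
# i.e. all ten values 6:10 and 1:5, not just the values in the same row
pooled <- c(df$X2, df$X1)
pooled_match <- df$X3 %in% pooled
print(pooled_match)   # row 5 is TRUE because 2 occurs in X1 (row 2)

# `==` compares element by element, so each X3 is only checked
# against X1 and X2 of its own row
rowwise_match <- df$X3 == df$X1 | df$X3 == df$X2
print(rowwise_match)  # row 5 is FALSE, as intended
```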
Most importantly, my real df is very large, so I want to avoid for loops. I would prefer the shortest (least code) and fastest solution.
Answer 0 (score: 2)
You can do this in a vectorized way, which is the fastest option:
df$X4 <- with(df, X3==X1 | X3==X2)
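Applied to the small example data, this gives TRUE TRUE TRUE FALSE FALSE for X4, which matches the expected result. Each `==` compares the columns element by element, and `|` combines the two logical vectors element-wise, so no loop is involved. Here is a short sketch. The `mutate()` line in the comment assumes dplyr is loaded, as in the question.

```r
df <- data.frame(X1 = 1:5, X2 = 6:10, X3 = c(6, 2, 3, 0, 2))

# `X3 == X1` and `X3 == X2` each yield one logical per row;
# `|` ORs them element-wise, so the whole column is computed at once
df$X4 <- with(df, X3 == X1 | X3 == X2)
print(df$X4)

# Equivalent with dplyr, in the style of the question's attempt:
# mutate(df, X4 = X3 == X1 | X3 == X2)
```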
Answer 1 (score: 1)
We can use Reduce:
Reduce(`|`, lapply(df[1:2], `==`, df[,3]))
#[1] TRUE TRUE TRUE FALSE FALSE
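One advantage of the `Reduce` form is that it extends to any number of comparison columns without writing out each `==` by hand. The sketch below wraps it in a small function. The name `matches_any` and its arguments are hypothetical and do not come from any package.

```r
# Hypothetical helper: TRUE where column `target` equals any of the columns in `cols`
matches_any <- function(df, target, cols) {
  # lapply compares each candidate column with the target column element-wise,
  # giving a list of logical vectors; Reduce then ORs them together pairwise
  Reduce(`|`, lapply(df[cols], `==`, df[[target]]))
}

df <- data.frame(X1 = 1:5, X2 = 6:10, X3 = c(6, 2, 3, 0, 2))
m1 <- matches_any(df, "X3", c("X1", "X2"))
print(m1)

# Adding a third (made-up) comparison column needs no change to the helper
df$X0 <- c(9, 9, 9, 0, 9)
m2 <- matches_any(df, "X3", c("X0", "X1", "X2"))
print(m2)
```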
Benchmarking is more meaningful on bigger data:
library(microbenchmark)
set.seed(24)
df <- data.frame(X1= sample(1:5, 1e6, replace=TRUE), X2 = sample(1:10, 1e6, replace=TRUE),
X3 = sample(1:10, 1e6, replace=TRUE))
f2 <- function(df) Reduce(`|`, lapply(df[1:2], `==`, df[,3]))
f3 <- function(df) with(df, X3==X1 | X3==X2)
microbenchmark(f2(df), f3(df))
#Unit: milliseconds
# expr min lq mean median uq max neval
# f2(df) 8.191218 10.83333 23.28081 16.42744 22.26866 143.025 100
# f3(df) 8.154506 10.58878 19.17879 11.49179 22.41255 144.510 100
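I expect apply-based approaches to be slower, but Reduce is not slow: its timings above are in the same range as the direct vectorized comparison.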
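Before comparing timings, it is worth checking that the two functions return the same vector. The minimal sketch below regenerates the benchmark data and asserts that the results agree. It uses base R only, so `microbenchmark` is not needed for this step.

```r
set.seed(24)
df <- data.frame(X1 = sample(1:5, 1e6, replace = TRUE),
                 X2 = sample(1:10, 1e6, replace = TRUE),
                 X3 = sample(1:10, 1e6, replace = TRUE))

f2 <- function(df) Reduce(`|`, lapply(df[1:2], `==`, df[, 3]))
f3 <- function(df) with(df, X3 == X1 | X3 == X2)

# Both approaches should agree element for element before their speed is compared
r2 <- f2(df)
r3 <- f3(df)
stopifnot(identical(r2, r3))
```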
Answer 2 (score: 1)
A solution using dplyr:
library(dplyr)
df %>%
rowwise() %>%
mutate(X4 = any(c(X1, X2) %in% X3)) %>%
ungroup()
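This answer and the `==`-based ones behave differently only when values are missing. `%in%` never returns NA, so a row with NA in X1 or X2 simply does not match. With `==`, however, `NA == 2` is NA, and `NA | FALSE` stays NA. The small base-R illustration below uses hypothetical values for a single row.

```r
# Hypothetical row: X1 is missing and X3 matches neither column
x1 <- NA; x2 <- 7; x3 <- 2

# %in%-based test, as in the rowwise answer: NA simply never matches
in_result <- any(c(x1, x2) %in% x3)
print(in_result)

# `==`-based vectorized test: NA == 2 is NA, and NA | FALSE stays NA
eq_result <- x3 == x1 | x3 == x2
print(eq_result)
```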
# # A tibble: 5 x 4
# X1 X2 X3 X4
# <int> <int> <dbl> <lgl>
# 1 1 6 6.00 T
# 2 2 7 2.00 T
# 3 3 8 3.00 T
# 4 4 9 0 F
# 5 5 10 2.00 F