Here is my toy data. I have val and the quartile variables q0 to q4.
df <- tibble::tribble(
~val, ~q0, ~q1, ~q2, ~q3, ~q4, ~q, ~diff,
15L, 15L, 15L, 15L, 15, 15L, 4L, 0,
17L, 2L, 16L, 30L, 34, 54L, 2L, 13,
29L, 2L, 16L, 30L, 34, 54L, 2L, 1,
25L, 2L, 17L, 20L, 26, 43L, 3L, 1 )
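I need to compute the last two variables, q and diff, so that:
What is the best way to compute q and diff in R using the tidyverse? Perhaps we can build on the answer to: Extract column name and specific value based on a condition.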
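The rule that followed "so that:" is missing from this copy of the question. Reading it back from the four example rows (an inference, not the asker's wording), q looks like the number of boundaries q0 to q4 that val is greater than or equal to, capped at 4, and diff looks like the boundary selected by q (q2 when q is 2) minus val. A base-R sketch that checks this reading against the expected columns; qMat, qCalc and diffCalc are illustrative names:

```r
# Toy data from the question, including the expected q and diff columns
df <- data.frame(
  val  = c(15L, 17L, 29L, 25L),
  q0   = c(15L, 2L, 2L, 2L),
  q1   = c(15L, 16L, 16L, 17L),
  q2   = c(15L, 30L, 30L, 20L),
  q3   = c(15, 34, 34, 26),
  q4   = c(15L, 54L, 54L, 43L),
  q    = c(4L, 2L, 2L, 3L),
  diff = c(0, 13, 1, 1)
)

# Assumed rule (inferred from the rows above, not stated in the question):
# q counts how many of q0..q4 val is >= to, capped at 4
qMat  <- as.matrix(df[, c("q0", "q1", "q2", "q3", "q4")])
qCalc <- pmin(rowSums(df$val >= qMat), 4)

# diff is the selected boundary minus val (column q + 1, because q0 is column 1)
diffCalc <- qMat[cbind(seq_len(nrow(df)), qCalc + 1)] - df$val

all(qCalc == df$q)        # TRUE
all(diffCalc == df$diff)  # TRUE
```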
Answer 0 (score: 2)
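When you have more complex logic like this, I find it is usually best to wrap it in a function. It will be easier to maintain, read, and debug in the future. I am also wary of long chains of nested ifelse() statements or large case_when() blocks. In the accepted answer, q can only be 2, 3, or 4: no case yields q = 1, which you will certainly want as an option in the final product.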
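To make the missing q = 1 case concrete, here is a small base-R comparison on a hypothetical row (val = 10, not from the question) that sits between q0 and q1. It contrasts nested-ifelse() logic in the style of the other answer with the "count the boundaries val is >= to" idea used in the function below; the row and variable names are illustrative only:

```r
# Hypothetical row (not from the question): val lies between q0 and q1
row <- data.frame(val = 10L, q0 = 2L, q1 = 16L, q2 = 30L, q3 = 34, q4 = 54L)

# Nested ifelse in the style of the other answer: anything that is neither
# "between q1 and q2" nor "all boundaries equal" falls through to 3
qNested <- with(row, ifelse(val > q1 & val < q2, 2,
                     ifelse(val == q0 & val == q1 & val == q2 &
                            val == q3 & val == q4, 4, 3)))

# Counting approach: how many boundaries val is >= to, capped at 4
qMat     <- as.matrix(row[, c("q0", "q1", "q2", "q3", "q4")])
qCounted <- pmin(rowSums(row$val >= qMat), 4)

qNested   # 3, even though 10 lies between q0 and q1
qCounted  # 1
```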
df <- tibble::tribble(
~val, ~q0, ~q1, ~q2, ~q3, ~q4, ~q, ~diff,
15L, 15L, 15L, 15L, 15, 15L, 4L, 0,
17L, 2L, 16L, 30L, 34, 54L, 2L, 13,
29L, 2L, 16L, 30L, 34, 54L, 2L, 1,
25L, 2L, 17L, 20L, 26, 43L, 3L, 1 )
whichQ <- function(df, qs = c('q0', 'q1', 'q2', 'q3', 'q4')) {
  # Selecting by name lets you rename the columns or use more or fewer Q splits
  qMat <- as.matrix(df[, qs])
  # Find the right quantile by counting how many boundaries val is >= to;
  # this works because the q columns are sorted in increasing order
  q <- rowSums(df$val >= qMat)
  # The last quantile is a special case because there is no next one,
  # so cap q at the highest index (4 for q0..q4)
  q <- pmin(q, length(qs) - 1)
  df$q <- q
  # Grab the boundary selected by q for each row (column q + 1, since q0 is
  # column 1); matrix indexing avoids truncating the double column q3
  upper <- qMat[cbind(seq_len(nrow(df)), q + 1)]
  # Distance from val up to that boundary
  df$diff <- upper - df$val
  df
}
You can still use it in a tidyverse pipeline, and (I think) the result is clearer as long as you give the function a descriptive name.
library(dplyr)  # provides the %>% pipe

df %>%
  whichQ() %>%
  head(2)
Answer 1 (score: 1)
Try:
library(tidyverse)
df <- tribble(
~val, ~q0, ~q1, ~q2, ~q3, ~q4,
15L, 15L, 15L, 15L, 15, 15L,
17L, 2L, 16L, 30L, 34, 54L,
29L, 2L, 16L, 30L, 34, 54L,
25L, 2L, 17L, 20L, 26, 43L)
df %>%
mutate(q = ifelse(val > q1 & val < q2, 2,
ifelse(val == q0 & val == q1 & val == q2 & val == q3 & val == q4, 4,
3)),
diff = ifelse(val > q1 & val < q2, q2 - val,
ifelse(val == q0 & val == q1 & val == q2 & val == q3 & val == q4, q4 - val,
q3 - val)))
# A tibble: 4 x 8
val q0 q1 q2 q3 q4 q diff
<int> <int> <int> <int> <dbl> <int> <dbl> <dbl>
1 15 15 15 15 15 15 4 0
2 17 2 16 30 34 54 2 13
3 29 2 16 30 34 54 2 1
4 25 2 17 20 26 43 3 1
Using case_when() (assuming you choose 3 when val is between q2 and q3). Rows that match none of the conditions get NA:
df %>%
mutate(q = case_when(val > q1 & val < q2 ~ 2,
val == q0 & val == q1 & val == q2 & val == q3 & val == q4 ~ 4,
val > q2 & val < q3 ~ 3),
diff = case_when(val > q1 & val < q2 ~ q2 - val,
val == q0 & val == q1 & val == q2 & val == q3 & val == q4 ~ q4 - val,
val > q2 & val < q3 ~ as.integer(q3 - val)))
# A tibble: 4 x 8
val q0 q1 q2 q3 q4 q diff
<int> <int> <int> <int> <dbl> <int> <dbl> <int>
1 15 15 15 15 15 15 4 0
2 17 2 16 30 34 54 2 13
3 29 2 16 30 34 54 2 1
4 25 2 17 20 26 43 3 1
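For comparison, a case_when() sketch that covers every bracket, including q = 1, and has a fallback so no row ends up NA. It assumes the counting reading of the rule (q is how many of q0 to q4 val is greater than or equal to, capped at 4), assumes the q columns are sorted, and adds a hypothetical fifth row (val = 10) that is not in the question. Converting the boundaries to double first keeps every case_when() branch the same type, which older dplyr versions require:

```r
library(dplyr)

df <- tibble::tribble(
  ~val, ~q0, ~q1, ~q2, ~q3, ~q4,
  15L, 15L, 15L, 15L, 15, 15L,
  17L, 2L, 16L, 30L, 34, 54L,
  29L, 2L, 16L, 30L, 34, 54L,
  25L, 2L, 17L, 20L, 26, 43L,
  10L, 2L, 16L, 30L, 34, 54L   # hypothetical row: val between q0 and q1
)

res <- df %>%
  # make all boundaries double so every case_when() branch has the same type
  mutate(across(q0:q4, as.numeric)) %>%
  mutate(
    # conditions are checked top to bottom; the first TRUE one wins
    q = case_when(
      val >= q3 ~ 4,
      val >= q2 ~ 3,
      val >= q1 ~ 2,
      val >= q0 ~ 1,
      TRUE      ~ 0
    ),
    # take the boundary selected by q and subtract val
    diff = case_when(
      q == 4 ~ q4,
      q == 3 ~ q3,
      q == 2 ~ q2,
      q == 1 ~ q1,
      TRUE   ~ q0
    ) - val
  )

res
```

Because the conditions are ordered from the highest boundary down, each row lands in exactly one bracket, and the final TRUE branch catches a val below q0 instead of leaving NA.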