Question

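For a data.frame with many columns, I want to compute, for each column, the percentage of values that fall outside a [low, high] range. These "low" and "high" values differ from column to column (they are computed as pcts in the snippet below). How can I pass each column's own "low" and "high" values when using summarise_each()? So far I can only supply fixed values, as shown in the sample.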

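library(dplyr)

# 10th and 90th percentiles of a numeric vector
pct10 <- function(dbl) { quantile(dbl, 0.1) }
pct90 <- function(dbl) { quantile(dbl, 0.9) }

# Keep only frames where tracking succeeded
valid.fms <- headgaze %>%
  filter(tracking_status == "OK")

# Per-column thresholds: one row with columns head_pitch_pct10, head_pitch_pct90, ...
pcts <- valid.fms %>%
  summarise_each(funs(pct10, pct90),
                 head_pitch, head_yaw, head_roll,
                 gaze_x, gaze_y, gaze_z)

# Fraction of values strictly outside [low, high]
extreme.rt <- function(dbl, low, high) {
  length(dbl[dbl < low | dbl > high]) / length(dbl)
}

# Works, but every column gets the same fixed low/high
feats <- valid.fms %>%
  group_by(lab_session) %>%
  summarise_each(funs(extreme.rt(., -10.98332, 11.045)),
                 head_pitch, head_yaw, head_roll)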
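For readers on newer dplyr: from version 1.0.0 onward, summarise_each() and funs() are deprecated in favour of across(), and cur_column() returns the name of the column across() is currently processing. That makes a per-column lookup of thresholds possible. The sketch below assumes dplyr >= 1.0.0; the data frame `toy`, the `cols` vector and the `limits` list are hypothetical stand-ins for valid.fms and pcts, not part of the original question.

```r
library(dplyr)

# Hypothetical toy data standing in for valid.fms
set.seed(1)
toy <- data.frame(
  lab_session = rep(c("s1", "s2"), each = 50),
  head_pitch  = rnorm(100, sd = 10),
  head_yaw    = rnorm(100, sd = 20)
)
cols <- c("head_pitch", "head_yaw")

# Hypothetical per-column thresholds: a named list holding c(low, high) per column
limits <- lapply(toy[cols], quantile, probs = c(0.1, 0.9), names = FALSE)

# cur_column() gives the name of the column being summarised,
# so each column is compared against its own low/high pair
beyond <- toy %>%
  group_by(lab_session) %>%
  summarise(across(all_of(cols),
                   ~ mean(.x < limits[[cur_column()]][1] |
                          .x > limits[[cur_column()]][2])))
beyond
```

Storing the thresholds in a named list keyed by column name keeps the lookup explicit; the wide one-row pcts table from the question could be reshaped into such a list.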

Answer 1

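I don't think a general solution exists (because you would need to pass a list-like object to summarise_each).
But for your case, a small change helps: first flag the values that are out of range, then count them. You can do this with mutate_each: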

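# Uses pct10() and pct90() from the question
is_beyond <- function(x) x < pct10(x) | x > pct90(x)

headgaze %>%
    filter(tracking_status == "OK") %>%
    # Replace each value with TRUE if it lies outside its own column's [10%, 90%] range
    mutate_each(funs(is_beyond),
                head_pitch, head_yaw, head_roll, gaze_x, gaze_y, gaze_z) %>%
    group_by(lab_session) %>% # ! this comes *after* mutate, so the quantiles use all valid frames
    # The mean of a logical vector is the fraction of TRUE values
    summarise_each(funs(mean),
                   head_pitch, head_yaw, head_roll, gaze_x, gaze_y, gaze_z)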
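The same flag-then-average idea can be checked on a small self-contained example. This sketch assumes dplyr >= 1.0.0 and swaps the deprecated mutate_each()/summarise_each() for across(); the `toy` data frame and its values are hypothetical. Because the flags are computed before grouping, the 10% and 90% quantiles come from all rows, and each session's mean is the fraction of its values outside that column's range.

```r
library(dplyr)

pct10 <- function(dbl) quantile(dbl, 0.1)
pct90 <- function(dbl) quantile(dbl, 0.9)
# TRUE where a value lies outside its column's [10%, 90%] range
is_beyond <- function(x) x < pct10(x) | x > pct90(x)

# Hypothetical toy data with two sessions
set.seed(42)
toy <- data.frame(
  lab_session = rep(c("a", "b"), each = 50),
  head_pitch  = rnorm(100),
  head_yaw    = runif(100, -30, 30)
)

rates <- toy %>%
  mutate(across(c(head_pitch, head_yaw), is_beyond)) %>%  # quantiles over all rows
  group_by(lab_session) %>%
  summarise(across(c(head_pitch, head_yaw), mean))        # fraction of TRUE per session
rates
```

With 100 distinct values, exactly 10 fall below the default (type 7) 10% quantile and 10 above the 90% quantile, so the two sessions' fractions average to 0.2 for each column.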

How to pass per-column arguments when using R dplyr's summarise_each() function
