Question

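Brief dataset description: I have survey data generated from Qualtrics, which I've imported into R as a tibble. Each column corresponds to a survey question, and I've preserved the original column order (which matches the order of the questions in the survey).

The problem in plain language: because of normal participant attrition, not all participants completed every question in the survey. I want to know how far each participant got in the survey, and the last question they answered before stopping.

Problem statement in R: I want to generate (using the tidyverse):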

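1) A new column (lastq) that lists, for each row (i.e. each participant), the name of the last non-NA column (i.e. the name of the last question they completed).
2) A second new column (lastqnum) that lists the column number of the column named in lastq.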

Example data frame df

df <- tibble(
  year = c(2015, 2015, 2016, 2016),
  grade = c(1, NA, 1, NA),
  height = c("short", "tall", NA, NA),
  gender = c(NA, "m", NA, "f")
 )

Original df

  # A tibble: 4 x 4
   year grade height gender
  <dbl> <dbl>  <chr>  <chr>
1  2015     1  short   <NA>
2  2015    NA   tall      m
3  2016     1   <NA>   <NA>
4  2016    NA   <NA>      f

Desired final df

   # A tibble: 4 x 6
   year grade height gender  lastq lastqnum
  <dbl> <dbl>  <chr>  <chr>  <chr>    <dbl>
1  2015     1  short   <NA> height        3
2  2015    NA   tall      m gender        4
3  2016     1   <NA>   <NA>  grade        2
4  2016    NA   <NA>      f gender        4

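There are some other related questions, but I can't seem to find any that focus on extracting the column names (rather than the values themselves) from a tibble with mixed variable classes, using a tidyverse solution.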

What I've been trying (I know there's something I'm missing here):

ds %>% map(which(!is.na(.)))
ds %>% map(tail(!is.na(.), 2))
ds %>% rowwise() %>% mutate(last = which(!is.na(ds)))

Thank you very much for your help!

Answer 1

Write a function that solves the problem, following James' suggestion but a little more robust (handles the case when all answers are NA)

f0 = function(df) {
    idx = ifelse(is.na(df), 0L, col(df))
    apply(idx, 1, max)
}

The L makes the 0 an integer, rather than numeric. For a speed improvement (when there are many rows), use the matrixStats package

f1 = function(df) {
    idx = ifelse(is.na(df), 0L, col(df))
    matrixStats::rowMaxs(idx, na.rm=TRUE)
}
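To make the intermediate step concrete, here is a small sketch (base R only, using a plain data.frame copy of the question's example data as an assumption) that prints the index matrix built inside f0. Each non-NA cell holds its column number and each NA cell holds 0, so the row maximum is the position of the last answered question.

```r
# Example data from the question, as a base data.frame (assumption: a tibble
# behaves the same way here, since is.na() and col() only need the dimensions).
df <- data.frame(
  year   = c(2015, 2015, 2016, 2016),
  grade  = c(1, NA, 1, NA),
  height = c("short", "tall", NA, NA),
  gender = c(NA, "m", NA, "f")
)

# Column number where a value is present, 0 where it is NA.
idx <- ifelse(is.na(df), 0L, col(df))
print(idx)

# The largest column number in each row is the last answered question.
lastqnum <- apply(idx, 1, max)
print(lastqnum)
```

Because the 0 is written as 0L and col() returns integers, the whole matrix stays integer.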

Follow markus' suggestion to use this in a dplyr context

mutate(df, lastqnum = f1(df), lastq = c(NA, names(df))[lastqnum + 1])
df %>% mutate(lastqnum = f1(.), lastq = c(NA, names(.))[lastqnum + 1])

or just do it

lastqnum = f1(df)
cbind(df, lastq=c(NA, names(df))[lastqnum + 1], lastqnum)
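The `c(NA, names(df))[lastqnum + 1]` lookup used above shifts every index by one so that a lastqnum of 0 (no answers at all) maps to NA instead of silently dropping the row. A minimal sketch of that behaviour, using base R only and a fifth, all-NA participant added to the example data as an assumption:

```r
f0 <- function(df) {
  idx <- ifelse(is.na(df), 0L, col(df))
  apply(idx, 1, max)
}

# Example data plus one hypothetical participant who answered nothing.
df <- data.frame(
  year   = c(2015, 2015, 2016, 2016, NA),
  grade  = c(1, NA, 1, NA, NA),
  height = c("short", "tall", NA, NA, NA),
  gender = c(NA, "m", NA, "f", NA)
)

lastqnum <- f0(df)

# Position 1 of the lookup vector is NA, so index 0 + 1 returns NA and
# index k + 1 returns the k-th column name.
lastq <- c(NA, names(df))[lastqnum + 1]

print(data.frame(lastq, lastqnum))
```

Indexing `names(df)[lastqnum]` directly would not work here, because a 0 index returns nothing and the result would be shorter than the number of rows.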

Edited after acceptance: I guess the tidy approach would be first to tidy the data into long form

df1 = cbind(gather(df), id = as.vector(row(df)), event = as.vector(col(df)))

and then to group and summarize

group_by(df1, id) %>%
    summarize(lastq = tail(event[!is.na(value)], 1), lastqname = key[lastq])

This doesn't handle the case when there are no answers.
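One way to cover the no-answers case in the long-form approach is sketched below. This is not the code from the answer: it swaps gather() for tidyr's pivot_longer(), assumes dplyr (1.0 or later) and tidyr (1.1 or later) are installed, and adds a hypothetical fifth participant with no answers. The idea is the same as in f0: prepend 0 before taking the maximum, and prepend NA before looking up the name.

```r
library(dplyr)
library(tidyr)

# Example data plus one hypothetical participant who answered nothing.
df <- tibble(
  year   = c(2015, 2015, 2016, 2016, NA),
  grade  = c(1, NA, 1, NA, NA),
  height = c("short", "tall", NA, NA, NA),
  gender = c(NA, "m", NA, "f", NA)
)

last_answered <- df %>%
  mutate(id = row_number()) %>%
  # Mixed column types must share one type in long form, so convert to character.
  pivot_longer(-id, names_to = "key", values_to = "value",
               values_transform = list(value = as.character)) %>%
  group_by(id) %>%
  # Within a participant the rows follow the original column order.
  mutate(event = row_number()) %>%
  summarize(
    lastqnum = max(c(0L, event[!is.na(value)])),   # 0 when nothing was answered
    lastq    = c(NA_character_, key)[lastqnum + 1] # NA when lastqnum is 0
  )

result <- bind_cols(df, select(last_answered, lastq, lastqnum))
print(result)
```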

Column name of the last non-NA column per row; using a tidyverse solution?
