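Spreading a dummy-variable column into two columns of summary statistics

Time: 2018-06-23 18:30:30

Tags: r dplyr tidyr

I have a simple problem, but I can't figure out how to get the result I want in dplyr / tidyr.

I just computed a summary data frame like this: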

results <- df_long %>%
  group_by(question,imputed_liberal, question_text) %>% 
  summarize(Accuracy = mean(score, na.rm = T), Reaction_Time = mean(reation_time, na.rm = T), Number = n()) 

Each question is repeated on two rows, one for imputed_liberal = T and one for imputed_liberal = F, with one column each for Accuracy and Reaction_Time.

   question imputed_liberal question_text Accuracy Reaction_Time Number
 1 10       F               How many...      0.750          61.4     16
 2 10       T               How many...      0.429          55.9     14

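I want to collapse these two rows into a single row (one row per question), with columns for "conservative accuracy" (imputed_liberal = F), "liberal accuracy", "conservative reaction time", and "liberal reaction time".

I think spread is the right approach, but I can't figure out how to spread over two value columns (Accuracy and Reaction_Time).

My attempt: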

results <- results %>% 
           filter(!is.na(Accuracy)) %>%
           spread(results, key = imputed_liberal, value = c(Accuracy, Reaction_time))

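This throws an error, because you can't spread two value columns at once.

2 Answers:

Answer 0: (score: 1)

One option is to split the data into two parts, one per value of imputed_liberal, and join the two parts back together on question.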

library(dplyr)

inner_join(filter(results, imputed_liberal), 
    filter(results, !imputed_liberal), by="question") %>%
     select(-Number.y)

Result:

Note: the columns can be renamed as you like.

# question imputed_liberal.x question_text.x Accuracy.x Reaction_Time.x Number.x imputed_liberal.y question_text.y Accuracy.y Reaction_Time.y
# 1       10              TRUE     How many...      0.429            55.9       14             FALSE     How many...       0.75            61.4

Data:

results <- read.table(text =
"question imputed_liberal question_text Accuracy Reaction_Time Number  
1 10       FALSE               'How many...'    0.750       61.4     16
2 10       TRUE               'How many...'    0.429       55.9     14",
header = TRUE, stringsAsFactors = FALSE)
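
As a follow-up to the note about renaming columns, here is a minimal sketch of one way to get descriptive names directly from the join. It assumes the same two-row data as above; the suffixes `_liberal` and `_conservative` and the helper names `liberal`, `conservative`, and `wide` are illustrative choices, not part of the original answer. Joining on both `question` and `question_text` also avoids a duplicated `question_text` column.

```r
library(dplyr)

# Same two-row example as in the answer's data block
results <- data.frame(
  question = c(10, 10),
  imputed_liberal = c(FALSE, TRUE),
  question_text = c("How many...", "How many..."),
  Accuracy = c(0.750, 0.429),
  Reaction_Time = c(61.4, 55.9),
  Number = c(16, 14),
  stringsAsFactors = FALSE
)

# Split into the two groups and drop the flag, which is now implied by the suffix
liberal      <- results %>% filter(imputed_liberal)  %>% select(-imputed_liberal)
conservative <- results %>% filter(!imputed_liberal) %>% select(-imputed_liberal)

# `suffix` replaces the default ".x" / ".y" on the non-key columns
wide <- inner_join(
  liberal, conservative,
  by = c("question", "question_text"),
  suffix = c("_liberal", "_conservative")
)

print(wide)
```

This keeps both `Number` columns; drop either with `select()` if it is not needed.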

Answer 1: (score: 1)

Here is the standard tidyr way:

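library(tidyverse)
# df is the summary data frame (called `results` in the question)
df %>%
  select(-Number) %>%                                        # drop the per-group count
  mutate(imputed_liberal = ifelse(imputed_liberal,1,0)) %>%  # TRUE/FALSE -> 1/0 for short suffixes
  gather(key, value, Accuracy, Reaction_Time) %>%            # stack both measures into key/value pairs
  unite(key, key, imputed_liberal) %>%                       # e.g. "Accuracy_0", "Reaction_Time_1"
  spread(key, value)                                         # one column per measure/group pair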

#   question question_text Accuracy_0 Accuracy_1 Reaction_Time_0 Reaction_Time_1
# 1       10   How many...       0.75      0.429            61.4            55.9
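
To see why this works, it can help to stop the pipeline just before `spread()` and look at the long form. The sketch below assumes the same two-row example, rebuilt here as `df` so that it runs on its own; the name `long` is only for illustration.

```r
library(dplyr)
library(tidyr)

# Two-row example from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  question = c(10, 10),
  imputed_liberal = c(FALSE, TRUE),
  question_text = c("How many...", "How many..."),
  Accuracy = c(0.750, 0.429),
  Reaction_Time = c(61.4, 55.9),
  Number = c(16, 14),
  stringsAsFactors = FALSE
)

# Same steps as above, minus the final spread()
long <- df %>%
  select(-Number) %>%
  mutate(imputed_liberal = ifelse(imputed_liberal, 1, 0)) %>%
  gather(key, value, Accuracy, Reaction_Time) %>%  # 2 rows x 2 measures = 4 rows
  unite(key, key, imputed_liberal)                 # measure name + group flag

print(long)
```

Each row of `long` now has a single value and a single combined key, which is exactly the shape `spread()` can handle.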

You can also nest first, which takes fewer gymnastics:

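# Written against the tidyr API of the time (before 1.0.0);
# later versions deprecate this nest() / unnest(.sep = ) syntax
df %>%
  select(-Number) %>%
  nest(Accuracy, Reaction_Time) %>%   # pack both measures into a list column named `data`
  spread(imputed_liberal,data) %>%    # one list column per group: `FALSE` and `TRUE`
  unnest(.sep = "_")                  # unpack, prefixing each measure with its group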

#   question question_text FALSE_Accuracy FALSE_Reaction_Time TRUE_Accuracy TRUE_Reaction_Time
# 1       10   How many...           0.75                61.4         0.429               55.9
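
For readers on a newer tidyr, here is a hedged sketch of the same reshaping with `pivot_wider()`, which is available from tidyr 1.0.0 and accepts several value columns at once. It is not part of the original answers; `df` is rebuilt from the two-row example so that the snippet runs on its own, and `wide` is an illustrative name.

```r
library(dplyr)
library(tidyr)  # pivot_wider() needs tidyr >= 1.0.0

# Two-row example from the question
df <- data.frame(
  question = c(10, 10),
  imputed_liberal = c(FALSE, TRUE),
  question_text = c("How many...", "How many..."),
  Accuracy = c(0.750, 0.429),
  Reaction_Time = c(61.4, 55.9),
  Number = c(16, 14),
  stringsAsFactors = FALSE
)

wide <- df %>%
  select(-Number) %>%                  # Number differs per group, so drop it first
  pivot_wider(
    names_from  = imputed_liberal,     # FALSE / TRUE become name suffixes
    values_from = c(Accuracy, Reaction_Time)
  )

print(wide)
```

With more than one column in `values_from`, the new names combine the value column and the group, for example `Accuracy_FALSE` and `Reaction_Time_TRUE`.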