I have a simple problem, but I can't figure out how to get the result I want in dplyr/tidyr.
I just computed a summary data frame like this:
```r
results <- df_long %>%
  group_by(question, imputed_liberal, question_text) %>%
  summarize(Accuracy = mean(score, na.rm = TRUE),
            Reaction_Time = mean(reation_time, na.rm = TRUE),
            Number = n())
```
Each question is repeated over two rows, one for `imputed_liberal = T` and one for `imputed_liberal = F`, with one column each for Accuracy and Reaction_Time:
```
  question imputed_liberal question_text Accuracy Reaction_Time Number
1       10               F   How many...    0.750          61.4     16
2       10               T   How many...    0.429          55.9     14
```
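I'd like to collapse these two rows into a single row (one row per question), with columns for "conservative accuracy" (`imputed_liberal = F`), "liberal accuracy", "conservative reaction time" and "liberal reaction time".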
I think `spread()` is the right approach, but I can't figure out how to spread over two value columns (Accuracy and Reaction_Time) at once.
My attempt:
```r
results <- results %>%
  filter(!is.na(Accuracy)) %>%
  spread(key = imputed_liberal, value = c(Accuracy, Reaction_Time))
```
This throws an error, because `spread()` can't take two value columns at once.
Answer 0 (score: 1)
One option is to split your data into two parts and join the two parts back together.
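```r
library(dplyr)

# Join the liberal rows (TRUE) to the conservative rows (FALSE) on question;
# columns present in both parts get the default ".x" / ".y" suffixes
inner_join(filter(results, imputed_liberal),
           filter(results, !imputed_liberal), by = "question") %>%
  select(-Number.y)
```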
Result (the columns can be renamed however you like):
```
#   question imputed_liberal.x question_text.x Accuracy.x Reaction_Time.x Number.x imputed_liberal.y question_text.y Accuracy.y Reaction_Time.y
# 1       10              TRUE     How many...      0.429            55.9       14             FALSE     How many...       0.75            61.4
```
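The renaming can also be done inside the join itself. A minimal sketch, assuming `results` has the columns shown in the question (the toy values below are copied from the sample output): `inner_join()`'s `suffix` argument replaces the default `.x`/`.y`, and joining on `question_text` as well keeps a single copy of that column. The `_conservative`/`_liberal` suffixes are illustrative choices, not names from the original post.

```r
library(dplyr)

# Toy copy of the two-row summary table from the question
results <- tibble(
  question        = c(10, 10),
  imputed_liberal = c(FALSE, TRUE),
  question_text   = c("How many...", "How many..."),
  Accuracy        = c(0.750, 0.429),
  Reaction_Time   = c(61.4, 55.9),
  Number          = c(16L, 14L)
)

wide <- inner_join(
  filter(results, !imputed_liberal),       # conservative rows (x)
  filter(results, imputed_liberal),        # liberal rows (y)
  by = c("question", "question_text"),     # shared keys appear only once
  suffix = c("_conservative", "_liberal")  # instead of the default ".x" / ".y"
) %>%
  select(-starts_with("imputed_liberal"))  # the flag is now encoded in the names

wide
```

Listing the conservative rows first puts the conservative columns first in the output; swap the two `filter()` calls (and the suffixes) to reverse that.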
Data:
```r
results <- read.table(text =
"question imputed_liberal question_text Accuracy Reaction_Time Number
1 10 FALSE 'How many...' 0.750 61.4 16
2 10 TRUE 'How many...' 0.429 55.9 14",
header = TRUE, stringsAsFactors = FALSE)
```
Answer 1 (score: 1)
Here is the standard tidyr way:
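```r
library(tidyverse)

df <- results  # assumed: df is the summary table (called `results` in the question)

df %>%
  select(-Number) %>%
  mutate(imputed_liberal = ifelse(imputed_liberal, 1, 0)) %>%        # TRUE/FALSE -> 1/0 suffix
  gather(key = "key", value = "value", Accuracy, Reaction_Time) %>%  # long: one row per measure
  unite(key, key, imputed_liberal) %>%                               # e.g. "Accuracy_0"
  spread(key, value)                                                 # back to wide
#   question question_text Accuracy_0 Accuracy_1 Reaction_Time_0 Reaction_Time_1
# 1       10   How many...       0.75      0.429            61.4            55.9
```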
You can also nest first, which saves some of the gymnastics:
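```r
# Written for tidyr < 1.0: from tidyr 1.0.0 on, nest() expects data = c(...)
# and unnest()'s .sep argument is replaced by names_sep
df %>%
  select(-Number) %>%
  nest(Accuracy, Reaction_Time) %>%   # list-column "data" holding both measures
  spread(imputed_liberal, data) %>%   # one list-column per group: FALSE, TRUE
  unnest(.sep = "_")                  # expand, prefixing names with the group
#   question question_text FALSE_Accuracy FALSE_Reaction_Time TRUE_Accuracy TRUE_Reaction_Time
# 1       10   How many...           0.75                61.4         0.429               55.9
```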
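On tidyr 1.0.0 or later, `pivot_wider()` accepts several value columns in `values_from`, which is exactly what the `spread()` attempt in the question was reaching for, so neither the join nor the nesting is needed. A minimal sketch, using a toy copy of the sample table; the `group` column and its `conservative`/`liberal` labels are illustrative names, not part of the original data.

```r
library(dplyr)
library(tidyr)  # pivot_wider() requires tidyr >= 1.0.0

# Toy copy of the two-row summary table from the question
results <- tibble(
  question        = c(10, 10),
  imputed_liberal = c(FALSE, TRUE),
  question_text   = c("How many...", "How many..."),
  Accuracy        = c(0.750, 0.429),
  Reaction_Time   = c(61.4, 55.9),
  Number          = c(16L, 14L)
)

wide <- results %>%
  select(-Number) %>%
  # hypothetical readable labels for the logical flag
  mutate(group = if_else(imputed_liberal, "liberal", "conservative")) %>%
  select(-imputed_liberal) %>%
  pivot_wider(
    names_from  = group,                        # one set of columns per group
    values_from = c(Accuracy, Reaction_Time)    # several value columns at once
  )

wide
```

With the default `names_sep = "_"` this yields `Accuracy_conservative`, `Accuracy_liberal`, `Reaction_Time_conservative` and `Reaction_Time_liberal`. `question` and `question_text` are kept as identifier columns because they are not named in `names_from` or `values_from`.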