Is there a cleaner way to group and summarize multiple variables in multiple ways in R?

Time: 2020-04-27 01:24:52

Tags: r tidyverse survey

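This is my first post. Apologies if I've messed anything up.

I have employee opinion survey data containing 5-point Likert scale items along with department data (and other demographics). I want to get % unfavorable (responses of 1 or 2), % neutral (responses equal to 3), and % favorable (responses of 4 or 5). I also want these percentages for each department. I get the result I want with the sample data below, but in reality I have 30+ variables. I'm hoping there is a cleaner way to do this!

Here is my sample data: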

survey <- data.frame(department = c('hr', 'hr', 'tech', 'tech', 'tech', 'hr', 'hr', 'tech', 'tech', 'tech'),
                  pride = c(1, 5, 2, 3, NA, 5, 5, 2, 3, NA),
                  satisfaction = c(5, 2, 3, NA, 5, 5, 2, 3, NA, 3),
                  leadership = c(5, 2, 3, NA, 5, 1, 1, 5, 2, 3))

Using this, I can easily get % favorable:

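library(dplyr)

items <- c('pride', 'satisfaction', 'leadership')
output <- survey %>% 
  group_by(department) %>% 
  # recode each item to 1 (favorable: 4 or 5) or 0 (otherwise); NA stays NA
  mutate_at(items, recode, `1` = 0, `2` = 0, `3` = 0, `4` = 1, `5` = 1) %>%
  # share of favorable responses per department and item
  summarize_at(items, mean, na.rm = TRUE) %>%
  rowwise() %>%
  # overall engagement: mean of the three item-level shares
  mutate(engagement = mean(c(pride, satisfaction, leadership), na.rm = TRUE)) %>%
  filter(!is.na(department))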

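Once I try to do all three calculations (% unfav, % neutral, and % fav), it gets messy. Is there a better way than this? (It does give me the output I want, but given that I actually have 30+ variables, it isn't very scalable.)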

items_fav <- c('pride_fav', 'satisfaction_fav', 'leadership_fav')
items_neutral <- c('pride_neut', 'satisfaction_neut', 'leadership_neut')
items_unfav <- c('pride_unfav', 'satisfaction_unfav', 'leadership_unfav')
all_items <- c(items_fav, items_neutral, items_unfav)
output_3parts <- survey %>%
  mutate(pride_fav = pride, 
         satisfaction_fav = satisfaction,
         leadership_fav = leadership, 
         pride_neut = pride, 
         satisfaction_neut = satisfaction,
         leadership_neut = leadership,
         pride_unfav = pride, 
         satisfaction_unfav = satisfaction,
         leadership_unfav = leadership) %>%
  mutate_at(items_fav, recode, `1` = 0, `2` = 0, `3` = 0, `4` = 1, `5` = 1) %>%
  mutate_at(items_neutral, recode, `1` = 0, `2` = 0, `3` = 1, `4` = 0, `5` = 0) %>%
  mutate_at(items_unfav, recode, `1` = 1, `2` = 1, `3` = 0, `4` = 0, `5` = 0) %>%
  group_by(department) %>%
  summarize_at(all_items, mean, na.rm = TRUE)

The output looks like this:

department  pride_fav  satisfaction_fav  leadership_fav  pride_neut  satisfaction_neut  leadership_neut  pride_unfav  satisfaction_unfav  leadership_unfav
hr          0.75       0.5               0.25            0           0                  0                0.25         0.5                 0.75
tech        0          0.25              0.4             0.5         0.75               0.4              0.5          0                   0.2
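
For reference, the nine copied columns and three recode passes above boil down to three logical tests per item (4 or 5, exactly 3, 1 or 2). A minimal sketch of that idea, assuming dplyr 1.0.0 or later so that across() and the .groups argument are available; the name output_across is made up for illustration:

```r
library(dplyr)

# Sample data copied from the question
survey <- data.frame(
  department = c('hr', 'hr', 'tech', 'tech', 'tech', 'hr', 'hr', 'tech', 'tech', 'tech'),
  pride = c(1, 5, 2, 3, NA, 5, 5, 2, 3, NA),
  satisfaction = c(5, 2, 3, NA, 5, 5, 2, 3, NA, 3),
  leadership = c(5, 2, 3, NA, 5, 1, 1, 5, 2, 3)
)

items <- c('pride', 'satisfaction', 'leadership')

# output_across is a hypothetical name, not from the original post
output_across <- survey %>%
  group_by(department) %>%
  summarise(
    across(
      all_of(items),
      list(
        # the mean of a logical vector is the share of TRUE values;
        # na.rm = TRUE drops missing responses from the denominator
        fav   = ~ mean(.x >= 4, na.rm = TRUE),
        neut  = ~ mean(.x == 3, na.rm = TRUE),
        unfav = ~ mean(.x <= 2, na.rm = TRUE)
      )
    ),
    .groups = "drop"
  )

print(output_across)
```

With a named list of functions, across() names each result column as item_suffix (for example pride_fav), so the column names match the hand-built ones, though the columns come out grouped by item rather than by category. With 30+ items, only the items vector would need to grow.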

Thanks!

1 answer:

Answer 0: (score: 0)

If I understand you correctly, this may do what you need.

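library(tidyverse)

survey %>%
  # reshape to one row per department/item/response
  pivot_longer(cols = -department, names_to = "quality", values_to = "ranking") %>%
  group_by(department, quality) %>%
  # mean score per department and item, ignoring missing responses
  summarise(mean_score = mean(ranking, na.rm = TRUE)) %>%
  # back to one row per department, one column per item
  pivot_wider(names_from = quality, values_from = mean_score)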
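
If the goal is the three percentages from the question rather than a mean score, the same long-format idea can probably be extended. A minimal sketch, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later; the names item, score, pct_long and pct_wide are illustrative and not from the original posts:

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
survey <- data.frame(
  department = c('hr', 'hr', 'tech', 'tech', 'tech', 'hr', 'hr', 'tech', 'tech', 'tech'),
  pride = c(1, 5, 2, 3, NA, 5, 5, 2, 3, NA),
  satisfaction = c(5, 2, 3, NA, 5, 5, 2, 3, NA, 3),
  leadership = c(5, 2, 3, NA, 5, 1, 1, 5, 2, 3)
)

# One row per department and item, with the three shares as columns
pct_long <- survey %>%
  pivot_longer(cols = -department, names_to = "item", values_to = "score") %>%
  group_by(department, item) %>%
  summarise(
    fav   = mean(score >= 4, na.rm = TRUE),  # share of 4s and 5s
    neut  = mean(score == 3, na.rm = TRUE),  # share of 3s
    unfav = mean(score <= 2, na.rm = TRUE),  # share of 1s and 2s
    .groups = "drop"
  )

# Optional: spread back to one row per department, with columns
# named item_suffix (e.g. pride_fav) as in the question
pct_wide <- pct_long %>%
  pivot_wider(
    names_from = item,
    values_from = c(fav, neut, unfav),
    names_glue = "{item}_{.value}"
  )

print(pct_wide)
```

Because every column other than department is pivoted, this version needs no list of item names, which may help with 30+ variables. It does assume that all non-department columns are Likert items; other demographic columns would have to be excluded in the cols argument.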