Referring to a list of variables in a grouped calculation

Time: 2017-04-13 19:17:06

Tags: r variables group-by dplyr plyr

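I want to compute the maximum of each variable within a group (there are 20 of them), and I am wondering whether there is an easier way to do this than explicitly listing every variable with group_by and summarise in dplyr. The sample data is listed below: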

Name Year test1 test2 test3 test4 test5 test6 test7 test8 test9 test10 test11 test12 test13 test14 test15 test16 test17 test18 test19 test20
John 2008 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0
John 2008 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0
John 2009 0 1 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0
John 2010 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 0 1 0 0 1
John 2010 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 1 1
John 2010 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
John 2011 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
John 2011 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
John 2012 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0
John 2012 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0
John 2012 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1
John 2013 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
Mary 2009 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Mary 2010 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Mary 2010 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Mary 2011 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1
Mary 2011 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0
Mary 2011 0 0 1 1 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0
Mary 2011 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Mary 2012 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0
Mary 2012 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0
Mary 2013 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
Mary 2013 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Jack 2010 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0
Jack 2010 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0
Jack 2011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
Jack 2011 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
Jack 2011 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Jack 2011 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
Jack 2012 0 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0
Jack 2012 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
Jack 2013 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
Jack 2013 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Jack 2014 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Jack 2015 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 0 1 0 0 1
Jack 2015 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 1 1
Jack 2015 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
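test1 through test20 represent different types of tests; 1 means the person took that test and 0 means he/she did not. A person can take a test as many times as they like. I would like a person-year level aggregate showing whether the person ever took each test in that year. As mentioned above, is there a simple way to compute the max of all 20 tests at the person-year level? I am still struggling with this and would welcome a better approach.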

Thanks in advance!

1 answer:

Answer 0 (score: 2)

Adding tidyr helps:

# after highlighting and copying your data above to the clipboard
dat <- read.table("clipboard", header = TRUE, stringsAsFactors = FALSE)

library(dplyr)
library(tidyr)

dat %>%
  gather(test, tookit, -Name, -Year) %>%
  group_by(Name, Year, test) %>%
  summarize(times = sum(tookit)) %>%
  ungroup()
# # A tibble: 340 × 4
#     Name  Year   test times
#    <chr> <int>  <chr> <int>
# 1   Jack  2010  test1     0
# 2   Jack  2010 test10     1
# 3   Jack  2010 test11     1
# 4   Jack  2010 test12     0
# 5   Jack  2010 test13     0
# 6   Jack  2010 test14     1
# 7   Jack  2010 test15     2
# 8   Jack  2010 test16     0
# 9   Jack  2010 test17     0
# 10  Jack  2010 test18     0
# # ... with 330 more rows

This tells you how many times they took each test in each year.
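The question asks for a 0/1 "ever took it" flag rather than a count. A minimal sketch of the same long-format idea with max() in place of sum() follows. It assumes tidyr >= 1.0.0 (for pivot_longer(), which supersedes gather()) and dplyr >= 1.0.0 (for the .groups argument). The names toy, long_max, and ever_took are made up for illustration, and toy holds only the first three test columns of the John 2008 and Mary 2010 rows from the question.

```r
library(dplyr)
library(tidyr)

# Illustrative subset of the question's data: John 2008 and Mary 2010,
# first three test columns only (the name `toy` is hypothetical)
toy <- data.frame(
  Name  = c("John", "John", "Mary", "Mary"),
  Year  = c(2008L, 2008L, 2010L, 2010L),
  test1 = c(1L, 1L, 0L, 0L),
  test2 = c(0L, 0L, 0L, 0L),
  test3 = c(0L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)

long_max <- toy %>%
  # reshape to one row per (Name, Year, test) observation
  pivot_longer(starts_with("test"), names_to = "test", values_to = "tookit") %>%
  group_by(Name, Year, test) %>%
  # max of a 0/1 column is 1 if the test was taken at least once that year
  summarize(ever_took = max(tookit), .groups = "drop")

print(long_max)
```

If counts have already been computed, the same flag could instead be derived from them, for example with as.integer(times > 0).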

An alternative approach (without tidyr):

dat %>%
  group_by(Name, Year) %>%
  summarize_at(vars(starts_with("test")), sum) %>%  # select the test columns by prefix
  ungroup()
# A tibble: 17 × 22
#     Name  Year test1 test2 test3 test4 test5 test6 test7 test8 test9 test10
#    <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>  <int>
# 1   Jack  2010     0     0     0     0     1     0     0     0     0      1
# 2   Jack  2011     0     1     0     0     1     1     0     0     1      1
# 3   Jack  2012     0     0     1     1     0     0     0     0     1      1
# 4   Jack  2013     1     0     0     0     0     1     0     0     0      0
# 5   Jack  2014     0     0     0     0     0     0     0     0     0      0
# 6   Jack  2015     0     0     0     1     0     1     1     1     1      0
# 7   John  2008     2     0     1     0     0     1     0     0     0      1
# 8   John  2009     0     1     1     0     0     0     1     0     1      0
# 9   John  2010     0     0     0     1     0     1     1     1     1      0
# 10  John  2011     0     0     0     1     2     0     1     1     0      1
# 11  John  2012     0     0     1     1     0     0     2     1     1      0
# 12  John  2013     0     0     1     0     0     0     0     0     0      0
# 13  Mary  2009     0     0     1     0     1     0     0     0     0      0
# 14  Mary  2010     0     0     0     0     1     0     1     0     0      1
# 15  Mary  2011     0     1     1     1     0     0     1     1     1      1
# 16  Mary  2012     0     0     0     0     1     1     0     1     0      1
# 17  Mary  2013     0     0     0     1     0     0     1     1     0      0
# # ... with 10 more variables: test11 <int>, test12 <int>, test13 <int>,
# #   test14 <int>, test15 <int>, test16 <int>, test17 <int>, test18 <int>,
# #   test19 <int>, test20 <int>
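
For the person-year maximum that the question asks about, the wide-format approach can also be written with across(). This is a sketch that assumes dplyr >= 1.0.0, where across() is available; it is not part of the original answer. The names toy and ever are made up for illustration, and toy is a small subset of the question's data (John 2008 and Mary 2010, first three test columns).

```r
library(dplyr)

# Illustrative subset of the question's data (the name `toy` is hypothetical)
toy <- data.frame(
  Name  = c("John", "John", "Mary", "Mary"),
  Year  = c(2008L, 2008L, 2010L, 2010L),
  test1 = c(1L, 1L, 0L, 0L),
  test2 = c(0L, 0L, 0L, 0L),
  test3 = c(0L, 1L, 0L, 0L),
  stringsAsFactors = FALSE
)

ever <- toy %>%
  group_by(Name, Year) %>%
  # apply max() to every column whose name starts with "test"
  summarize(across(starts_with("test"), max), .groups = "drop")

print(ever)
```

Swapping max for sum in the across() call gives the per-year counts instead, so the same pattern covers both summaries without listing the 20 columns by hand.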