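I want to compute the maximum of each variable within a group (there are 20 variables in total). Is there a simpler way to do this than listing every variable explicitly with summarise and group_by in dplyr? Sample data is listed below: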
Name Year test1 test2 test3 test4 test5 test6 test7 test8 test9 test10 test11 test12 test13 test14 test15 test16 test17 test18 test19 test20
John 2008 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0
John 2008 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0
John 2009 0 1 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0
John 2010 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 0 1 0 0 1
John 2010 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 1 1
John 2010 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
John 2011 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
John 2011 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
John 2012 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0
John 2012 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0
John 2012 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1
John 2013 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
Mary 2009 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Mary 2010 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Mary 2010 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Mary 2011 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1
Mary 2011 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0
Mary 2011 0 0 1 1 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0
Mary 2011 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Mary 2012 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0
Mary 2012 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0
Mary 2013 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
Mary 2013 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
Jack 2010 0 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0
Jack 2010 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0
Jack 2011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
Jack 2011 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0
Jack 2011 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
Jack 2011 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
Jack 2012 0 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0
Jack 2012 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
Jack 2013 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
Jack 2013 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Jack 2014 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Jack 2015 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 0 1 0 0 1
Jack 2015 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 1 1
Jack 2015 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0
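test1 to test20 represent different types of tests; 1 means the person took the test and 0 means he/she did not. A person can take a test as many times as they like. I would like a person-year level aggregation showing whether the person ever took each test in that year. As mentioned above, is there an easy way to compute the person-year level max for all 20 tests? I have an approach in mind, but I am still struggling with it and would welcome a better way.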
Thanks in advance!
Ann
Answer 0: (score: 2)
Adding tidyr helps:
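# Select and copy the sample data above, then read it from the clipboard
# ("clipboard" works on Windows; on macOS, pipe("pbpaste") plays the same role)
dat <- read.table("clipboard", header = TRUE, stringsAsFactors = FALSE)

library(dplyr)
library(tidyr)

dat %>%
  gather(test, tookit, -Name, -Year) %>%  # reshape to long: one row per Name/Year/test
  group_by(Name, Year, test) %>%
  summarize(times = sum(tookit)) %>%      # count how many times each test was taken
  ungroup()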
# # A tibble: 340 × 4
# Name Year test times
# <chr> <int> <chr> <int>
# 1 Jack 2010 test1 0
# 2 Jack 2010 test10 1
# 3 Jack 2010 test11 1
# 4 Jack 2010 test12 0
# 5 Jack 2010 test13 0
# 6 Jack 2010 test14 1
# 7 Jack 2010 test15 2
# 8 Jack 2010 test16 0
# 9 Jack 2010 test17 0
# 10 Jack 2010 test18 0
# # ... with 330 more rows
This tells you how many times each person took each test in each year.
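Because the question asks whether a test was ever taken in a year, not how many times, the same long-format pipeline can take max instead of sum. The following is only a minimal sketch under some assumptions: dat_small is a made-up three-test miniature of the question's data, pivot_longer requires tidyr 1.0.0 or later (it plays the role of gather above), and the .groups argument requires dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of the question's data (3 tests instead of 20)
dat_small <- data.frame(
  Name  = c("John", "John", "John", "Mary"),
  Year  = c(2008L, 2008L, 2009L, 2009L),
  test1 = c(1L, 1L, 0L, 0L),
  test2 = c(0L, 0L, 1L, 0L),
  test3 = c(0L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

long_max <- dat_small %>%
  # reshape to long: one row per Name/Year/test
  pivot_longer(starts_with("test"), names_to = "test", values_to = "tookit") %>%
  group_by(Name, Year, test) %>%
  # max of a 0/1 column is 1 if the test was taken at least once that year
  summarise(ever = max(tookit), .groups = "drop")

print(long_max)
```

With the full data, replacing dat_small by dat should give one row per person, year, and test, with ever restricted to 0 or 1.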
Another approach (without tidyr):
dat %>%
  group_by(Name, Year) %>%
  # vars() with a select helper picks every column whose name starts with "test"
  summarize_at(vars(starts_with("test")), sum) %>%
  ungroup()
# A tibble: 17 × 22
# Name Year test1 test2 test3 test4 test5 test6 test7 test8 test9 test10
# <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 Jack 2010 0 0 0 0 1 0 0 0 0 1
# 2 Jack 2011 0 1 0 0 1 1 0 0 1 1
# 3 Jack 2012 0 0 1 1 0 0 0 0 1 1
# 4 Jack 2013 1 0 0 0 0 1 0 0 0 0
# 5 Jack 2014 0 0 0 0 0 0 0 0 0 0
# 6 Jack 2015 0 0 0 1 0 1 1 1 1 0
# 7 John 2008 2 0 1 0 0 1 0 0 0 1
# 8 John 2009 0 1 1 0 0 0 1 0 1 0
# 9 John 2010 0 0 0 1 0 1 1 1 1 0
# 10 John 2011 0 0 0 1 2 0 1 1 0 1
# 11 John 2012 0 0 1 1 0 0 2 1 1 0
# 12 John 2013 0 0 1 0 0 0 0 0 0 0
# 13 Mary 2009 0 0 1 0 1 0 0 0 0 0
# 14 Mary 2010 0 0 0 0 1 0 1 0 0 1
# 15 Mary 2011 0 1 1 1 0 0 1 1 1 1
# 16 Mary 2012 0 0 0 0 1 1 0 1 0 1
# 17 Mary 2013 0 0 0 1 0 0 1 1 0 0
# # ... with 10 more variables: test11 <int>, test12 <int>, test13 <int>,
# # test14 <int>, test15 <int>, test16 <int>, test17 <int>, test18 <int>,
# # test19 <int>, test20 <int>
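The wide output above holds counts, so values such as 2 can appear. For the 0/1 person-year indicator the question describes, the summary function can be max instead. The sketch below is an illustration, not the answerer's code: dat_small is a made-up miniature of the data, across() assumes dplyr 1.0.0 or later (where it supersedes summarize_at), and the aggregate() call is a base R alternative added for comparison.

```r
library(dplyr)

# Hypothetical miniature of the question's data (3 tests instead of 20)
dat_small <- data.frame(
  Name  = c("John", "John", "John", "Mary"),
  Year  = c(2008L, 2008L, 2009L, 2009L),
  test1 = c(1L, 1L, 0L, 0L),
  test2 = c(0L, 0L, 1L, 0L),
  test3 = c(0L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# dplyr: apply max to every column whose name starts with "test"
wide_max <- dat_small %>%
  group_by(Name, Year) %>%
  summarise(across(starts_with("test"), max), .groups = "drop")

# Base R alternative: "." on the left-hand side means
# "all columns other than the grouping variables"
base_max <- aggregate(. ~ Name + Year, data = dat_small, FUN = max)

print(wide_max)
print(base_max)
```

Both calls should return one row per person-year, and neither requires typing out the 20 column names.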