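I'm stuck on a data-management step in R while trying to aggregate data by month.
I have two examples: the first shows the problem I'm currently running into during cleaning and aggregation, and the second shows what I would like the result to look like.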
What it looks like now:
month <- c("January", "January", "February", "March", "April", "April",
"May", "June", "July")
year <- c(2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014)
count1 <- c(3, 0, 1, 2, 0, 8, 1, 1, 1)
count2 <- c(0, 2, 1, 4, 6, 0, 1, 1, 1)
count3 <- c(1, 1, 1, 1, 1, 1, 0, 0, 1)
df <- data.frame(month, year, count1, count2, count3)
What I want it to look like:
month2 <- c("January", "February", "March", "April", "May", "June", "July")
year2 <- c(2014, 2014, 2014, 2014, 2014, 2014, 2014)
count1a <- c(3, 1, 2, 8, 1, 1, 1)
count2a <- c(2, 1, 4, 6, 1, 1, 1)
count3a <- c(1, 1, 1, 1, 0, 0, 1)
df2 <- data.frame(month2, year2, count1a, count2a, count3a)
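As you'll notice, several months are counted twice, and the observations in one duplicate row differ from those in the other row for the same month. I want a single row per month.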
Answer (score: 1)
Group by `month` and `year`, then take the `max` of each remaining column:
library(dplyr)
df %>%
group_by(month, year) %>%
summarise_all(max)
# A tibble: 7 x 5
# Groups: month [?]
# month year count1 count2 count3
# <fctr> <dbl> <dbl> <dbl> <dbl>
#1 April 2014 8 6 1
#2 February 2014 1 1 1
#3 January 2014 3 2 1
#4 July 2014 1 1 1
#5 June 2014 1 1 0
#6 March 2014 2 4 1
#7 May 2014 1 1 0
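For comparison, the same collapse can be done in base R with `aggregate()`. This is a minimal sketch, not part of the original answer; it rebuilds the data from the question and uses `match()` to put the months back in order of first appearance, because `aggregate()` sorts the groups.

```r
# Data from the question (character months, as in R >= 4.0)
month  <- c("January", "January", "February", "March", "April", "April",
            "May", "June", "July")
year   <- rep(2014, 9)
count1 <- c(3, 0, 1, 2, 0, 8, 1, 1, 1)
count2 <- c(0, 2, 1, 4, 6, 0, 1, 1, 1)
count3 <- c(1, 1, 1, 1, 1, 1, 0, 0, 1)
df <- data.frame(month, year, count1, count2, count3,
                 stringsAsFactors = FALSE)

# Collapse duplicate month/year rows, keeping the column-wise maximum
res <- aggregate(cbind(count1, count2, count3) ~ month + year,
                 data = df, FUN = max)

# aggregate() sorts the groups; restore the order in which months first appear
res <- res[order(match(res$month, unique(df$month))), ]
rownames(res) <- NULL
res
```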
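`summarise()` returns the groups sorted alphabetically by month. If the original month order must be preserved, turn `month` into a factor whose levels follow the order of first appearance: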
df %>%
group_by(month = factor(month, levels = unique(month)), year) %>%
summarise_all(max)
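# or, equivalently, take the first element after sorting in decreasing order
# (funs() is deprecated in current dplyr; use a lambda instead):
#summarise_all(~ .[order(-.)][1])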
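On newer dplyr versions, an alternative is `summarise()` with `across()` and the `.by` argument. Because `.by` returns groups in the order they first appear in the data, this avoids the factor trick. This sketch assumes dplyr >= 1.1.0 is installed and is not part of the original answer.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 (needed for the .by argument)

# Data from the question
month  <- c("January", "January", "February", "March", "April", "April",
            "May", "June", "July")
year   <- rep(2014, 9)
count1 <- c(3, 0, 1, 2, 0, 8, 1, 1, 1)
count2 <- c(0, 2, 1, 4, 6, 0, 1, 1, 1)
count3 <- c(1, 1, 1, 1, 1, 1, 0, 0, 1)
df <- data.frame(month, year, count1, count2, count3,
                 stringsAsFactors = FALSE)

# One row per month/year, keeping the maximum of each count column;
# .by keeps groups in order of first appearance and returns an ungrouped result
res <- df %>%
  summarise(across(c(count1, count2, count3), max),
            .by = c(month, year))
res
```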