R: how to sum two measurements of the same experimental unit taken on different dates

Time: 2019-06-28 03:58:43

Tags: r dataframe dplyr

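I have a database of coral recruitment to experimental units, or modules. During one of my censuses, I had to start the recruit census of the north (N) side of module 114 on one date and finish it on another. For these instances I need to sum the recruit counts and use the most recent observation date as the date. For rows 1 and 2, I want the merged row to have the date 2017-08-20.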

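I need to use the complete() function to fill in implicitly missing data where no recruits were observed. However, this causes a problem for the analysis: the data frame contains multiple rows (observations) for these instances where I need one.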
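A minimal sketch of the order of operations this calls for, assuming tidyr::complete() and a small hypothetical table (not the real data): collapse the repeated visits to one row per side and time step first, then let complete() add the missing combinations with 0 recruits. complete() only adds absent combinations; it never merges duplicate rows, which is why the collapsing has to happen first.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: side N was counted on two dates at TimeStep 4,
# and side S has no row at all for TimeStep 5
toy <- data.frame(
  Side     = c("N", "N", "N", "S"),
  TimeStep = c(4L, 4L, 5L, 4L),
  Date     = as.Date(c("2017-08-11", "2017-08-20", "2017-11-12", "2017-08-11")),
  recruits = c(5, 1, 2, 4)
)

filled <- toy %>%
  # 1. one row per Side x TimeStep: latest date, total recruits
  group_by(Side, TimeStep) %>%
  summarise(Date = max(Date), recruits = sum(recruits), .groups = "drop") %>%
  # 2. add the missing Side x TimeStep combinations, counting them as 0 recruits
  complete(Side, TimeStep, fill = list(recruits = 0))

filled
```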

n3 <- structure(list(`Module #` = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("111", "112", 
"113", "114", "115", "116", "211", "212", "213", "214", "215", 
"216"), class = "factor"), Side = structure(c(1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("N", 
"S", "T"), class = "factor"), TimeStep = c(4L, 4L, 5L, 6L, 7L, 
4L, 4L, 5L, 6L, 7L, 4L, 4L, 5L, 6L, 7L), Date = structure(c(17389, 
17398, 17482, 17601, NA, 17389, 17404, NA, 17601, 17682, 17389, 
17404, NA, 17601, NA), class = "Date"), Year = structure(c(1L, 
1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L), .Label = c("17", 
"18"), class = "factor"), Site = structure(c(2L, 2L, 2L, 2L, 
NA, 2L, 2L, NA, 2L, 2L, 2L, 2L, NA, 2L, NA), .Label = c("HAN", 
"WAI"), class = "factor"), Treatment = c("CLO", "CLO", "CLO", 
"CLO", NA, "CLO", "CLO", NA, "CLO", "CLO", "CLO", "CLO", NA, 
"CLO", NA), recruits = c(5, 1, 2, 1, 0, 4, 1, 0, 2, 4, 1, 1, 
0, 1, 0), Site_long = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Hanauma Bay", "Waikiki"
), class = "factor"), Shelter = structure(c(2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("High", "Low"
), class = "factor")), row.names = c(NA, -15L), class = "data.frame")

After merging rows 1-2, 6-7 and 11-12, my output should have 12 rows. Thanks for your input!
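A quick base-R check of the expected count, assuming only the Side, TimeStep and Date columns matter here (Module # is 114 in every row). The values are copied from the dput() above, where Date is stored as days since 1970-01-01:

```r
# Key columns and dates copied from the question's dput()
keys <- data.frame(
  Side     = rep(c("N", "S", "T"), each = 5),
  TimeStep = rep(c(4L, 4L, 5L, 6L, 7L), times = 3),
  Date     = as.Date(c(17389, 17398, 17482, 17601, NA,
                       17389, 17404, NA, 17601, 17682,
                       17389, 17404, NA, 17601, NA), origin = "1970-01-01")
)

dup <- which(duplicated(keys[c("Side", "TimeStep")]))
dup                       # rows 2, 7 and 12 repeat an earlier Side/TimeStep
keys$Date[dup]            # the later visit of each pair: 2017-08-20, 2017-08-26, 2017-08-26
nrow(keys) - length(dup)  # 12 rows once each pair is merged
```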

2 answers:

Answer 0 (score: 1)

Take the max Date value and the sum of recruits in each group, then keep only one row per group.

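library(dplyr)

n3 %>%
  group_by(`Module #`, Side, TimeStep) %>%
  # latest date and total recruits per group; a group with no dates at all
  # keeps NA instead of getting -Inf (with a warning) from max()
  mutate(Date = if (all(is.na(Date))) Date[1] else max(Date, na.rm = TRUE),
         recruits = sum(recruits)) %>%
  # keep one row per group
  slice(1)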

# `Module #` Side  TimeStep Date       Year  Site  Treatment recruits Site_long Shelter
#   <fct>      <fct>    <int> <date>     <fct> <fct> <chr>        <dbl> <fct>     <fct>  
# 1 114        N            4 2017-08-20 17    WAI   CLO              6 Waikiki   Low    
# 2 114        N            5 2017-11-12 17    WAI   CLO              2 Waikiki   Low    
# 3 114        N            6 2018-03-11 18    WAI   CLO              1 Waikiki   Low    
# 4 114        N            7 NA         18    NA    NA               0 Waikiki   Low    
# 5 114        S            4 2017-08-26 17    WAI   CLO              5 Waikiki   Low    
# 6 114        S            5 NA         17    NA    NA               0 Waikiki   Low    
# 7 114        S            6 2018-03-11 18    WAI   CLO              2 Waikiki   Low    
# 8 114        S            7 2018-05-31 18    WAI   CLO              4 Waikiki   Low    
# 9 114        T            4 2017-08-26 17    WAI   CLO              2 Waikiki   Low    
#10 114        T            5 NA         17    NA    NA               0 Waikiki   Low    
#11 114        T            6 2018-03-11 18    WAI   CLO              1 Waikiki   Low    
#12 114        T            7 NA         18    NA    NA               0 Waikiki   Low    
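Rows 4, 6, 10 and 12 above come from groups with no recorded date. With a plain max(Date, na.rm = TRUE), such a group gets -Inf (plus a warning) rather than a true NA, even though it may print as NA, so is.na() will not find it later. A small base-R check, with a hypothetical helper latest_date() that returns a real NA instead:

```r
# Date vector for a group with no recorded visit (e.g. N at TimeStep 7)
no_visit <- as.Date(c(NA, NA))

m <- suppressWarnings(max(no_visit, na.rm = TRUE))
is.na(m)        # FALSE: the stored value is -Inf
is.infinite(m)  # TRUE

# Hypothetical helper: latest date, or a genuine NA when nothing was recorded
latest_date <- function(x) {
  if (all(is.na(x))) x[1] else max(x, na.rm = TRUE)
}

is.na(latest_date(no_visit))                         # TRUE
latest_date(as.Date(c("2017-08-11", "2017-08-20")))  # 2017-08-20
```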

Answer 1 (score: 1)

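We can use a data.table approach. Convert the 'data.frame' to a 'data.table' (setDT(n3)), group by 'Module #', 'Side', 'TimeStep', get the max of 'Date' and the sum of 'recruits', update those columns, and keep the unique rows by those grouping variables.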

library(data.table)

# setDT() converts n3 in place and := updates it by reference, so n3 itself
# is modified; running this a second time would sum the merged recruits again
unique(setDT(n3)[, c("Date", "recruits") := list(max(Date, na.rm = TRUE), sum(recruits)),
                 .(`Module #`, Side, TimeStep)],
       by = c("Module #", "Side", "TimeStep"))

#    Module # Side TimeStep       Date Year Site Treatment recruits Site_long Shelter
# 1:      114    N        4 2017-08-20   17  WAI       CLO        6   Waikiki     Low
# 2:      114    N        5 2017-11-12   17  WAI       CLO        2   Waikiki     Low
# 3:      114    N        6 2018-03-11   18  WAI       CLO        1   Waikiki     Low
# 4:      114    N        7       <NA>   18 <NA>      <NA>        0   Waikiki     Low
# 5:      114    S        4 2017-08-26   17  WAI       CLO        5   Waikiki     Low
# 6:      114    S        5       <NA>   17 <NA>      <NA>        0   Waikiki     Low
# 7:      114    S        6 2018-03-11   18  WAI       CLO        2   Waikiki     Low
# 8:      114    S        7 2018-05-31   18  WAI       CLO        4   Waikiki     Low
# 9:      114    T        4 2017-08-26   17  WAI       CLO        2   Waikiki     Low
#10:      114    T        5       <NA>   17 <NA>      <NA>        0   Waikiki     Low
#11:      114    T        6 2018-03-11   18  WAI       CLO        1   Waikiki     Low
#12:      114    T        7       <NA>   18 <NA>      <NA>        0   Waikiki     Low

Or, using the tidyverse, we arrange by the first 4 columns, group by 'Module #', 'Side', 'TimeStep', get the sum of 'recruits' with mutate, and slice the last row of each group.
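The code for this tidyverse variant is missing here, so the following is only a sketch of the steps just described, assuming dplyr and a hypothetical subset of n3 (side N only, a few columns). The first four columns of n3 are Module #, Side, TimeStep and Date, so sorting by them puts the latest visit last in each group, and slice(n()) keeps that row instead of computing max(Date):

```r
library(dplyr)

# Hypothetical subset of the question's n3: side N only, a few columns
n3_n <- data.frame(
  `Module #` = factor("114"),
  Side       = factor("N"),
  TimeStep   = c(4L, 4L, 5L, 6L, 7L),
  Date       = as.Date(c(17389, 17398, 17482, 17601, NA), origin = "1970-01-01"),
  recruits   = c(5, 1, 2, 1, 0),
  check.names = FALSE
)

out <- n3_n %>%
  # sort by the first four columns so the latest visit comes last
  arrange(`Module #`, Side, TimeStep, Date) %>%
  group_by(`Module #`, Side, TimeStep) %>%
  mutate(recruits = sum(recruits)) %>%
  # keep the last (latest-dated) row of each group
  slice(n()) %>%
  ungroup()

out
```

Because the kept row is a real observation, its other columns (Year, Site, Treatment, ...) come from the later visit rather than the first one.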
