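I have a database of coral recruitment to experimental units, or modules. During one of my censuses, I had to start and finish the recruit census on different dates on the north (N) side of module 114. I need to sum the recruit counts for these instances, using the most recent observation date as the date. For rows 1 and 2, I want the merged row to have the date 2017-08-20.
I need to use the complete function to fill in implicitly missing data where no recruits were observed. However, this causes a problem: when I run the analysis, the data frame contains multiple rows (observations) where I need just 1 row.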
n3 <- structure(list(`Module #` = structure(c(4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("111", "112",
"113", "114", "115", "116", "211", "212", "213", "214", "215",
"216"), class = "factor"), Side = structure(c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), .Label = c("N",
"S", "T"), class = "factor"), TimeStep = c(4L, 4L, 5L, 6L, 7L,
4L, 4L, 5L, 6L, 7L, 4L, 4L, 5L, 6L, 7L), Date = structure(c(17389,
17398, 17482, 17601, NA, 17389, 17404, NA, 17601, 17682, 17389,
17404, NA, 17601, NA), class = "Date"), Year = structure(c(1L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L), .Label = c("17",
"18"), class = "factor"), Site = structure(c(2L, 2L, 2L, 2L,
NA, 2L, 2L, NA, 2L, 2L, 2L, 2L, NA, 2L, NA), .Label = c("HAN",
"WAI"), class = "factor"), Treatment = c("CLO", "CLO", "CLO",
"CLO", NA, "CLO", "CLO", NA, "CLO", "CLO", "CLO", "CLO", NA,
"CLO", NA), recruits = c(5, 1, 2, 1, 0, 4, 1, 0, 2, 4, 1, 1,
0, 1, 0), Site_long = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Hanauma Bay", "Waikiki"
), class = "factor"), Shelter = structure(c(2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("High", "Low"
), class = "factor")), row.names = c(NA, -15L), class = "data.frame")
After merging rows 1-2, 6-7 and 11-12, my output should have 12 rows. Thanks for your input!
Answer 0 (score: 1)
Take the maximum Date and the sum of recruits within each group, then keep only one row per group.
library(dplyr)
n3 %>%
  group_by(`Module #`, Side, TimeStep) %>%
  mutate(Date = max(Date, na.rm = TRUE),   # latest observation date in each group
         recruits = sum(recruits)) %>%     # total recruits in each group
  slice(1)                                 # keep one row per group
# `Module #` Side TimeStep Date Year Site Treatment recruits Site_long Shelter
# <fct> <fct> <int> <date> <fct> <fct> <chr> <dbl> <fct> <fct>
# 1 114 N 4 2017-08-20 17 WAI CLO 6 Waikiki Low
# 2 114 N 5 2017-11-12 17 WAI CLO 2 Waikiki Low
# 3 114 N 6 2018-03-11 18 WAI CLO 1 Waikiki Low
# 4 114 N 7 NA 18 NA NA 0 Waikiki Low
# 5 114 S 4 2017-08-26 17 WAI CLO 5 Waikiki Low
# 6 114 S 5 NA 17 NA NA 0 Waikiki Low
# 7 114 S 6 2018-03-11 18 WAI CLO 2 Waikiki Low
# 8 114 S 7 2018-05-31 18 WAI CLO 4 Waikiki Low
# 9 114 T 4 2017-08-26 17 WAI CLO 2 Waikiki Low
#10 114 T 5 NA 17 NA NA 0 Waikiki Low
#11 114 T 6 2018-03-11 18 WAI CLO 1 Waikiki Low
#12 114 T 7 NA 18 NA NA 0 Waikiki Low
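One caveat for groups whose only Date is NA (for example Side N, TimeStep 7): max(Date, na.rm = TRUE) has no values left after dropping NA, so R warns "no non-missing arguments to max; returning -Inf" and returns -Inf (kept as a Date) instead of NA, which is.na() will not flag. A minimal sketch of a guard; the helper name max_or_na is hypothetical, not part of dplyr or data.table:

```r
# Hypothetical helper: latest date in x, or an NA of the same class
# when every value is missing (avoids the -Inf result and its warning)
max_or_na <- function(x) {
  if (all(is.na(x))) x[1] else max(x, na.rm = TRUE)
}

# Dates taken from rows 1-2 of the question's n3
split_census <- as.Date(c("2017-08-11", "2017-08-20"))
max_or_na(split_census)     # latest of the two census dates
max_or_na(as.Date(NA))      # NA (a Date), no warning
```

It can replace max(Date, na.rm = TRUE) inside the mutate() call above without other changes.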
Answer 1 (score: 1)
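We can use a data.table approach. Convert the 'data.frame' to a 'data.table' (setDT(n3)), group by 'Module #', 'Side', 'TimeStep', get the max of 'Date' and the sum of 'recruits', update those columns, and then get the unique rows by those grouping variables. Because setDT() and := modify n3 by reference, run this on a fresh copy of n3: re-running it sums the already-updated recruits column again.
library(data.table)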
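unique(
  setDT(n3)[, c("Date", "recruits") := list(max(Date, na.rm = TRUE),  # latest date in the group
                                             sum(recruits)),           # total recruits in the group
            .(`Module #`, Side, TimeStep)],                            # grouping variables
  by = c("Module #", "Side", "TimeStep"))                              # keep one row per group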
# Module # Side TimeStep Date Year Site Treatment recruits Site_long Shelter
# 1: 114 N 4 2017-08-20 17 WAI CLO 6 Waikiki Low
# 2: 114 N 5 2017-11-12 17 WAI CLO 2 Waikiki Low
# 3: 114 N 6 2018-03-11 18 WAI CLO 1 Waikiki Low
# 4: 114 N 7 <NA> 18 <NA> <NA> 0 Waikiki Low
# 5: 114 S 4 2017-08-26 17 WAI CLO 5 Waikiki Low
# 6: 114 S 5 <NA> 17 <NA> <NA> 0 Waikiki Low
# 7: 114 S 6 2018-03-11 18 WAI CLO 2 Waikiki Low
# 8: 114 S 7 2018-05-31 18 WAI CLO 4 Waikiki Low
# 9: 114 T 4 2017-08-26 17 WAI CLO 2 Waikiki Low
#10: 114 T 5 <NA> 17 <NA> <NA> 0 Waikiki Low
#11: 114 T 6 2018-03-11 18 WAI CLO 1 Waikiki Low
#12: 114 T 7 <NA> 18 <NA> <NA> 0 Waikiki Low
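Or, using the tidyverse: arrange by the first four columns, group by 'Module #', 'Side', 'TimeStep', use mutate to get the sum of 'recruits', and keep the last row of each group.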
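A minimal sketch of that tidyverse approach, written for illustration rather than taken from the answer. It assumes a reduced copy of n3 holding only the key columns of the question's data (n3_small is a hypothetical name); the full n3 works the same way, with the other columns carried along from the kept row.

```r
library(dplyr)

# Key columns copied from the question's n3 (module 114, sides N/S/T)
n3_small <- data.frame(
  `Module #` = factor(rep("114", 15)),
  Side = factor(rep(c("N", "S", "T"), each = 5)),
  TimeStep = rep(c(4L, 4L, 5L, 6L, 7L), 3),
  Date = as.Date(c(17389, 17398, 17482, 17601, NA,
                   17389, 17404, NA, 17601, 17682,
                   17389, 17404, NA, 17601, NA), origin = "1970-01-01"),
  recruits = c(5, 1, 2, 1, 0, 4, 1, 0, 2, 4, 1, 1, 0, 1, 0),
  check.names = FALSE
)

merged <- n3_small %>%
  # sort so the latest Date is last within each group
  # (note: arrange() puts NA dates last, after any real date)
  arrange(`Module #`, Side, TimeStep, Date) %>%
  group_by(`Module #`, Side, TimeStep) %>%
  mutate(recruits = sum(recruits)) %>%   # group total on every row
  slice(n()) %>%                         # keep the last (latest-dated) row
  ungroup()

merged
```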