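After looking through other common questions and reading some guides, I couldn't find a solution that fits my specific problem. Here is an example of the data to start with: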
data <- data.frame(
  Date = sample(c("1993-07-05", "1993-07-05", "1993-07-05", "1993-08-30", "1993-08-30", "1993-08-30", "1993-08-30", "1993-09-04", "1993-09-04")),
  Site = sample(c("1", "1", "1", "1", "1", "1", "1", "1", "1")),
  Station = sample(c("1", "2", "3", "1", "2", "3", "4", "1", "2")),
  Oxygen = sample(c("0.9", "0.4", "4.2", "5.6", "7.3", "4.3", "9.5", "5.3", "0.3")))
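I want to average all Oxygen values across the stations nested within each site, for each date. My dataset has about two thousand rows and, as in the example, the number of stations is uneven and so is the number of rows per date.
The output I'm looking for has columns like "Date -> Site -> Average Oxygen"; the Station column is not needed at all in the new version of the time series.
Any help would be greatly appreciated!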
Answer 0: (score: 1)
After grouping by 'Site' and 'Date', get the mean of 'Oxygen' (after converting it to numeric, since it is a factor column):
library(tidyverse)
data %>%
  group_by(Site, Date) %>%
  summarise(AverageOxygen = mean(as.numeric(as.character(Oxygen))))
# A tibble: 3 x 3
# Groups: Site [1]
# Site Date AverageOxygen
# <fct> <fct> <dbl>
#1 1 1993-07-05 3.97
#2 1 1993-08-30 5.2
#3 1 1993-09-04 2.55
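The as.character() step matters because calling as.numeric() directly on a factor returns the internal level codes rather than the printed values. A minimal sketch of the difference, using a small hypothetical vector (the name oxygen is only for illustration):

```r
# Hypothetical three-value factor, mimicking how Oxygen is stored
oxygen <- factor(c("0.9", "0.4", "4.2"))

# Direct conversion yields the integer level codes, not the measurements
codes <- as.numeric(oxygen)

# Going through character first recovers the actual numbers
values <- as.numeric(as.character(oxygen))

print(codes)
print(values)
```

If Oxygen is already a character column (the default for data.frame() from R 4.0.0 onward), as.character() is a harmless no-op, so the same expression works in both cases.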
Answer 1: (score: 1)
Try:
library(hablar)
library(tidyverse)
data %>%
  retype() %>%
  group_by(Site, Date) %>%
  summarize(AverageOxygen = mean(Oxygen))
This gives you:
# A tibble: 3 x 3
# Groups: Site [?]
Site Date AverageOxygen
<int> <date> <dbl>
1 1 1993-07-05 4.7
2 1 1993-08-30 3.55
3 1 1993-09-04 4.75
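Note that the example data in the question wraps every column in sample(), which shuffles each column independently, so the averages printed in the answers change from run to run. For comparison, here is a base R sketch of the same grouping with no extra packages. It assumes the columns are defined without sample() so that the rows stay aligned; the names result and AverageOxygen are chosen only for illustration.

```r
# Deterministic copy of the example data (an assumption: no sample() calls)
data <- data.frame(
  Date = c("1993-07-05", "1993-07-05", "1993-07-05", "1993-08-30", "1993-08-30",
           "1993-08-30", "1993-08-30", "1993-09-04", "1993-09-04"),
  Site = rep("1", 9),
  Station = c("1", "2", "3", "1", "2", "3", "4", "1", "2"),
  Oxygen = c("0.9", "0.4", "4.2", "5.6", "7.3", "4.3", "9.5", "5.3", "0.3"),
  stringsAsFactors = TRUE
)

# Convert factor -> character -> numeric so the real measurements are averaged
data$Oxygen <- as.numeric(as.character(data$Oxygen))

# Mean Oxygen for each Site/Date combination; Station is not in the formula,
# so it does not appear in the output
result <- aggregate(Oxygen ~ Site + Date, data = data, FUN = mean)
names(result)[names(result) == "Oxygen"] <- "AverageOxygen"

print(result)
```

Because the formula lists only Site and Date, uneven numbers of stations per date are handled without any extra work.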