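I have a data frame (called Metheo). One column contains the date and the following columns contain different parameters, measured daily over 20 years. I want to create a new data frame that contains the mean of the first, second and third decade (ten-day period) of each month, for all 20 years.
But some months have 31 or 30 days, and February has 28 or 29. How can I do this?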
Metheo[1:20,]
Date Tmax Tmin Tmean Rainfall Humidity Sunshine Cloud Wind SeeLevelPressure
1 1997-01-01 4.4 1.5 2.7 0.0 80 0.0 5.8 2.6 1030.5
2 1997-01-02 5.8 -1.7 0.9 0.0 79 0.3 1.4 2.4 1030.8
3 1997-01-03 4.0 -2.5 1.1 0.0 79 0.3 3.2 4.0 1027.8
4 1997-01-04 1.9 -4.5 -3.8 0.0 83 0.4 2.2 1.9 1025.8
5 1997-01-05 -3.0 -8.3 -6.8 0.0 84 0.5 2.0 2.5 1024.7
6 1997-01-06 -4.5 -9.0 -7.2 0.0 81 0.6 0.1 2.8 1022.1
7 1997-01-07 -5.2 -9.5 -7.3 0.0 83 0.6 1.8 2.8 1019.6
8 1997-01-08 1.4 -9.4 -3.1 0.0 84 0.0 4.2 4.4 1014.4
9 1997-01-09 1.5 -4.8 -3.8 0.1 85 0.0 7.8 4.0 1022.8
10 1997-01-10 -2.5 -7.5 -6.3 0.0 91 0.0 6.0 2.3 1018.6
11 1997-01-11 -3.5 -9.2 -5.6 NA 90 0.0 5.6 2.9 1006.6
12 1997-01-12 0.5 -4.4 -1.2 0.4 95 0.0 8.0 4.6 993.5
13 1997-01-13 -2.0 -3.8 -2.8 2.8 88 0.0 7.9 5.0 990.4
14 1997-01-14 -0.7 -4.5 -2.2 8.7 88 0.0 8.0 4.8 979.1
15 1997-01-15 -0.6 -7.0 -4.7 3.9 85 0.0 7.6 3.2 1004.2
16 1997-01-16 -1.7 -7.0 -2.5 1.9 91 0.0 8.0 3.9 1002.4
17 1997-01-17 -0.5 -3.0 -2.1 15.2 94 0.0 8.0 7.4 999.2
18 1997-01-18 -2.6 -10.8 -7.9 1.2 80 0.1 4.2 6.3 1013.1
19 1997-01-19 5.8 -13.0 1.6 NA 75 0.0 7.1 9.3 1006.3
20 1997-01-20 6.2 -2.1 2.4 0.2 79 0.0 7.9 6.8 994.0
Answer 0 (score: 1)
You can do this with the dplyr and lubridate packages. First, though, you need to compute the decade:
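library(dplyr)
library(lubridate)
# Group by calendar decade (e.g. 1990) and month, then average every measured column.
Metheo %>% group_by(decade=floor(year(Date)/10)*10, month=month(Date)) %>%
summarise_at(vars(Tmax, Tmin, Tmean, Rainfall, Humidity, Sunshine, Cloud, Wind, SeeLevelPressure), mean)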
Edit: having seen the comment about computing only over the first 10 days of each month, we can simply add a filter:
# Keep only days 1-10 of each month before grouping.
Metheo %>% filter(day(Date) <= 10) %>%
group_by(decade=floor(year(Date)/10)*10, month=month(Date)) %>%
summarise_at(vars(Tmax, Tmin, Tmean, Rainfall, Humidity, Sunshine, Cloud, Wind, SeeLevelPressure), mean)
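Sorry, I get it now. See decaday below for the runs of 10 consecutive days. It works the same way as the decade, but what do you do with the 31st day? Let's bring in another function, cut, and let the last interval run from day 21 to day 31: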
# cut() bins the day of month into (0,10], (10,20] and (20,31]; the last bin absorbs days 21-31.
Metheo %>% group_by(decade=floor(year(Date)/10)*10, month=month(Date), decaday=cut(day(Date), breaks=c(0,10,20,31))) %>%
summarise_at(vars(Tmax, Tmin, Tmean, Rainfall, Humidity, Sunshine, Cloud, Wind, SeeLevelPressure), mean)
  decade month decaday  Tmax  Tmin Tmean Rainfall Humidity Sunshine Cloud  Wind SeeLevelPressure
   <dbl> <dbl> <fct>   <dbl> <dbl> <dbl>    <dbl>    <dbl>    <dbl> <dbl> <dbl>            <dbl>
1   1990     1 (0,10]   0.38 -5.57 -3.36     0.01     82.9     0.27  3.45  2.97            1024.
2   1990     1 (10,20]  0.09 -6.48 -2.5     NA        86.5     0.01  7.23  5.42             999.
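The question of what to do with days 29 to 31 can also be handled with plain integer arithmetic instead of cut. The following is only a minimal base R sketch; the helper name ten_day_period is made up for illustration and is not part of the answer above. It assumes that everything from day 21 to the end of the month belongs to the third period.

```r
# Hypothetical helper (not from the original answer): map a date to its
# ten-day period within the month.
# Days 1-10 -> 1, days 11-20 -> 2, days 21 to month end (28-31) -> 3.
ten_day_period <- function(date) {
  day <- as.integer(format(as.Date(date), "%d"))
  # (day - 1) %/% 10 gives 0, 1, 2 or 3 (for day 31); pmin caps it at 2.
  pmin((day - 1L) %/% 10L, 2L) + 1L
}

dates <- as.Date(c("1997-01-01", "1997-01-10", "1997-01-11",
                   "1997-01-20", "1997-01-21", "1997-01-31",
                   "2000-02-29"))
periods <- ten_day_period(dates)
data.frame(date = dates, period = periods)
```

Because the result is always 1, 2 or 3, months with 28, 29, 30 or 31 days need no special cases; the third period is simply shorter or longer.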
Answer 1 (score: 1)
Here is a solution with package dplyr. It also uses the function as.yearmon from package zoo and the function day from package lubridate.
library(dplyr)
Metheo$Date <- as.Date(Metheo$Date)
Metheo %>%
mutate(Month = zoo::as.yearmon(Date),
Tens = floor((lubridate::day(Date) - 1)/10)*10,
Tens = ifelse(Tens == 30, 20, Tens),
Month = paste(Month, Tens)) %>%
group_by(Month) %>%
summarise_at(vars(Tmax:SeeLevelPressure), mean, na.rm = TRUE)
## A tibble: 2 x 10
# Month Tmax Tmin Tmean Rainfall Humidity Sunshine Cloud Wind
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 jan … 0.38 -5.57 -3.36 0.01 82.9 0.27 3.45 2.97
#2 jan … 0.09 -6.48 -2.5 4.29 86.5 0.01 7.23 5.42
## ... with 1 more variable: SeeLevelPressure <dbl>
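If "for all years" is meant as one mean per calendar month and ten-day period pooled over all 20 years (rather than one per year and month, as above), the grouping key only needs the month number. The sketch below is one possible reading, written in base R with aggregate; the data frame toy and its values are made up for illustration and are not the Metheo data.

```r
# Made-up toy data (NOT the original Metheo data): two Januaries.
toy <- data.frame(
  Date = as.Date(c("1997-01-05", "1997-01-15", "1997-01-31",
                   "1998-01-05", "1998-01-15", "1998-01-25")),
  Tmax = c(1, 2, 3, 3, 4, 5)
)

# Month number and ten-day period (days 21 to month end fall into period 3).
toy$Month  <- as.integer(format(toy$Date, "%m"))
day        <- as.integer(format(toy$Date, "%d"))
toy$Period <- pmin((day - 1L) %/% 10L, 2L) + 1L

# Mean per calendar month and ten-day period, pooled over all years.
clim <- aggregate(Tmax ~ Month + Period, data = toy, FUN = mean)
clim
```

With the formula interface of aggregate, rows with NA in the response are dropped by default, which here plays a role similar to na.rm = TRUE in the dplyr version.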
Data in dput format.
Metheo <-
structure(list(Date = structure(1:20, .Label = c("1997-01-01",
"1997-01-02", "1997-01-03", "1997-01-04", "1997-01-05", "1997-01-06",
"1997-01-07", "1997-01-08", "1997-01-09", "1997-01-10", "1997-01-11",
"1997-01-12", "1997-01-13", "1997-01-14", "1997-01-15", "1997-01-16",
"1997-01-17", "1997-01-18", "1997-01-19", "1997-01-20"), class = "factor"),
Tmax = c(4.4, 5.8, 4, 1.9, -3, -4.5, -5.2, 1.4, 1.5, -2.5,
-3.5, 0.5, -2, -0.7, -0.6, -1.7, -0.5, -2.6, 5.8, 6.2), Tmin = c(1.5,
-1.7, -2.5, -4.5, -8.3, -9, -9.5, -9.4, -4.8, -7.5, -9.2,
-4.4, -3.8, -4.5, -7, -7, -3, -10.8, -13, -2.1), Tmean = c(2.7,
0.9, 1.1, -3.8, -6.8, -7.2, -7.3, -3.1, -3.8, -6.3, -5.6,
-1.2, -2.8, -2.2, -4.7, -2.5, -2.1, -7.9, 1.6, 2.4), Rainfall = c(0,
0, 0, 0, 0, 0, 0, 0, 0.1, 0, NA, 0.4, 2.8, 8.7, 3.9, 1.9,
15.2, 1.2, NA, 0.2), Humidity = c(80L, 79L, 79L, 83L, 84L,
81L, 83L, 84L, 85L, 91L, 90L, 95L, 88L, 88L, 85L, 91L, 94L,
80L, 75L, 79L), Sunshine = c(0, 0.3, 0.3, 0.4, 0.5, 0.6,
0.6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0), Cloud = c(5.8,
1.4, 3.2, 2.2, 2, 0.1, 1.8, 4.2, 7.8, 6, 5.6, 8, 7.9, 8,
7.6, 8, 8, 4.2, 7.1, 7.9), Wind = c(2.6, 2.4, 4, 1.9, 2.5,
2.8, 2.8, 4.4, 4, 2.3, 2.9, 4.6, 5, 4.8, 3.2, 3.9, 7.4, 6.3,
9.3, 6.8), SeeLevelPressure = c(1030.5, 1030.8, 1027.8, 1025.8,
1024.7, 1022.1, 1019.6, 1014.4, 1022.8, 1018.6, 1006.6, 993.5,
990.4, 979.1, 1004.2, 1002.4, 999.2, 1013.1, 1006.3, 994)),
class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6",
"7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20"))