In R

Time: 2019-01-15 11:49:20

Tags: r average

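I have a data frame (called Metheo). One column contains the date, and the following columns contain different parameters that were measured daily over 20 years. I want to create a new data frame that contains, for all 20 years, the mean of the first, second, and third decade (10-day period) of each month.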

But some months have 31 or 30 days, and February has 28 or 29. How can I do this?

Metheo[1:20,]
         Date Tmax  Tmin Tmean Rainfall Humidity Sunshine Cloud Wind SeeLevelPressure
1  1997-01-01  4.4   1.5   2.7      0.0       80      0.0   5.8  2.6           1030.5
2  1997-01-02  5.8  -1.7   0.9      0.0       79      0.3   1.4  2.4           1030.8
3  1997-01-03  4.0  -2.5   1.1      0.0       79      0.3   3.2  4.0           1027.8
4  1997-01-04  1.9  -4.5  -3.8      0.0       83      0.4   2.2  1.9           1025.8
5  1997-01-05 -3.0  -8.3  -6.8      0.0       84      0.5   2.0  2.5           1024.7
6  1997-01-06 -4.5  -9.0  -7.2      0.0       81      0.6   0.1  2.8           1022.1
7  1997-01-07 -5.2  -9.5  -7.3      0.0       83      0.6   1.8  2.8           1019.6
8  1997-01-08  1.4  -9.4  -3.1      0.0       84      0.0   4.2  4.4           1014.4
9  1997-01-09  1.5  -4.8  -3.8      0.1       85      0.0   7.8  4.0           1022.8
10 1997-01-10 -2.5  -7.5  -6.3      0.0       91      0.0   6.0  2.3           1018.6
11 1997-01-11 -3.5  -9.2  -5.6       NA       90      0.0   5.6  2.9           1006.6
12 1997-01-12  0.5  -4.4  -1.2      0.4       95      0.0   8.0  4.6            993.5
13 1997-01-13 -2.0  -3.8  -2.8      2.8       88      0.0   7.9  5.0            990.4
14 1997-01-14 -0.7  -4.5  -2.2      8.7       88      0.0   8.0  4.8            979.1
15 1997-01-15 -0.6  -7.0  -4.7      3.9       85      0.0   7.6  3.2           1004.2
16 1997-01-16 -1.7  -7.0  -2.5      1.9       91      0.0   8.0  3.9           1002.4
17 1997-01-17 -0.5  -3.0  -2.1     15.2       94      0.0   8.0  7.4            999.2
18 1997-01-18 -2.6 -10.8  -7.9      1.2       80      0.1   4.2  6.3           1013.1
19 1997-01-19  5.8 -13.0   1.6       NA       75      0.0   7.1  9.3           1006.3
20 1997-01-20  6.2  -2.1   2.4      0.2       79      0.0   7.9  6.8            994.0


2 Answers:

Answer 0: (score: 1)

You can do this with the dplyr and lubridate packages. But first you need to compute the decade:

library(dplyr)
library(lubridate)
Metheo %>% group_by(decade=floor(year(Date)/10)*10, month=month(Date)) %>% 
  summarise_at(vars(Tmax, Tmin, Tmean, Rainfall, Humidity, Sunshine, Cloud, Wind, SeeLevelPressure), mean)
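
As a quick check of what the grouping key above produces, here is a minimal base R sketch. The year values are made up for illustration. Note that floor(year/10)*10 yields calendar decades (10-year spans), which may differ from the 10-day periods the question asks about.

```r
# Hypothetical years spanning the 20-year record
years <- c(1997, 1999, 2000, 2009, 2010, 2016)

# Round each year down to the start of its calendar decade
decade <- floor(years / 10) * 10

print(data.frame(year = years, decade = decade))
```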

Edit: Seeing the comment about computing only over the first 10 days of each month, we can simply add a filter:

Metheo %>% filter(day(Date) <= 10) %>%
  group_by(decade=floor(year(Date)/10)*10, month=month(Date)) %>% 
  summarise_at(vars(Tmax, Tmin, Tmean, Rainfall, Humidity, Sunshine, Cloud, Wind, SeeLevelPressure), mean)

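Sorry, see decaday below for consecutive 10-day periods. It works the same way as decade, but what do you do with day 31? Let's throw in another function: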

# The upper break of 31 keeps days 21-31 in a third bin instead of NA
Metheo %>% group_by(decade=floor(year(Date)/10)*10, month=month(Date), decaday=cut(day(Date), breaks=c(0,10,20,31))) %>% 
  summarise_at(vars(Tmax, Tmin, Tmean, Rainfall, Humidity, Sunshine, Cloud, Wind, SeeLevelPressure), mean)

  decade month decaday  Tmax  Tmin Tmean Rainfall Humidity Sunshine Cloud  Wind SeeLevelPressure
   <dbl> <dbl> <fct>   <dbl> <dbl> <dbl>    <dbl>    <dbl>    <dbl> <dbl> <dbl>            <dbl>
1   1990     1 (0,10]   0.38 -5.57 -3.36     0.01     82.9     0.27  3.45  2.97            1024.
2   1990     1 (10,20]  0.09 -6.48 -2.5     NA        86.5     0.01  7.23  5.42             999.
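
To see how cut() assigns days of the month to 10-day bins, here is a small base R sketch. The day vector and the labels are made up for illustration, and the upper break of 31 is an assumption that places days 21 to 31 (including the 28th to 31st) in the third bin.

```r
# Hypothetical day-of-month values chosen to hit every bin boundary
days <- c(1, 10, 11, 20, 21, 28, 29, 30, 31)

# Breaks at 0, 10, 20 and 31 give three right-closed intervals:
# (0,10], (10,20], (20,31]
dekad <- cut(days, breaks = c(0, 10, 20, 31),
             labels = c("1-10", "11-20", "21-end"))

print(data.frame(day = days, dekad = dekad))
print(table(dekad))
```

Any day outside the breaks would come back as NA, so the last break has to be at least 31 if every day should land in a bin.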

Answer 1: (score: 1)

Here is a solution with package dplyr. It also uses the function as.yearmon from package zoo and the function day from package lubridate.

library(dplyr)

Metheo$Date <- as.Date(Metheo$Date)

Metheo %>%
  mutate(Month = zoo::as.yearmon(Date),
         Tens = floor((lubridate::day(Date) - 1)/10)*10,
         Tens = ifelse(Tens == 30, 20, Tens),
         Month = paste(Month, Tens)) %>%
  group_by(Month) %>%
  summarise_at(vars(Tmax:SeeLevelPressure), mean, na.rm = TRUE)
## A tibble: 2 x 10
#  Month  Tmax  Tmin Tmean Rainfall Humidity Sunshine Cloud  Wind
#  <chr> <dbl> <dbl> <dbl>    <dbl>    <dbl>    <dbl> <dbl> <dbl>
#1 jan …  0.38 -5.57 -3.36     0.01     82.9     0.27  3.45  2.97
#2 jan …  0.09 -6.48 -2.5      4.29     86.5     0.01  7.23  5.42
## ... with 1 more variable: SeeLevelPressure <dbl>
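
The Tens arithmetic used above can be checked on its own with a short base R sketch. It runs over the days 1 to 31 of a hypothetical 31-day month and shows that day 31 is folded into the bin that starts at 20.

```r
# Days of a hypothetical 31-day month
days <- 1:31

# 0 for days 1-10, 10 for days 11-20, 20 for days 21-30, 30 for day 31
tens <- floor((days - 1) / 10) * 10

# Fold day 31 into the last bin so every month has exactly three bins
tens <- ifelse(tens == 30, 20, tens)

print(table(tens))
```

For shorter months the last bin simply holds fewer days, so February needs no special handling.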

Data in dput format.

Metheo <-
structure(list(Date = structure(1:20, .Label = c("1997-01-01", 
"1997-01-02", "1997-01-03", "1997-01-04", "1997-01-05", "1997-01-06", 
"1997-01-07", "1997-01-08", "1997-01-09", "1997-01-10", "1997-01-11", 
"1997-01-12", "1997-01-13", "1997-01-14", "1997-01-15", "1997-01-16", 
"1997-01-17", "1997-01-18", "1997-01-19", "1997-01-20"), class = "factor"), 
    Tmax = c(4.4, 5.8, 4, 1.9, -3, -4.5, -5.2, 1.4, 1.5, -2.5, 
    -3.5, 0.5, -2, -0.7, -0.6, -1.7, -0.5, -2.6, 5.8, 6.2), Tmin = c(1.5, 
    -1.7, -2.5, -4.5, -8.3, -9, -9.5, -9.4, -4.8, -7.5, -9.2, 
    -4.4, -3.8, -4.5, -7, -7, -3, -10.8, -13, -2.1), Tmean = c(2.7, 
    0.9, 1.1, -3.8, -6.8, -7.2, -7.3, -3.1, -3.8, -6.3, -5.6, 
    -1.2, -2.8, -2.2, -4.7, -2.5, -2.1, -7.9, 1.6, 2.4), Rainfall = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0.1, 0, NA, 0.4, 2.8, 8.7, 3.9, 1.9, 
    15.2, 1.2, NA, 0.2), Humidity = c(80L, 79L, 79L, 83L, 84L, 
    81L, 83L, 84L, 85L, 91L, 90L, 95L, 88L, 88L, 85L, 91L, 94L, 
    80L, 75L, 79L), Sunshine = c(0, 0.3, 0.3, 0.4, 0.5, 0.6, 
    0.6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.1, 0, 0), Cloud = c(5.8, 
    1.4, 3.2, 2.2, 2, 0.1, 1.8, 4.2, 7.8, 6, 5.6, 8, 7.9, 8, 
    7.6, 8, 8, 4.2, 7.1, 7.9), Wind = c(2.6, 2.4, 4, 1.9, 2.5, 
    2.8, 2.8, 4.4, 4, 2.3, 2.9, 4.6, 5, 4.8, 3.2, 3.9, 7.4, 6.3, 
    9.3, 6.8), SeeLevelPressure = c(1030.5, 1030.8, 1027.8, 1025.8, 
    1024.7, 1022.1, 1019.6, 1014.4, 1022.8, 1018.6, 1006.6, 993.5, 
    990.4, 979.1, 1004.2, 1002.4, 999.2, 1013.1, 1006.3, 994)), 
class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", 
"7", "8", "9", "10", "11", "12", "13", 
"14", "15", "16", "17", "18", "19", "20"))
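
If the goal is a single mean per month and 10-day period pooled over all years (one possible reading of the question), a base R version could look like the sketch below. It uses a made-up two-year series named toy, not the Metheo data, and the column names Month and Dekad are illustrative assumptions.

```r
# Made-up daily series covering two full years (not the Metheo data);
# Tmean is set to the day of month so the means are easy to verify
dates <- seq(as.Date("1997-01-01"), as.Date("1998-12-31"), by = "day")
toy <- data.frame(Date = dates, Tmean = as.numeric(format(dates, "%d")))

day <- as.integer(format(toy$Date, "%d"))
toy$Month <- as.integer(format(toy$Date, "%m"))

# Dekad 1 = days 1-10, 2 = days 11-20, 3 = day 21 to end of month
toy$Dekad <- pmin((day - 1) %/% 10 + 1, 3)

# Mean per month and dekad, pooled over all years
res <- aggregate(Tmean ~ Month + Dekad, data = toy, FUN = mean)
res <- res[order(res$Month, res$Dekad), ]
print(head(res))
```

With the formula interface, aggregate() drops rows with missing values by default, which plays a similar role to na.rm = TRUE in the dplyr version.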