Convert daily data to weekly data in R, with weeks starting on Saturday

Asked: 2015-03-13 18:41:05

Tags: r dataframe time-series weekend

I am having trouble converting daily data to weekly data by taking the mean of the values within each week.

My data looks like this:

> str(daily_FWIH)
'data.frame':   4371 obs. of  6 variables:
 $ Date     : Date, format: "2013-03-01" "2013-03-02" "2013-03-04" "2013-03-05" ...
 $ CST.OUC  : Factor w/ 6 levels "BVG11","BVG12",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ CST.NAME : Factor w/ 6 levels "Central Scotland",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ SOM_patch: Factor w/ 6 levels "BVG11_Highlands & Islands",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Row_Desc : Factor w/ 1 level "FSFluidWIH": 1 1 1 1 1 1 1 1 1 1 ...
 $ Value    : num  1.16 1.99 1.47 1.15 1.16 1.28 1.27 2.07 1.26 1.19 ...

> head(daily_FWIH)
        Date CST.OUC            CST.NAME                 SOM_patch   Row_Desc Value
1 2013-03-01   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH  1.16
2 2013-03-02   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH  1.99
3 2013-03-04   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH  1.47
4 2013-03-05   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH  1.15
5 2013-03-06   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH  1.16
6 2013-03-07   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH  1.28

I am trying to convert this to an xts object, as shown here

This is what I tried:

daily_FWIH$Date = as.Date(as.character(daily_FWIH$Date), "%d/%m/%Y")
library(xts)

temp.x = xts(daily_FWIH[-1], order.by=daily_FWIH$Date)
apply.weekly(temp.x, colMeans(temp.x$Value))

I have two problems. My weeks need to start and end on a Saturday, and I get the following error:

> apply.weekly(temp.x, colMeans(temp.x$Value))
Error in colMeans(temp.x$Value) : 'x' must be numeric

Update, based on Sam's comment:

This is what I did:

daily_FWIH$Date <- ymd(daily_FWIH$Date) # convert to POSIX format
daily_FWIH$fakeDate <- daily_FWIH$Date + days(2)
daily_FWIH$week <- week(daily_FWIH$fakeDate) # extract week value
daily_FWIH$year <- year(daily_FWIH$fakeDate)

> daily_FWIH %>%
+ group_by(year,week) %>%
+ mutate(weeklyAvg = mean(Value), weekStartsOn = min(Date)) %>% # create the average variable
+ slice(which(Date == weekStartsOn)) %>% # select just the first record of the week - other vars will come from this
+ select(-Value,-fakeDate,-week,-year,-Date, -CST.OUC,-CST.NAME) # drop unneeded variables
Source: local data frame [631 x 6]
Groups: year, week

   year week                   SOM_patch   Row_Desc weeklyAvg weekStartsOn
1  2013    9   BVG11_Highlands & Islands FSFluidWIH  1.048333   2013-03-01
2  2013    9   BVG12_North East Scotland FSFluidWIH  1.048333   2013-03-01
3  2013    9      BVG13_Central Scotland FSFluidWIH  1.048333   2013-03-01
4  2013    9   BVG14_South East Scotland FSFluidWIH  1.048333   2013-03-01
5  2013    9 BVG15_West Central Scotland FSFluidWIH  1.048333   2013-03-01
6  2013    9   BVG16_South West Scotland FSFluidWIH  1.048333   2013-03-01
7  2013   10   BVG11_Highlands & Islands FSFluidWIH  1.520500   2013-03-02
8  2013   10   BVG12_North East Scotland FSFluidWIH  1.520500   2013-03-02
9  2013   10      BVG13_Central Scotland FSFluidWIH  1.520500   2013-03-02
10 2013   10   BVG14_South East Scotland FSFluidWIH  1.520500   2013-03-02
..  ...  ...                         ...        ...       ...          ...

Which is not correct...

The desired output is:

> head(desired)
        Date BVG11.Highlands_I_.A_pct BVG12.North.East.ScotlandA_pct BVG13.Central.ScotlandA_pct
1 01/03/2013                     1.16                           1.13                        1.08
2 08/03/2013                     1.41                           2.37                        1.80
3 15/03/2013                     1.33                           3.31                        1.34
4 22/03/2013                     1.39                           2.49                        1.62
5 29/03/2013                     5.06                           3.42                        1.42
6                                  NA                             NA                          NA
  BVG14.South.East.ScotlandA_pct BVG15.West.Central.ScotlandA_pct BVG16.South.West.ScotlandA_pct
1                           1.05                             0.98                           0.89
2                           1.51                             1.21                           1.07
3                           1.13                             2.13                           2.01
4                           2.14                             1.24                           1.37
5                           1.62                             1.46                           1.95
6                             NA                               NA                             NA

> str(desired)
'data.frame':   11 obs. of  7 variables:
 $ Date                            : Factor w/ 6 levels "01/03/2013",..: 2 3 4 5 6 1 1 1 1 1 ...
 $ BVG11.Highlands_I_.A_pct        : num  1.16 1.41 1.33 1.39 5.06  ...
 $ BVG12.North.East.ScotlandA_pct  : num  1.13 2.37 3.31 2.49 3.42  ...
 $ BVG13.Central.ScotlandA_pct     : num  1.08 1.8 1.34 1.62 1.42  ...
 $ BVG14.South.East.ScotlandA_pct  : num  1.05 1.51 1.13 2.14 1.62  ...
 $ BVG15.West.Central.ScotlandA_pct: num  0.98 1.21 2.13 1.24 1.46 ...
 $ BVG16.South.West.ScotlandA_pct  : num  0.89 1.07 2.01 1.37 1.95 ...

1 answer:

Answer 0 (score: 2)

Find the first Saturday in your data, then assign a week ID to every date in the dataset relative to that Saturday:

library(lubridate) # for the wday() and ymd() functions
daily_FWIH$Date <- ymd(daily_FWIH$Date)
saturdays <- daily_FWIH[wday(daily_FWIH$Date) == 7, ] # filter for Saturdays
startDate <- min(saturdays$Date) # select first Saturday
daily_FWIH$week <- floor(as.numeric(difftime(daily_FWIH$Date, startDate, units = "weeks")))
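
To see what the week ID looks like on a handful of dates, here is a minimal base R sketch of the same idea. The `dates` vector and the `week_start` helper column are made up for illustration, and it uses `as.POSIXlt()` instead of lubridate's `wday()` so it runs without extra packages:

```r
# Hypothetical sample: a Friday, a Saturday, and days up to the next Saturday
dates <- as.Date(c("2013-03-01", "2013-03-02", "2013-03-04",
                   "2013-03-08", "2013-03-09"))

# POSIXlt encodes weekdays as 0 = Sunday ... 6 = Saturday
is_saturday <- as.POSIXlt(dates)$wday == 6
start_date  <- min(dates[is_saturday])          # first Saturday in the data

# Whole weeks elapsed since the first Saturday (negative for earlier dates)
week_id <- floor(as.numeric(difftime(dates, start_date, units = "weeks")))

# The Saturday on which each date's week begins
week_start <- start_date + 7 * week_id

data.frame(dates, week_id, week_start)
```

Dates before the first Saturday get a negative week ID, which is why a week of -1 can show up for a dataset that starts on a Friday.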

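Once you have a week ID variable that starts on Saturday, this is a standard R problem. You can compute the weekly averages with whichever method of calculating means within a subgroup you prefer. I like dplyr: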

library(dplyr)
daily_FWIH %>%
  group_by(week, SOM_patch) %>% # use your grouping variables in addition to week
  summarise(weeklyAvg = mean(Value), weekBeginDate = min(Date)) %>%
  mutate(firstDayOfWeek = wday(weekBeginDate, label=TRUE)) # confirm correct week cuts

Source: local data frame [2 x 5]
Groups: week

  week                 SOM_patch weeklyAvg weekBeginDate firstDayOfWeek
1   -1 BVG11_Highlands & Islands      1.16    2013-03-01            Fri
2    0 BVG11_Highlands & Islands      1.41    2013-03-02            Sat
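
If you would rather not depend on dplyr, the same grouped mean can be sketched in base R with `aggregate()`. This is only an illustration: the small `daily` data frame reuses the first four rows shown in the question, with the week IDs filled in by hand using the first-Saturday rule above.

```r
# First four rows from head(daily_FWIH), with week IDs assigned by hand
# (2013-03-01 is a Friday, so it falls in week -1)
daily <- data.frame(
  Date      = as.Date(c("2013-03-01", "2013-03-02", "2013-03-04", "2013-03-05")),
  SOM_patch = "BVG11_Highlands & Islands",
  Value     = c(1.16, 1.99, 1.47, 1.15),
  week      = c(-1, 0, 0, 0)
)

# Mean of Value within each week / SOM_patch combination
weekly <- aggregate(Value ~ week + SOM_patch, data = daily, FUN = mean)
names(weekly)[names(weekly) == "Value"] <- "weeklyAvg"

weekly
```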

Update, based on the comments below:

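If you want to keep the other variables in the dataset, you need to decide how to select or calculate a weekly value when the daily values within a week conflict. In your sample data they are identical in every row, so I simply take them from the row containing the first day of the week.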

library(dplyr)
daily_FWIH %>%
  group_by(week, SOM_patch) %>% # use your grouping variables
  mutate(weeklyAvg = mean(Value), weekBeginDate = min(Date)) %>%
  slice(which(Date == weekBeginDate)) %>% # select just the first record of the week - other vars will come from this 
  select(-Value, -Date) # drop unneeded variables

Source: local data frame [2 x 7]
Groups: week, SOM_patch

  CST.OUC            CST.NAME                 SOM_patch   Row_Desc week weeklyAvg weekBeginDate
1   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH   -1      1.16    2013-03-01
2   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH    0      1.41    2013-03-02
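
The desired output in the question is wide, with one column per SOM_patch. One possible way to get from the long weekly result to that layout is `tapply()`, sketched below in base R. The `weekly` data frame and its values are placeholders made up for illustration, not real results, and the generated column names will differ from the ones in the question.

```r
# Placeholder long-format weekly averages (values are illustrative only)
weekly <- data.frame(
  weekBeginDate = as.Date(c("2013-03-02", "2013-03-02",
                            "2013-03-09", "2013-03-09")),
  SOM_patch     = c("BVG11_Highlands & Islands", "BVG12_North East Scotland",
                    "BVG11_Highlands & Islands", "BVG12_North East Scotland"),
  weeklyAvg     = c(1.4, 2.4, 1.3, 3.3)
)

# ISO date strings sort chronologically, so they are safe as row keys
week_key <- format(weekly$weekBeginDate)

# Rows = weeks, columns = patches; each cell holds that week's average
wide <- tapply(weekly$weeklyAvg, list(week_key, weekly$SOM_patch), mean)

# Turn the matrix into a data frame with an explicit Date column
wide_df <- data.frame(Date = as.Date(rownames(wide)), wide)
rownames(wide_df) <- NULL

wide_df
```

`data.frame()` cleans the column names with `make.names()`, so spaces and `&` become dots; rename the columns afterwards if you need the exact headers from the desired output.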