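I want to (arithmetically) average daily data and thereby convert my daily time series into a weekly one.
Following this thread: How does one compute the mean of weekly data by column using R?, I'm using the xts library.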
# Averages daily time series into weekly time series
# where my source is a zoo object
source.w <- apply.weekly(source, colMeans)
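The problem I'm running into is that it averages the series from Tuesday through Monday of the following week.
I'm looking for an option to average my daily data from Monday to Friday.
Any hints?
A bit more detail: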
# here is part of my data, from a "blé colza.txt" file
24/07/2012 250.5 499
23/07/2012 264.75 518.25
20/07/2012 269.25 525.25
19/07/2012 267 522.5
18/07/2012 261.25 517
17/07/2012 265.75 522.25
16/07/2012 264.25 523.25
13/07/2012 258.25 517
12/07/2012 253.75 513
11/07/2012 246.25 512.75
10/07/2012 248 515
09/07/2012 247 519.25
06/07/2012 243.25 508.25
05/07/2012 245 508.5
04/07/2012 236 500.5
03/07/2012 234 497.75
02/07/2012 234.25 489.75
29/06/2012 229 490.25
28/06/2012 229.75 487.25
27/06/2012 229.75 493
26/06/2012 226.5 486
25/06/2012 220 482.25
22/06/2012 214.25 472.5
21/06/2012 212 469.5
20/06/2012 210.25 473.75
19/06/2012 208 472.75
18/06/2012 206.75 462.5
15/06/2012 203 456.5
14/06/2012 205.25 460.5
13/06/2012 205.25 465.25
12/06/2012 205.25 469
11/06/2012 208 471.5
08/06/2012 208 468.5
07/06/2012 208 471.25
06/06/2012 208 467
05/06/2012 208 458.75
04/06/2012 208 457.5
01/06/2012 208 463.5
31/05/2012 208 466.75
30/05/2012 208 468
29/05/2012 212.75 469.75
28/05/2012 212.75 469.75
25/05/2012 212.75 465.5
# Loads external libraries
library("zoo") # or require("zoo")
library("xts") # or require("xts")
# Loads data as a zoo object
source <- read.zoo("blé colza.txt", sep=",", dec=".", header=T, na.strings="NA", format="%d/%m/%Y")
# Averages daily time series into weekly time series
# https://stackoverflow.com/questions/11129562/how-does-one-compute-the-mean-of-weekly-data-by-column-using-r
source.w <- apply.weekly(source, colMeans)
Answer 0 (score: 6)
mrdwab's answer only works to the extent that they share a timezone (or its relevant characteristics) with the OP. To illustrate:
Lines <-
"24/07/2012 250.5 499
23/07/2012 264.75 518.25
20/07/2012 269.25 525.25
19/07/2012 267 522.5
18/07/2012 261.25 517
17/07/2012 265.75 522.25
16/07/2012 264.25 523.25
13/07/2012 258.25 517
12/07/2012 253.75 513
11/07/2012 246.25 512.75
10/07/2012 248 515
09/07/2012 247 519.25
06/07/2012 243.25 508.25
05/07/2012 245 508.5
04/07/2012 236 500.5
03/07/2012 234 497.75
02/07/2012 234.25 489.75
29/06/2012 229 490.25
28/06/2012 229.75 487.25
27/06/2012 229.75 493
26/06/2012 226.5 486
25/06/2012 220 482.25
22/06/2012 214.25 472.5
21/06/2012 212 469.5
20/06/2012 210.25 473.75
19/06/2012 208 472.75
18/06/2012 206.75 462.5
15/06/2012 203 456.5
14/06/2012 205.25 460.5
13/06/2012 205.25 465.25
12/06/2012 205.25 469
11/06/2012 208 471.5
08/06/2012 208 468.5
07/06/2012 208 471.25
06/06/2012 208 467
05/06/2012 208 458.75
04/06/2012 208 457.5
01/06/2012 208 463.5
31/05/2012 208 466.75
30/05/2012 208 468
29/05/2012 212.75 469.75
28/05/2012 212.75 469.75
25/05/2012 212.75 465.5"
# Get R's timezone information (from ?Sys.timezone)
tzfile <- file.path(R.home("share"), "zoneinfo", "zone.tab")
tzones <- read.delim(tzfile, row.names = NULL, header = FALSE,
col.names = c("country", "coords", "name", "comments"),
as.is = TRUE, fill = TRUE, comment.char = "#")
# Run the analysis on each timezone
out <- list()
library(xts)
for(i in seq_along(tzones$name)) {
tzn <- tzones$name[i]
Sys.setenv(TZ=tzn)
con <- textConnection(Lines)
Source <- read.zoo(con, format="%d/%m/%Y")
out[[tzn]] <- apply.weekly(Source, colMeans)
}
Now you can run head(out, 5) and see that some of the output differs depending on the timezone used:
head(out,5)
$`Europe/Andorra`
V2 V3
2012-05-27 212.75 467.625
2012-06-03 208.95 465.100
2012-06-10 208.00 467.400
2012-06-17 205.10 462.750
2012-06-24 212.90 474.150
2012-07-01 229.85 489.250
2012-07-08 241.05 506.850
2012-07-15 254.10 516.200
2012-07-22 265.60 521.050
2012-07-23 250.50 499.000
$`Asia/Dubai`
V2 V3
2012-05-27 212.75 467.625
2012-06-03 208.95 465.100
2012-06-10 208.00 467.400
2012-06-17 205.10 462.750
2012-06-24 212.90 474.150
2012-07-01 229.85 489.250
2012-07-08 241.05 506.850
2012-07-15 254.10 516.200
2012-07-22 265.60 521.050
2012-07-23 250.50 499.000
$`Asia/Kabul`
V2 V3
2012-05-27 212.75 467.625
2012-06-03 208.95 465.100
2012-06-10 208.00 467.400
2012-06-17 205.10 462.750
2012-06-24 212.90 474.150
2012-07-01 229.85 489.250
2012-07-08 241.05 506.850
2012-07-15 254.10 516.200
2012-07-22 265.60 521.050
2012-07-23 250.50 499.000
$`America/Antigua`
V2 V3
2012-05-25 212.750 465.500
2012-06-01 209.900 467.550
2012-06-08 208.000 464.600
2012-06-15 205.350 464.550
2012-06-22 210.250 470.200
2012-06-29 227.000 487.750
2012-07-06 238.500 500.950
2012-07-13 250.650 515.400
2012-07-20 265.500 522.050
2012-07-24 257.625 508.625
$`America/Anguilla`
V2 V3
2012-05-25 212.750 465.500
2012-06-01 209.900 467.550
2012-06-08 208.000 464.600
2012-06-15 205.350 464.550
2012-06-22 210.250 470.200
2012-06-29 227.000 487.750
2012-07-06 238.500 500.950
2012-07-13 250.650 515.400
2012-07-20 265.500 522.050
2012-07-24 257.625 508.625
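A more robust solution is to make sure your timezone is represented correctly, either by using Sys.setenv(TZ="<yourTZ>") to set it globally, or by using indexTZ(Source) <- "<yourTZ>" to set the timezone for each individual object. (Recent xts releases deprecate indexTZ in favor of tzone(Source) <- "<yourTZ>".)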
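Answer 1 (score: 3)
I was able to reproduce your problem, and you can solve it by using period.apply() with custom "endpoints".
First, here is your data in a format that others can easily read in.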
temp = structure(list(V1 = structure(c(33L, 32L, 29L, 27L, 25L, 23L,
22L, 19L, 17L, 15L, 13L, 12L, 9L, 7L, 5L, 3L, 2L, 41L, 39L, 37L,
36L, 35L, 31L, 30L, 28L, 26L, 24L, 21L, 20L, 18L, 16L, 14L, 11L,
10L, 8L, 6L, 4L, 1L, 43L, 42L, 40L, 38L, 34L), .Label = c("01/06/2012",
"02/07/2012", "03/07/2012", "04/06/2012", "04/07/2012", "05/06/2012",
"05/07/2012", "06/06/2012", "06/07/2012", "07/06/2012", "08/06/2012",
"09/07/2012", "10/07/2012", "11/06/2012", "11/07/2012", "12/06/2012",
"12/07/2012", "13/06/2012", "13/07/2012", "14/06/2012", "15/06/2012",
"16/07/2012", "17/07/2012", "18/06/2012", "18/07/2012", "19/06/2012",
"19/07/2012", "20/06/2012", "20/07/2012", "21/06/2012", "22/06/2012",
"23/07/2012", "24/07/2012", "25/05/2012", "25/06/2012", "26/06/2012",
"27/06/2012", "28/05/2012", "28/06/2012", "29/05/2012", "29/06/2012",
"30/05/2012", "31/05/2012"), class = "factor"), V2 = c(250.5,
264.75, 269.25, 267, 261.25, 265.75, 264.25, 258.25, 253.75,
246.25, 248, 247, 243.25, 245, 236, 234, 234.25, 229, 229.75,
229.75, 226.5, 220, 214.25, 212, 210.25, 208, 206.75, 203, 205.25,
205.25, 205.25, 208, 208, 208, 208, 208, 208, 208, 208, 208,
212.75, 212.75, 212.75), V3 = c(499, 518.25, 525.25, 522.5, 517,
522.25, 523.25, 517, 513, 512.75, 515, 519.25, 508.25, 508.5,
500.5, 497.75, 489.75, 490.25, 487.25, 493, 486, 482.25, 472.5,
469.5, 473.75, 472.75, 462.5, 456.5, 460.5, 465.25, 469, 471.5,
468.5, 471.25, 467, 458.75, 457.5, 463.5, 466.75, 468, 469.75,
469.75, 465.5)), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA,
-43L))
We'll clean it up and convert the object to an xts object.
temp$V1 = as.Date(temp$V1, format="%d/%m/%Y")
library(xts)
temp.x = xts(temp[-1], order.by=temp$V1)
Now we try the apply.weekly() function, but it doesn't give us what you want.
apply.weekly(temp.x, colMeans)
# V2 V3
# 2012-05-28 212.75 467.625
# 2012-06-04 208.95 465.100
# 2012-06-11 208.00 467.400
# 2012-06-18 205.10 462.750
# 2012-06-25 212.90 474.150
# 2012-07-02 229.85 489.250
# 2012-07-09 241.05 506.850
# 2012-07-16 254.10 516.200
# 2012-07-23 265.60 521.050
# 2012-07-24 250.50 499.000
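To use period.apply(), you need to specify the endpoints of your periods (which can be irregular). Here, our first period is just the first date, and from there on each period ends every five rows (five trading days). A few days are left over, so we add nrow(temp.x) to close the last period.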
ep = c(0, seq(1, nrow(temp.x), by = 5), nrow(temp.x))
period.apply(temp.x, INDEX = ep, FUN = colMeans)
# V2 V3
# 2012-05-25 212.750 465.500
# 2012-06-01 209.900 467.550
# 2012-06-08 208.000 464.600
# 2012-06-15 205.350 464.550
# 2012-06-22 210.250 470.200
# 2012-06-29 227.000 487.750
# 2012-07-06 238.500 500.950
# 2012-07-13 250.650 515.400
# 2012-07-20 265.500 522.050
# 2012-07-24 257.625 508.625
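The fixed step of five rows assumes that no trading day is missing. As a minimal sketch, assuming the index is a plain Date vector, the endpoints could instead be derived from the ISO week of each date, so a week with a missing day still closes in the right place. The names dates, wk and x are hypothetical; the dates are the first seven from the question.

```r
# Hypothetical index: the first seven dates from the question, in ascending order
dates <- as.Date(c("25/05/2012", "28/05/2012", "29/05/2012", "30/05/2012",
                   "31/05/2012", "01/06/2012", "04/06/2012"), format = "%d/%m/%Y")

# ISO 8601 year-week label; ISO weeks run Monday to Sunday
wk <- format(dates, "%G-%V")

# An endpoint is the last row of each week; period.apply() expects a leading 0
ep <- c(0, which(wk[-1] != wk[-length(wk)]), length(wk))
ep

# With xts loaded and an object x indexed by `dates` (assumption):
# period.apply(x, INDEX = ep, FUN = colMeans)
```

Because the grouping is computed on Date values rather than date-times, it should not depend on the session timezone.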
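Answer 2 (score: 2)
I ran your example and, if I understand the problem correctly, the apply.weekly function is aggregating the first Friday together with the first Monday of the data. I don't use the xts package, so someone else will have to provide more insight. I would convert the dates into a vector of dates in which the Monday of each week stands for every observation in that week. ?strptime summarizes the format codes I used for the conversion.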
# Get the year of the first observation
start_year <- format(time(source)[1],"%Y")
# Convert this into a date for the 1st of Jan in that year.
start_date <- as.Date(strptime(paste(start_year, "1 1"), "%Y %d %m"))
# Using the difftime function determine the distance (days) since the first day of the first year.
jul_day <- as.numeric(difftime(time(source),start_date),units="days")
# Get the date of the Monday on or before each observation.
# Note: this offset works because 1 Jan 2012 is a Sunday; other start years need a different offset.
mondays <- start_date + (jul_day - (jul_day-1)%%7)
# %% computes the remainder.
# to check that it has worked convert the mondays vector into day names.
format(mondays, "%A")
# And now you can aggregate the observations using the mondays vector.
source.w <- aggregate(source[,1:2], mondays, "mean")
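The Monday calculation above relies on the start year beginning on a Sunday. A sketch of the same idea that does not depend on the year, assuming plain Date values and using only base R: the "%u" format code gives the weekday as 1 (Monday) to 7 (Sunday), so subtracting it (minus one) from each date lands on that week's Monday. The object names daily, week_start and weekly are hypothetical; the values are the last seven rows of the question's data.

```r
# The last seven observations from the question, in ascending order
dates <- as.Date(c("16/07/2012", "17/07/2012", "18/07/2012", "19/07/2012",
                   "20/07/2012", "23/07/2012", "24/07/2012"), format = "%d/%m/%Y")
daily <- data.frame(V2 = c(264.25, 265.75, 261.25, 267, 269.25, 264.75, 250.5),
                    V3 = c(523.25, 522.25, 517, 522.5, 525.25, 518.25, 499))

# %u is the weekday number, 1 = Monday ... 7 = Sunday
week_start <- dates - (as.integer(format(dates, "%u")) - 1L)

# Average every column within each Monday-based week
weekly <- aggregate(daily, by = list(week = week_start), FUN = mean)
weekly
```

For these rows the two weekly means agree with the last two rows of the period.apply() output in Answer 1.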
Answer 3 (score: 1)
Following up on Joshua Ulrich's answer.
On my system (Kubuntu 12), the following did not retrieve the zone.tab file:
tzfile <- file.path(R.home("share"), "zoneinfo", "zone.tab")
However, I was able to find zone.tab with:
locate zone.tab
For some reason (maybe file permissions), I could not point directly at that zone.tab file, i.e. writing:
tzfile <- "usr/share/zoneinfo/zone.tab"
returned:
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'usr/share/zoneinfo/zone.tab': No such file or directory
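(The path above has no leading slash, so R looks for it relative to the working directory, which would explain the "No such file or directory" message; "/usr/share/zoneinfo/zone.tab" may well work.) After making a local copy of zone.tab and pointing to that copy, the problem was solved: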
tzfile <- "~/R/zone.tab"
Now, if you Google for zone.tab, you will find copies of zone.tab online, in case your system doesn't have one, or it is corrupted, or whatever. Here is one place:
http://www.ietf.org/timezones/data/zone.tab
P.S. I have < 15 reputation, so I can't comment, which is what I would otherwise have done.
Answer 4 (score: 0)
Coming back to the problem at hand.
It is straightforward with the xts library.
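# say you have an xts object named 'dat'
ep <- endpoints(dat, on = 'weeks')
# for a multi-column object, FUN = colMeans gives per-column means;
# FUN = mean averages all columns together
period.apply(x = dat, INDEX = ep, FUN = mean)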