I would like to use the data below to calculate the weekly average of Dist while retaining the benefits of the POSIXct time class.
df <- structure(list(IndID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), class = "factor", .Label = "AAA"),
Date = structure(c(1329436800, 1329458400, 1329480000, 1329501600,
1329523200, 1329544800, 1329566400, 1329588000, 1329609600,
1329631200, 1329652800, 1329674400, 1329696000, 1329717600,
1329739200, 1329760800, 1329782400, 1329804000, 1329825600,
1329847200, 1329868800, 1329890400, 1329912000, 1329933600,
1329955200, 1329976800, 1329998400, 1330020000, 1330041600,
1330063200, 1330084800, 1330106400, 1330128000, 1330149600,
1330171200, 1330192800, 1330214400, 1330236000, 1330257600,
1330279200, 1330300800, 1330322400, 1330344000, 1330365600,
1330387200, 1330408800, 1330430400, 1330452000, 1330473600,
1330495200), class = c("POSIXct", "POSIXt"), tzone = ""),
Dist = c(3.85567120344727, 52.2649622620809, 1043.61207930222,
1352.58506343616, 176.911523081261, 77.8266318470078, 50.3943567710686,
296.753649985307, 70.5826583995618, 166.394264991861, 251.745346701973,
295.70655057823, 44.6664731663839, 11.1539274078084, 124.578071475754,
757.728373470112, 83.0921234152083, 36.6820839851181, 29.1406161870034,
150.442928003814, 66.0957159105813, 2.23839297570488, 184.88312900824,
513.072526047611, 132.868335201626, 8.09274857805967, 284.479977841835,
479.358187122796, 297.273840894826, 4.00676616275076, 601.492189218489,
249.001525522847, 108.007775719885, 2.38435966274261, 604.365702677913,
1499.59076416313, 111.74722960012, 25.3528529967124, 280.057754683142,
428.157539641219, 70.0365608334965, 71.0886617898624, 265.823654634254,
380.247565078552, 188.857338305481, 9.24402933768915, 120.346786301264,
221.904294953242, 201.086079767386, 81.7857577639103), DoW = c(5,
5, 6, 6, 6, 6, 7, 7, 7, 7, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3,
3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 1,
1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)), .Names = c("IndID", "Date",
"Dist", "DoW"), row.names = c(NA, -50L), class = "data.frame")
> head(df)
IndID Date Dist DoW
1 AAA 2012-02-16 17:00:00 3.855671 5
2 AAA 2012-02-16 23:00:00 52.264962 5
3 AAA 2012-02-17 05:00:00 1043.612079 6
4 AAA 2012-02-17 11:00:00 1352.585063 6
5 AAA 2012-02-17 17:00:00 176.911523 6
6 AAA 2012-02-17 23:00:00 77.826632 6
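My idea was to use the plyr package to average Dist by week. I would first like to create a new WeekDate field containing the date (without the time) of the first day of each week. As the DoW (day of week) field shows, the data do not always start on the first day of the week.
I can't seem to connect the dots, but what I want is the minimum date, without h:m:s, for each consecutive week (DoW 1-7).
Rows 1:10 would be 2012-02-16, rows 11:38 would be 2012-02-19, and rows 39:50 would be 2012-02-26.
I suspect the lubridate package would be useful, but I have not been able to get the code right.
Any suggestions or alternatives for creating the new date column specifically, or more broadly for averaging Dist by week, would be greatly appreciated.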
Answer 0 (score: 10)
Use floor_date from lubridate together with dplyr:
library(lubridate)
library(dplyr)
df %>%
group_by(Week = floor_date(Date, unit="week")) %>%
summarize(WeeklyAveDist=mean(Dist))
#Source: local data frame [3 x 2]
#
# Week WeeklyAveDist
#1 2012-02-12 381.7755
#2 2012-02-19 252.1116
#3 2012-02-26 175.4097
lubridate also offers ceiling_date and round_date as alternatives to floor_date.
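The question also asks for a WeekDate column holding the first observed date in each week, which floor_date alone does not give (it returns the Sunday that starts the week, even when no data fall on it). A minimal sketch of one way to get it, assuming lubridate's default Sunday week start and using a hypothetical stand-in data frame named toy with UTC timestamps so the result does not depend on the local time zone:

```r
library(lubridate)
library(dplyr)

# Hypothetical stand-in for df: 50 six-hourly timestamps, fixed to UTC
toy <- data.frame(
  Date = seq(as.POSIXct("2012-02-16 17:00:00", tz = "UTC"),
             by = "6 hours", length.out = 50),
  Dist = seq_len(50)
)

toy_weeks <- toy %>%
  # Group rows by the Sunday that starts their week
  group_by(Week = floor_date(Date, unit = "week")) %>%
  # First observed date in each week, with the time of day dropped
  mutate(WeekDate = as.Date(min(Date), tz = "UTC")) %>%
  ungroup()

# Weekly mean keyed by the first observed date
weekly <- toy_weeks %>%
  group_by(WeekDate) %>%
  summarize(WeeklyAveDist = mean(Dist))
```

The Date column stays POSIXct throughout; only the derived WeekDate is a plain Date. With the real df, whose tzone attribute is empty, the week boundaries follow the local time zone of the session, so the tz argument would need to match it.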
Answer 1 (score: 3)
You can use strftime with the %W format:
> strftime(as.Date("2015-01-08"), "%W")
[1] "01"
You can use this to define a new variable and then aggregate by that variable. Perhaps like this:
> df <- transform(df, week=strftime(Date, "%W"))
> aggregate(df$Dist, by=list(df$week), FUN=mean)
Group.1 x
1 07 319.8861
2 08 254.2861
3 09 161.0421
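One thing to keep in mind with this approach: %W counts weeks starting on Monday, whereas the question's DoW field treats Sunday as day 1. The %U format counts weeks starting on Sunday instead. The short sketch below compares the two around 2012-02-19, a Sunday; the variable names are only illustrative.

```r
# 2012-02-18 is a Saturday, 2012-02-19 a Sunday, 2012-02-20 a Monday
d <- as.Date(c("2012-02-18", "2012-02-19", "2012-02-20"))

week_mon <- format(d, "%W")  # week of year, weeks start on Monday
week_sun <- format(d, "%U")  # week of year, weeks start on Sunday

data.frame(d, week_mon, week_sun)
```

Under %W the Sunday still belongs to the previous week, which is why these weekly means differ from the floor_date result. Both formats restart at each new year, so for data spanning several years a format such as "%Y-%U" keeps the weeks distinct.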