I have a vector of dates like this:
1 2014-03-10 22:54:24
2 2014-03-10 22:53:16
3 2014-03-10 22:53:01
4 2014-03-10 22:52:38
5 2014-03-10 22:52:00
6 2014-03-01 01:13:08
7 2014-03-01 01:11:30
8 2014-03-01 01:07:41
9 2014-03-01 01:05:28
10 2014-03-01 00:58:40
11 2014-03-27 18:11:57
How can I group them by month, day, morning/afternoon, or week? For example:
month sum
2014-3 11
==============
week sum
2014-3-1 5
2014-3-9 5
==============
2014-3-1
morning sum
2014-3-1 5
Answer 0 (score: 1)
Use the data.table package and get familiar with the POSIXlt class.
# x is assumed to be your vector of time objects (POSIXct or POSIXlt).
# The following lines are only for getting to know POSIXlt. You do not need to run them.
# Components are accessed by name, which is safer than positional indexing.
Secs <- as.POSIXlt(x)$sec
Mins <- as.POSIXlt(x)$min
# ...
Month <- as.POSIXlt(x)$mon + 1 # months start at 0 instead of 1
Year <- as.POSIXlt(x)$year + 1900 # years are counted from 1900, so 2016 is stored as 116
DayOfYear <- as.POSIXlt(x)$yday + 1 # starts at 0
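As a quick check of the 0-based and 1900-based offsets mentioned in the comments, here is a minimal sketch in base R. It uses three of the timestamps from the question; the UTC time zone and the variable names are assumptions made only for this illustration.

```r
# Three timestamps from the question; tz = "UTC" is an assumption.
x <- as.POSIXct(c("2014-03-10 22:54:24",
                  "2014-03-01 01:13:08",
                  "2014-03-27 18:11:57"), tz = "UTC")
lt <- as.POSIXlt(x)

hour_of_day <- lt$hour         # hour of day, 0 to 23
month       <- lt$mon + 1      # stored 0-based, so add 1
year        <- lt$year + 1900  # stored as years since 1900
day_of_year <- lt$yday + 1     # stored 0-based, so add 1

data.frame(x, hour_of_day, month, year, day_of_year)
```

Printing the data frame makes it easy to compare each derived value against the original timestamp.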
You can compute more complex values in a similar way. Now use data.table.
require(data.table)
X <- as.data.table(x) # creates a data.table object
setnames(X, "Time") # names the single column 'Time'
X[ , month := as.POSIXlt(Time)$mon + 1] # adds a column month
X[ , doy := as.POSIXlt(Time)$yday + 1] # adds a column day of year
#....
Now you can group the data.table like this:
X[ , .N, by = doy]
X[ , .N, by = month]
# ...
.N returns the number of rows in each group. You can also combine groupings:
X[ , .N, by = list(doy, month)]
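To tie the grouping back to the question's data, the following is a minimal end-to-end sketch. The UTC time zone and the column names day and half are assumptions chosen for this illustration, not part of the original answer.

```r
library(data.table)

# Sample data from the question; tz = "UTC" is an assumption.
x <- as.POSIXct(c(
  "2014-03-10 22:54:24", "2014-03-10 22:53:16", "2014-03-10 22:53:01",
  "2014-03-10 22:52:38", "2014-03-10 22:52:00", "2014-03-01 01:13:08",
  "2014-03-01 01:11:30", "2014-03-01 01:07:41", "2014-03-01 01:05:28",
  "2014-03-01 00:58:40", "2014-03-27 18:11:57"), tz = "UTC")

X <- data.table(Time = x)

# Calendar day, and a label for the half of the day (hypothetical column names).
X[, day := as.Date(format(Time, "%Y-%m-%d"))]
X[, half := ifelse(as.POSIXlt(Time)$hour < 12, "morning", "afternoon")]

by_day      <- X[, .N, by = day]            # rows per day
by_day_half <- X[, .N, by = .(day, half)]   # rows per day and half of day

by_day
by_day_half
```

Groups appear in order of first occurrence, so the result can be sorted afterwards (for example with setorder) if chronological order matters.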
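There are many good tutorials on data.table. Its grouping and evaluation resemble SQL syntax (this is also covered in the tutorials).
A good place to start is the developers' FAQ: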
http://datatable.r-forge.r-project.org/datatable-faq.pdf
Edit:
Of course, you can also create more complex columns, such as one that flags afternoon versus morning:
X[ , afternoon := as.POSIXlt(Time)$hour >= 12] # TRUE from 12:00 onward
Answer 1 (score: 1)
Suppose you have a data frame like this, where time is in POSIXct format:
df
time
1 2014-03-10 22:54:24
2 2014-03-10 22:53:16
3 2014-03-10 22:53:01
4 2014-03-10 22:52:38
5 2014-03-10 22:52:00
6 2014-03-01 01:13:08
7 2014-03-01 01:11:30
8 2014-03-01 01:07:41
9 2014-03-01 01:05:28
10 2014-03-01 00:58:40
11 2014-03-27 18:11:57
You can get the month, week, and AM/PM as follows:
df$month <- format(df$time, '%Y-%m')
df$week <- format(df$time, '%Y-%U')
df$ampm <- ifelse(as.numeric(format(df$time, '%H')) >= 12, 'pm', 'am')
df
time month week ampm
1 2014-03-10 22:54:24 2014-03 2014-10 pm
2 2014-03-10 22:53:16 2014-03 2014-10 pm
3 2014-03-10 22:53:01 2014-03 2014-10 pm
4 2014-03-10 22:52:38 2014-03 2014-10 pm
5 2014-03-10 22:52:00 2014-03 2014-10 pm
6 2014-03-01 01:13:08 2014-03 2014-08 am
7 2014-03-01 01:11:30 2014-03 2014-08 am
8 2014-03-01 01:07:41 2014-03 2014-08 am
9 2014-03-01 01:05:28 2014-03 2014-08 am
10 2014-03-01 00:58:40 2014-03 2014-08 am
11 2014-03-27 18:11:57 2014-03 2014-12 pm
You can then use the dplyr library to get the summaries:
library(dplyr)
count(df, month)
Source: local data frame [1 x 2]
month n
(chr) (int)
1 2014-03 11
count(df, week)
Source: local data frame [3 x 2]
week n
(chr) (int)
1 2014-08 5
2 2014-10 5
3 2014-12 1
count(df, ampm)
Source: local data frame [2 x 2]
ampm n
(chr) (int)
1 am 5
2 pm 6
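The question labels weeks by a start date rather than a week number, and asks for morning counts per day. A rough sketch of both in base R follows. It swaps in cut() and table() instead of dplyr; the UTC time zone, the Sunday week start, and the variable names are assumptions.

```r
# Sample data from the question; tz = "UTC" is an assumption.
time <- as.POSIXct(c(
  "2014-03-10 22:54:24", "2014-03-10 22:53:16", "2014-03-10 22:53:01",
  "2014-03-10 22:52:38", "2014-03-10 22:52:00", "2014-03-01 01:13:08",
  "2014-03-01 01:11:30", "2014-03-01 01:07:41", "2014-03-01 01:05:28",
  "2014-03-01 00:58:40", "2014-03-27 18:11:57"), tz = "UTC")

day <- as.Date(format(time, "%Y-%m-%d"))

# Label each date with the Sunday that starts its week.
# as.character() drops the empty weeks that cut() keeps as factor levels.
week_start  <- as.character(cut(day, breaks = "week", start.on.monday = FALSE))
week_counts <- table(week_start)

# Count morning observations (hour before 12) per calendar day.
is_morning     <- as.integer(format(time, "%H")) < 12
morning_counts <- table(format(day[is_morning]))

week_counts
morning_counts
```

Setting start.on.monday = TRUE (the default) labels weeks by their Monday instead.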