How do I group by month, day, morning, or afternoon and count the totals in R?

Time: 2016-05-02 10:42:17

Tags: r group-by statistics time-series

I have a vector of dates like this:

 1 2014-03-10 22:54:24
 2 2014-03-10 22:53:16
 3 2014-03-10 22:53:01
 4 2014-03-10 22:52:38
 5 2014-03-10 22:52:00
 6 2014-03-01 01:13:08
 7 2014-03-01 01:11:30
 8 2014-03-01 01:07:41
 9 2014-03-01 01:05:28
10 2014-03-01 00:58:40
11 2014-03-27 18:11:57

How can I group them by month, day, morning, afternoon, or week? For example:

month     sum      
2014-3     11

==============

week      sum       
2014-3-1   5  
2014-3-9   5

==============

2014-3-1 
morning   sum 
2014-3-1  5  

2 answers:

Answer 0: (score: 1)

Use the data.table package and get to know the POSIXlt class.

# x is assumed to be your vector of time objects (POSIXct or POSIXlt).
# The following lines are only meant to introduce POSIXlt. You do not need to run them.
# Components are accessed by name, which is safer than positional [[i]] indexing.
Secs <- as.POSIXlt(x)$sec
Mins <- as.POSIXlt(x)$min
# ...
Month <- as.POSIXlt(x)$mon + 1 # months start at 0 instead of 1
Year  <- as.POSIXlt(x)$year - 100 # $year counts from 1900 (116 for 2016), so this gives the two-digit year 16
DayOfYear <- as.POSIXlt(x)$yday + 1 # starts at 0
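As a rough illustration of the components above, here is a minimal base R sketch that builds `x` from the timestamps in the question and reads the POSIXlt fields by name. The UTC time zone and the `parts` data frame are assumptions made for the example, not part of the original answer.

```r
# The eleven timestamps from the question, parsed as UTC (the time zone is an assumption).
x <- as.POSIXct(c(
  "2014-03-10 22:54:24", "2014-03-10 22:53:16", "2014-03-10 22:53:01",
  "2014-03-10 22:52:38", "2014-03-10 22:52:00", "2014-03-01 01:13:08",
  "2014-03-01 01:11:30", "2014-03-01 01:07:41", "2014-03-01 01:05:28",
  "2014-03-01 00:58:40", "2014-03-27 18:11:57"
), tz = "UTC")

lt <- as.POSIXlt(x)

# Named POSIXlt fields, shifted to their conventional values.
parts <- data.frame(
  year  = lt$year + 1900, # stored as years since 1900
  month = lt$mon + 1,     # stored as 0-11
  day   = lt$mday,        # day of the month, 1-31
  hour  = lt$hour,        # 0-23
  doy   = lt$yday + 1     # stored as 0-365
)
print(parts)
```

Reading fields by name avoids having to remember the position of each component in the underlying list.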

You can compute more complex values in a similar way. Now use data.table:

library(data.table)
X <- as.data.table(x) # creates a data.table object
setnames(X, "Time")   # names the single column 'Time'
X[ , month := as.POSIXlt(Time)$mon + 1] # adds a month column (1-12)
X[ , doy := as.POSIXlt(Time)$yday + 1]  # adds a day-of-year column (1-366)
#....

Now you can group the data.table like this:

X[ , .N, by = doy]
X[ , .N, by = month]
# ...

.N returns the number of items in each group. You can also combine groupings:

X[ , .N, by = list(doy, month)]    
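To make the grouping concrete, the following sketch applies the same idea to the timestamps from the question. It assumes the data.table package is installed and that the times are in UTC. The column names `month` and `day` and the use of `format()` to build readable keys are choices made for this example rather than the answer's exact code.

```r
library(data.table)

# Timestamps from the question, parsed as UTC (the time zone is an assumption).
x <- as.POSIXct(c(
  "2014-03-10 22:54:24", "2014-03-10 22:53:16", "2014-03-10 22:53:01",
  "2014-03-10 22:52:38", "2014-03-10 22:52:00", "2014-03-01 01:13:08",
  "2014-03-01 01:11:30", "2014-03-01 01:07:41", "2014-03-01 01:05:28",
  "2014-03-01 00:58:40", "2014-03-27 18:11:57"
), tz = "UTC")

X <- data.table(Time = x)
X[, month := format(Time, "%Y-%m")]    # e.g. "2014-03"
X[, day   := format(Time, "%Y-%m-%d")] # e.g. "2014-03-10"

# keyby groups like by, but also sorts the result by the grouping columns.
by_month <- X[, .N, keyby = month]
by_day   <- X[, .N, keyby = day]
by_both  <- X[, .N, keyby = .(month, day)]

print(by_month)
print(by_day)
print(by_both)
```

Using `keyby` instead of `by` only changes the row order of the result: `by` keeps groups in order of first appearance, which here would list 2014-03-10 before 2014-03-01.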

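There are many good tutorials on data.table. Its grouping and evaluation resemble SQL syntax, which the tutorials also cover. A good place to start is the developers' FAQ: http://datatable.r-forge.r-project.org/datatable-faq.pdf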

Edit:

Of course, you can also create more elaborate columns, for example for afternoon and morning:

X[ , afternoon := as.POSIXlt(Time)$hour >= 12] # TRUE from 12:00 onward, FALSE before noon
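The morning/afternoon flag can be combined with a day column to get the per-day breakdown sketched in the question. The following is one possible way to do it, assuming data.table is installed, UTC times, and a noon cutoff. The column names `day` and `part` are hypothetical.

```r
library(data.table)

# Timestamps from the question, parsed as UTC (the time zone is an assumption).
x <- as.POSIXct(c(
  "2014-03-10 22:54:24", "2014-03-10 22:53:16", "2014-03-10 22:53:01",
  "2014-03-10 22:52:38", "2014-03-10 22:52:00", "2014-03-01 01:13:08",
  "2014-03-01 01:11:30", "2014-03-01 01:07:41", "2014-03-01 01:05:28",
  "2014-03-01 00:58:40", "2014-03-27 18:11:57"
), tz = "UTC")

X <- data.table(Time = x)
X[, day  := format(Time, "%Y-%m-%d")]
# Hours 0-11 count as morning, 12-23 as afternoon (the noon cutoff is an assumption).
X[, part := ifelse(as.POSIXlt(Time)$hour < 12, "morning", "afternoon")]

# One row per (day, part) combination that actually occurs in the data.
counts <- X[, .N, keyby = .(day, part)]
print(counts)
```

Combinations with no observations (for example, the afternoon of 2014-03-01) simply do not appear in `counts`.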

Answer 1: (score: 1)

Suppose you have a data frame like this, where time is in POSIXct format:

df
                   time
1   2014-03-10 22:54:24
2   2014-03-10 22:53:16
3   2014-03-10 22:53:01
4   2014-03-10 22:52:38
5   2014-03-10 22:52:00
6   2014-03-01 01:13:08
7   2014-03-01 01:11:30
8   2014-03-01 01:07:41
9   2014-03-01 01:05:28
10  2014-03-01 00:58:40
11  2014-03-27 18:11:57

You can get the month, week, and am/pm like this:

df$month <- format(df$time, '%Y-%m')
df$week <- format(df$time, '%Y-%U')  # %U: week of the year, with weeks starting on Sunday
df$ampm <- ifelse(as.numeric(format(df$time, '%H')) >= 12, 'pm', 'am')  # hour 12 already counts as pm
df
                  time   month    week ampm
1  2014-03-10 22:54:24 2014-03 2014-10   pm
2  2014-03-10 22:53:16 2014-03 2014-10   pm
3  2014-03-10 22:53:01 2014-03 2014-10   pm
4  2014-03-10 22:52:38 2014-03 2014-10   pm
5  2014-03-10 22:52:00 2014-03 2014-10   pm
6  2014-03-01 01:13:08 2014-03 2014-08   am
7  2014-03-01 01:11:30 2014-03 2014-08   am
8  2014-03-01 01:07:41 2014-03 2014-08   am
9  2014-03-01 01:05:28 2014-03 2014-08   am
10 2014-03-01 00:58:40 2014-03 2014-08   am
11 2014-03-27 18:11:57 2014-03 2014-12   pm

You can then use the dplyr library to get the summaries:

library(dplyr)

count(df, month)
Source: local data frame [1 x 2]

    month     n
    (chr) (int)
1 2014-03    11

count(df, week)
Source: local data frame [3 x 2]

     week     n
    (chr) (int)
1 2014-08     5
2 2014-10     5
3 2014-12     1

count(df, ampm)
Source: local data frame [2 x 2]

   ampm     n
  (chr) (int)
1    am     5
2    pm     6
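If dplyr is not available, roughly the same counts can be obtained in base R with `table()`. The sketch below is an alternative to the approach above, not a reproduction of it. The `count_by` helper is a hypothetical name introduced for this example, and the UTC time zone is an assumption.

```r
# Data frame from the answer, rebuilt from the question's timestamps (UTC is an assumption).
df <- data.frame(time = as.POSIXct(c(
  "2014-03-10 22:54:24", "2014-03-10 22:53:16", "2014-03-10 22:53:01",
  "2014-03-10 22:52:38", "2014-03-10 22:52:00", "2014-03-01 01:13:08",
  "2014-03-01 01:11:30", "2014-03-01 01:07:41", "2014-03-01 01:05:28",
  "2014-03-01 00:58:40", "2014-03-27 18:11:57"
), tz = "UTC"))

df$month <- format(df$time, "%Y-%m")
df$week  <- format(df$time, "%Y-%U")
df$ampm  <- ifelse(as.integer(format(df$time, "%H")) >= 12, "pm", "am")

# Hypothetical helper: tabulate one column and return a two-column data frame.
count_by <- function(values, name) {
  out <- as.data.frame(table(values), responseName = "n", stringsAsFactors = FALSE)
  names(out)[1] <- name
  out
}

month_counts <- count_by(df$month, "month")
week_counts  <- count_by(df$week, "week")
ampm_counts  <- count_by(df$ampm, "ampm")

print(month_counts)
print(week_counts)
print(ampm_counts)
```

The resulting counts should match the `count()` output shown above.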