Subset a data.frame based on a date-time column

Time: 2016-05-19 17:56:04

Tags: date dataframe subset

I have the following data.frame, with about 18 million records:

         Gender Age Bici DepartingSta        DateTimeDepa ArrivingSta        DateTimeArri TravelTime
1             M  28   69           85 2010-02-16 12:42:32          85 2010-02-16 12:45:37        3.1
2             M  30   11           85 2010-02-16 12:53:29          26 2010-02-16 13:22:23       28.9
3             M  37   43           85 2010-02-16 13:21:46          13 2010-02-16 13:49:47       28.0
4             M  37  826           22 2010-02-16 14:06:40          85 2010-02-16 14:23:13       16.6
5             M  19  662           27 2010-02-16 15:31:15          74 2010-02-16 16:29:17       58.0
6             F  25    8           85 2010-02-16 16:31:53          20 2010-02-16 16:49:26       17.6
17919307      F  26 2760          121 2015-01-30 23:58:33         106 2015-01-31 00:22:08       23.6
17919308      M  22 4077           71 2015-01-30 23:58:50         190 2015-01-31 00:13:24       14.6
17919309      M  32  699          154 2015-01-30 23:58:55         165 2015-01-31 00:02:25        3.5
17919310      F  26 4044           64 2015-01-30 23:59:20          50 2015-01-31 00:05:38        6.3
17919311      M  26 3114           26 2015-01-30 23:59:23         127 2015-01-31 00:12:29       13.1
17919312      M  25 4115          165 2015-01-30 23:59:55          73 2015-01-31 00:12:39       12.7

I want to write a function that subsets the trips from January 2015. With the input "201501", the result should be:

         Gender Age Bici DepartingSta        DateTimeDepa ArrivingSta        DateTimeArri TravelTime
17919307      F  26 2760          121 2015-01-30 23:58:33         106 2015-01-31 00:22:08       23.6
17919308      M  22 4077           71 2015-01-30 23:58:50         190 2015-01-31 00:13:24       14.6
17919309      M  32  699          154 2015-01-30 23:58:55         165 2015-01-31 00:02:25        3.5
17919310      F  26 4044           64 2015-01-30 23:59:20          50 2015-01-31 00:05:38        6.3
17919311      M  26 3114           26 2015-01-30 23:59:23         127 2015-01-31 00:12:29       13.1
17919312      M  25 4115          165 2015-01-30 23:59:55          73 2015-01-31 00:12:39       12.7

2 answers:

Answer 0: (score: 0)

As suggested in this answer, you can convert the data set to an xts object and then use its smart subsetting option:

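# Index the rows by departure time
xtsdf <- xts::xts(df, order.by = df$DateTimeDepa)
# An ISO-8601 style string such as "201501" selects all of January 2015
xtsdf["201501"]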

This gives:

#                    Gender Age  Bici   DepartingSta DateTimeDepa          ArrivingSta
#2015-01-30 23:58:33 "F"    "26" "2760" "121"        "2015-01-30 23:58:33" "106"      
#2015-01-30 23:58:50 "M"    "22" "4077" " 71"        "2015-01-30 23:58:50" "190"      
#2015-01-30 23:58:55 "M"    "32" " 699" "154"        "2015-01-30 23:58:55" "165"      
#2015-01-30 23:59:20 "F"    "26" "4044" " 64"        "2015-01-30 23:59:20" " 50"      
#2015-01-30 23:59:23 "M"    "26" "3114" " 26"        "2015-01-30 23:59:23" "127"      
#2015-01-30 23:59:55 "M"    "25" "4115" "165"        "2015-01-30 23:59:55" " 73"      
#                    DateTimeArri          TravelTime
#2015-01-30 23:58:33 "2015-01-31 00:22:08" "23.6"    
#2015-01-30 23:58:50 "2015-01-31 00:13:24" "14.6"    
#2015-01-30 23:58:55 "2015-01-31 00:02:25" " 3.5"    
#2015-01-30 23:59:20 "2015-01-31 00:05:38" " 6.3"    
#2015-01-30 23:59:23 "2015-01-31 00:12:29" "13.1"    
#2015-01-30 23:59:55 "2015-01-31 00:12:39" "12.7" 
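Every value in the output above is a quoted string: an xts object stores its data as a matrix, so a data.frame with mixed column types is coerced to character. One possible way around this, sketched below, is to put only the row positions into the xts object and use them to index the original data.frame. The toy data and the names `trips`, `pos`, `hits` and `jan2015` are assumptions made for illustration, not part of the original answer.

```r
library(xts)

# Toy data with mixed column types (illustrative values only).
trips <- data.frame(
  Gender = c("M", "F", "M", "F"),
  Age = c(28L, 26L, 25L, 31L),
  DateTimeDepa = as.POSIXct(
    c("2010-02-16 12:42:32", "2015-01-30 23:58:33",
      "2015-01-30 23:59:55", "2015-02-01 00:00:10"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Store only the row positions in the xts object, so no column is coerced.
pos <- xts(seq_len(nrow(trips)), order.by = trips$DateTimeDepa)

# Look up the positions for January 2015, then index the original data.frame.
hits <- as.vector(coredata(pos["201501"]))
jan2015 <- trips[hits, ]
```

Here `jan2015` is an ordinary data.frame, so `Age` stays an integer and `DateTimeDepa` stays POSIXct.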

Answer 1: (score: 0)

Here is a way to solve this using base R's format(), vectorized string comparison, and subset():

df <- data.frame(
  Gender=c('M','M','M','M','M','F','F','M','M','F','M','M'),
  Age=c(28L,30L,37L,37L,19L,25L,26L,22L,32L,26L,26L,25L),
  Bici=c(69L,11L,43L,826L,662L,8L,2760L,4077L,699L,4044L,3114L,4115L),
  DepartingSta=c(85L,85L,85L,22L,27L,85L,121L,71L,154L,64L,26L,165L),
  DateTimeDepa=as.POSIXct(c(
    '2010-02-16 12:42:32','2010-02-16 12:53:29','2010-02-16 13:21:46',
    '2010-02-16 14:06:40','2010-02-16 15:31:15','2010-02-16 16:31:53',
    '2015-01-30 23:58:33','2015-01-30 23:58:50','2015-01-30 23:58:55',
    '2015-01-30 23:59:20','2015-01-30 23:59:23','2015-01-30 23:59:55')),
  ArrivingSta=c(85L,26L,13L,85L,74L,20L,106L,190L,165L,50L,127L,73L),
  DateTimeArri=as.POSIXct(c(
    '2010-02-16 12:45:37','2010-02-16 13:22:23','2010-02-16 13:49:47',
    '2010-02-16 14:23:13','2010-02-16 16:29:17','2010-02-16 16:49:26',
    '2015-01-31 00:22:08','2015-01-31 00:13:24','2015-01-31 00:02:25',
    '2015-01-31 00:05:38','2015-01-31 00:12:29','2015-01-31 00:12:39')),
  TravelTime=c(3.1,28.9,28,16.6,58,17.6,23.6,14.6,3.5,6.3,13.1,12.7),
  row.names=c('1','2','3','4','5','6','17919307','17919308','17919309','17919310','17919311','17919312'),
  stringsAsFactors=FALSE);
ym <- '201501';
df;
##          Gender Age Bici DepartingSta        DateTimeDepa ArrivingSta        DateTimeArri TravelTime
## 1             M  28   69           85 2010-02-16 12:42:32          85 2010-02-16 12:45:37        3.1
## 2             M  30   11           85 2010-02-16 12:53:29          26 2010-02-16 13:22:23       28.9
## 3             M  37   43           85 2010-02-16 13:21:46          13 2010-02-16 13:49:47       28.0
## 4             M  37  826           22 2010-02-16 14:06:40          85 2010-02-16 14:23:13       16.6
## 5             M  19  662           27 2010-02-16 15:31:15          74 2010-02-16 16:29:17       58.0
## 6             F  25    8           85 2010-02-16 16:31:53          20 2010-02-16 16:49:26       17.6
## 17919307      F  26 2760          121 2015-01-30 23:58:33         106 2015-01-31 00:22:08       23.6
## 17919308      M  22 4077           71 2015-01-30 23:58:50         190 2015-01-31 00:13:24       14.6
## 17919309      M  32  699          154 2015-01-30 23:58:55         165 2015-01-31 00:02:25        3.5
## 17919310      F  26 4044           64 2015-01-30 23:59:20          50 2015-01-31 00:05:38        6.3
## 17919311      M  26 3114           26 2015-01-30 23:59:23         127 2015-01-31 00:12:29       13.1
## 17919312      M  25 4115          165 2015-01-30 23:59:55          73 2015-01-31 00:12:39       12.7
ym;
## [1] "201501"
subset(df,format(DateTimeDepa,'%Y%m')==ym);
##          Gender Age Bici DepartingSta        DateTimeDepa ArrivingSta        DateTimeArri TravelTime
## 17919307      F  26 2760          121 2015-01-30 23:58:33         106 2015-01-31 00:22:08       23.6
## 17919308      M  22 4077           71 2015-01-30 23:58:50         190 2015-01-31 00:13:24       14.6
## 17919309      M  32  699          154 2015-01-30 23:58:55         165 2015-01-31 00:02:25        3.5
## 17919310      F  26 4044           64 2015-01-30 23:59:20          50 2015-01-31 00:05:38        6.3
## 17919311      M  26 3114           26 2015-01-30 23:59:23         127 2015-01-31 00:12:29       13.1
## 17919312      M  25 4115          165 2015-01-30 23:59:55          73 2015-01-31 00:12:39       12.7
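Since the question asks for a function, the comparison above could be wrapped as in the following minimal sketch. The helper name `subset_month`, its `col` argument, and the small `trips` example are hypothetical and introduced here only for illustration. It indexes with `which()` rather than calling `subset()`, which similarly leaves out rows whose date-time is `NA` while avoiding non-standard evaluation inside a function.

```r
# Hypothetical helper (not from the original answer): keep the rows of `data`
# whose date-time column `col` falls in the month given as a "YYYYMM" string.
subset_month <- function(data, ym, col = "DateTimeDepa") {
  stopifnot(is.character(ym), length(ym) == 1L, grepl("^[0-9]{6}$", ym))
  # format() turns each date-time into "YYYYMM"; which() drops NA comparisons.
  keep <- which(format(data[[col]], "%Y%m") == ym)
  data[keep, , drop = FALSE]
}

# Small illustrative data set (values chosen for the example).
trips <- data.frame(
  Bici = c(69L, 2760L, 4115L),
  DateTimeDepa = as.POSIXct(
    c("2010-02-16 12:42:32", "2015-01-30 23:58:33", "2015-01-30 23:59:55"),
    tz = "UTC"
  )
)

jan2015 <- subset_month(trips, "201501")
```

With the data from this answer, the call would be `subset_month(df, ym)`. Note that `format()` builds a character vector as long as the data, which may matter with 18 million rows.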