I have the following data.frame, with about 18 million records:
Gender Age Bici DepartingSta DateTimeDepa ArrivingSta DateTimeArri TravelTime
1 M 28 69 85 2010-02-16 12:42:32 85 2010-02-16 12:45:37 3.1
2 M 30 11 85 2010-02-16 12:53:29 26 2010-02-16 13:22:23 28.9
3 M 37 43 85 2010-02-16 13:21:46 13 2010-02-16 13:49:47 28.0
4 M 37 826 22 2010-02-16 14:06:40 85 2010-02-16 14:23:13 16.6
5 M 19 662 27 2010-02-16 15:31:15 74 2010-02-16 16:29:17 58.0
6 F 25 8 85 2010-02-16 16:31:53 20 2010-02-16 16:49:26 17.6
17919307 F 26 2760 121 2015-01-30 23:58:33 106 2015-01-31 00:22:08 23.6
17919308 M 22 4077 71 2015-01-30 23:58:50 190 2015-01-31 00:13:24 14.6
17919309 M 32 699 154 2015-01-30 23:58:55 165 2015-01-31 00:02:25 3.5
17919310 F 26 4044 64 2015-01-30 23:59:20 50 2015-01-31 00:05:38 6.3
17919311 M 26 3114 26 2015-01-30 23:59:23 127 2015-01-31 00:12:29 13.1
17919312 M 25 4115 165 2015-01-30 23:59:55 73 2015-01-31 00:12:39 12.7
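I want to write a function that subsets the trips from a given month, for example January 2015. The input would be "201501", and the result would be: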
Gender Age Bici DepartingSta DateTimeDepa ArrivingSta DateTimeArri TravelTime
17919307 F 26 2760 121 2015-01-30 23:58:33 106 2015-01-31 00:22:08 23.6
17919308 M 22 4077 71 2015-01-30 23:58:50 190 2015-01-31 00:13:24 14.6
17919309 M 32 699 154 2015-01-30 23:58:55 165 2015-01-31 00:02:25 3.5
17919310 F 26 4044 64 2015-01-30 23:59:20 50 2015-01-31 00:05:38 6.3
17919311 M 26 3114 26 2015-01-30 23:59:23 127 2015-01-31 00:12:29 13.1
17919312 M 25 4115 165 2015-01-30 23:59:55 73 2015-01-31 00:12:39 12.7
Answer 0 (score: 0)
As suggested in this answer, you can convert your dataset to an xts object and then use its smart subsetting options:
xtsdf <- xts::xts(df, order.by = df$DateTimeDepa)
xtsdf["201501"]
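This gives the following (every column is shown as a character string because an xts object stores its data in a single matrix):
#                    Gender Age  Bici   DepartingSta DateTimeDepa          ArrivingSta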
#2015-01-30 23:58:33 "F" "26" "2760" "121" "2015-01-30 23:58:33" "106"
#2015-01-30 23:58:50 "M" "22" "4077" " 71" "2015-01-30 23:58:50" "190"
#2015-01-30 23:58:55 "M" "32" " 699" "154" "2015-01-30 23:58:55" "165"
#2015-01-30 23:59:20 "F" "26" "4044" " 64" "2015-01-30 23:59:20" " 50"
#2015-01-30 23:59:23 "M" "26" "3114" " 26" "2015-01-30 23:59:23" "127"
#2015-01-30 23:59:55 "M" "25" "4115" "165" "2015-01-30 23:59:55" " 73"
# DateTimeArri TravelTime
#2015-01-30 23:58:33 "2015-01-31 00:22:08" "23.6"
#2015-01-30 23:58:50 "2015-01-31 00:13:24" "14.6"
#2015-01-30 23:58:55 "2015-01-31 00:02:25" " 3.5"
#2015-01-30 23:59:20 "2015-01-31 00:05:38" " 6.3"
#2015-01-30 23:59:23 "2015-01-31 00:12:29" "13.1"
#2015-01-30 23:59:55 "2015-01-31 00:12:39" "12.7"
Answer 1 (score: 0)
Here is a way to solve this using base R format(), vectorized string comparison, and subset():
df <- data.frame(
  Gender=c('M','M','M','M','M','F','F','M','M','F','M','M'),
  Age=c(28L,30L,37L,37L,19L,25L,26L,22L,32L,26L,26L,25L),
  Bici=c(69L,11L,43L,826L,662L,8L,2760L,4077L,699L,4044L,3114L,4115L),
  DepartingSta=c(85L,85L,85L,22L,27L,85L,121L,71L,154L,64L,26L,165L),
  DateTimeDepa=as.POSIXct(c(
    '2010-02-16 12:42:32','2010-02-16 12:53:29','2010-02-16 13:21:46',
    '2010-02-16 14:06:40','2010-02-16 15:31:15','2010-02-16 16:31:53',
    '2015-01-30 23:58:33','2015-01-30 23:58:50','2015-01-30 23:58:55',
    '2015-01-30 23:59:20','2015-01-30 23:59:23','2015-01-30 23:59:55')),
  ArrivingSta=c(85L,26L,13L,85L,74L,20L,106L,190L,165L,50L,127L,73L),
  DateTimeArri=as.POSIXct(c(
    '2010-02-16 12:45:37','2010-02-16 13:22:23','2010-02-16 13:49:47',
    '2010-02-16 14:23:13','2010-02-16 16:29:17','2010-02-16 16:49:26',
    '2015-01-31 00:22:08','2015-01-31 00:13:24','2015-01-31 00:02:25',
    '2015-01-31 00:05:38','2015-01-31 00:12:29','2015-01-31 00:12:39')),
  TravelTime=c(3.1,28.9,28,16.6,58,17.6,23.6,14.6,3.5,6.3,13.1,12.7),
  row.names=c('1','2','3','4','5','6','17919307','17919308','17919309','17919310','17919311','17919312'),
  stringsAsFactors=FALSE);
ym <- '201501';
df;
## Gender Age Bici DepartingSta DateTimeDepa ArrivingSta DateTimeArri TravelTime
## 1 M 28 69 85 2010-02-16 12:42:32 85 2010-02-16 12:45:37 3.1
## 2 M 30 11 85 2010-02-16 12:53:29 26 2010-02-16 13:22:23 28.9
## 3 M 37 43 85 2010-02-16 13:21:46 13 2010-02-16 13:49:47 28.0
## 4 M 37 826 22 2010-02-16 14:06:40 85 2010-02-16 14:23:13 16.6
## 5 M 19 662 27 2010-02-16 15:31:15 74 2010-02-16 16:29:17 58.0
## 6 F 25 8 85 2010-02-16 16:31:53 20 2010-02-16 16:49:26 17.6
## 17919307 F 26 2760 121 2015-01-30 23:58:33 106 2015-01-31 00:22:08 23.6
## 17919308 M 22 4077 71 2015-01-30 23:58:50 190 2015-01-31 00:13:24 14.6
## 17919309 M 32 699 154 2015-01-30 23:58:55 165 2015-01-31 00:02:25 3.5
## 17919310 F 26 4044 64 2015-01-30 23:59:20 50 2015-01-31 00:05:38 6.3
## 17919311 M 26 3114 26 2015-01-30 23:59:23 127 2015-01-31 00:12:29 13.1
## 17919312 M 25 4115 165 2015-01-30 23:59:55 73 2015-01-31 00:12:39 12.7
ym;
## [1] "201501"
subset(df,format(DateTimeDepa,'%Y%m')==ym);
## Gender Age Bici DepartingSta DateTimeDepa ArrivingSta DateTimeArri TravelTime
## 17919307 F 26 2760 121 2015-01-30 23:58:33 106 2015-01-31 00:22:08 23.6
## 17919308 M 22 4077 71 2015-01-30 23:58:50 190 2015-01-31 00:13:24 14.6
## 17919309 M 32 699 154 2015-01-30 23:58:55 165 2015-01-31 00:02:25 3.5
## 17919310 F 26 4044 64 2015-01-30 23:59:20 50 2015-01-31 00:05:38 6.3
## 17919311 M 26 3114 26 2015-01-30 23:59:23 127 2015-01-31 00:12:29 13.1
## 17919312 M 25 4115 165 2015-01-30 23:59:55 73 2015-01-31 00:12:39 12.7
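Since the question asks for a function, the comparison above could be wrapped up roughly as follows. This is only a sketch: the names subset_month, subset_month_range, and the col argument are hypothetical and do not come from the original answers. The second variant is an assumption about what may be faster on roughly 18 million rows, because it compares timestamps against the month boundaries instead of formatting every value as a string; it has not been benchmarked here.

```r
# Hypothetical helper: keep the rows whose departure time falls in the
# month given as a "YYYYMM" string, using format() as in the answer above.
subset_month <- function(data, ym, col = "DateTimeDepa") {
  stopifnot(is.character(ym), length(ym) == 1L, grepl("^[0-9]{6}$", ym))
  keep <- format(data[[col]], "%Y%m") == ym
  # which() drops rows where the timestamp is NA
  data[which(keep), , drop = FALSE]
}

# Hypothetical variant: compare against the first instant of the month and
# the first instant of the following month, so no strings are built per row.
subset_month_range <- function(data, ym, col = "DateTimeDepa") {
  stopifnot(is.character(ym), length(ym) == 1L, grepl("^[0-9]{6}$", ym))
  tz <- attr(data[[col]], "tzone")
  if (is.null(tz)) tz <- "" else tz <- tz[1L]
  start <- as.POSIXct(paste0(ym, "01"), format = "%Y%m%d", tz = tz)
  end <- seq(start, by = "month", length.out = 2L)[2L]
  keep <- data[[col]] >= start & data[[col]] < end
  data[which(keep), , drop = FALSE]
}

# Small made-up example; the time zone is fixed to UTC for reproducibility.
trips <- data.frame(
  Bici = c(69L, 2760L, 4115L),
  DateTimeDepa = as.POSIXct(
    c("2010-02-16 12:42:32", "2015-01-30 23:58:33", "2015-01-30 23:59:55"),
    tz = "UTC"
  )
)
subset_month(trips, "201501")
subset_month_range(trips, "201501")
```

Both helpers return an ordinary data.frame with the original column types, so with the df defined above, subset_month(df, ym) should select the same rows as the subset() call.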