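I am analysing Citibike data and, with some help, wrote code that works perfectly: it extracts every trip that starts at a different station from the one where the bike last stopped (i.e. every trip whose start.station.id differs from the end.station.id of the same bike's preceding trip, which means the bike was moved by truck). However, the monthly datasets are very large; the summer months each contain more than 1 million individual trips (you can find them here: Citibike data).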
Here is a snapshot of the dataset:
head(Nov2015, n = 20)
   tripduration          starttime           stoptime start.station.id start.station.name
 1         1110 11/1/2015 00:00:00 11/1/2015 00:18:31              537 Lexington Ave & E 24 St
 2         1094 11/1/2015 00:00:01 11/1/2015 00:18:15              537 Lexington Ave & E 24 St
 3          520 11/1/2015 00:00:05 11/1/2015 00:08:45              536 1 Ave & E 30 St
 4          753 11/1/2015 00:00:15 11/1/2015 00:12:48              229 Great Jones St
 5          353 11/1/2015 00:00:22 11/1/2015 00:06:15              285 Broadway & E 14 St
 6         1285 11/1/2015 00:00:22 11/1/2015 00:21:48              268 Howard St & Centre St
 7          477 11/1/2015 00:00:25 11/1/2015 00:08:23              379 W 31 St & 7 Ave
 8          362 11/1/2015 00:00:28 11/1/2015 00:06:30              407 Henry St & Poplar St
 9         2316 11/1/2015 00:00:37 11/1/2015 00:39:14              147 Greenwich St & Warren St
10          627 11/1/2015 00:00:42 11/1/2015 00:11:10              521 8 Ave & W 31 St
11         2304 11/1/2015 00:00:44 11/1/2015 00:39:08              147 Greenwich St & Warren St
12         1471 11/1/2015 00:01:04 11/1/2015 00:25:35              281 Grand Army Plaza & Central Park S
13         1484 11/1/2015 00:01:36 11/1/2015 00:26:21              281 Grand Army Plaza & Central Park S
14          284 11/1/2015 00:01:36 11/1/2015 00:06:20              247 Perry St & Bleecker St
15          886 11/1/2015 00:01:39 11/1/2015 00:16:25              492 W 33 St & 7 Ave
16          886 11/1/2015 00:01:42 11/1/2015 00:16:28              492 W 33 St & 7 Ave
17         1379 11/1/2015 00:01:44 11/1/2015 00:24:44              512 W 29 St & 9 Ave
18          179 11/1/2015 00:01:47 11/1/2015 00:04:47              319 Fulton St & Broadway
19          309 11/1/2015 00:01:51 11/1/2015 00:07:00              160 E 37 St & Lexington Ave
20          616 11/1/2015 00:02:08 11/1/2015 00:12:24              479 9 Ave & W 45 St
   start.station.latitude start.station.longitude end.station.id end.station.name
 1               40.74026               -73.98409            531 Forsyth St & Broome St
 2               40.74026               -73.98409            531 Forsyth St & Broome St
 3               40.74144               -73.97536            498 Broadway & W 32 St
 4               40.72743               -73.99379            328 Watts St & Greenwich St
 5               40.73455               -73.99074            151 Cleveland Pl & Spring St
 6               40.71911               -73.99973            476 E 31 St & 3 Ave
 7               40.74916               -73.99160            546 E 30 St & Park Ave S
 8               40.70047               -73.99145            310 State St & Smith St
 9               40.71542               -74.01122            441 E 52 St & 2 Ave
10               40.75097               -73.99444            285 Broadway & E 14 St
11               40.71542               -74.01122            441 E 52 St & 2 Ave
12               40.76440               -73.97371            367 E 53 St & Lexington Ave
13               40.76440               -73.97371            367 E 53 St & Lexington Ave
14               40.73535               -74.00483            453 W 22 St & 8 Ave
15               40.75020               -73.99093            377 6 Ave & Canal St
16               40.75020               -73.99093            377 6 Ave & Canal St
17               40.75007               -73.99839            445 E 10 St & Avenue A
18               40.71107               -74.00945            264 Maiden Ln & Pearl St
19               40.74824               -73.97831            362 Broadway & W 37 St
20               40.76019               -73.99126            440 E 45 St & 3 Ave
   end.station.latitude end.station.longitude bikeid   usertype birth.year gender
 1             40.71894             -73.99266  22545 Subscriber       1981      2
 2             40.71894             -73.99266  23959 Subscriber       1980      1
 3             40.74855             -73.98808  22251 Subscriber       1988      1
 4             40.72406             -74.00966  15869 Subscriber       1981      1
 5             40.72210             -73.99725  21645 Subscriber       1987      1
 6             40.74394             -73.97966  14788   Customer         NA      0
 7             40.74445             -73.98304  21128 Subscriber       1962      2
 8             40.68927             -73.98913  21016 Subscriber       1978      1
 9             40.75601             -73.96742  24117 Subscriber       1988      2
10             40.73455             -73.99074  17048 Subscriber       1986      2
11             40.75601             -73.96742  18241 Subscriber       1984      1
12             40.75828             -73.97069  24223   Customer         NA      0
13             40.75828             -73.97069  16779   Customer         NA      0
14             40.74475             -73.99915  17272 Subscriber       1976      1
15             40.72244             -74.00566  15008 Subscriber       1981      1
16             40.72244             -74.00566  23019 Subscriber       1982      1
17             40.72741             -73.98142  23843 Subscriber       1962      2
18             40.70706             -74.00732  22538 Subscriber       1981      1
19             40.75173             -73.98754  22042 Subscriber       1988      1
20             40.75255             -73.97283  22699 Subscriber       1982      1
My code extracts these "hidden" bike movements and puts them into a single coherent data.frame.
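Because it uses a for loop, the extraction takes a very long time. Is there a way to speed this up, or to rewrite the code so that it avoids the for loop?
The output should stay exactly the same, i.e.: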
head(output)
  bikeid end.station.id start.station.id        diff.time           stoptime          starttime
1  22545            520              529 24.8166666666667 11/2/2015 08:38:22 11/2/2015 09:03:11
2  22545            520              517 537.483333333333 11/2/2015 09:39:19 11/2/2015 18:36:48
3  22545           2004             3230 563.066666666667 11/2/2015 22:06:27 11/3/2015 07:29:31
4  22545            296             3236 471.783333333333 11/4/2015 23:40:29 11/5/2015 07:32:16
5  22545            520              449 43.4166666666667 11/9/2015 08:24:06 11/9/2015 09:07:31
6  22545            359              519 30.7166666666667 11/9/2015 09:14:46 11/9/2015 09:45:29
Answer 0 (score: 0)
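This solution, which uses data.table, takes a few minutes.
Using the shift function, grouped by bikeid, we add the previous trip's values to the current row of the existing data.table.
We then filter on !is.na(end.station.id) & (end.station.id != start.station.id), drop the columns that are no longer needed, and finally set the column order.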
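The steps described above could be sketched roughly as follows. This is a minimal illustration on toy data, not the code originally posted with this answer: the table `trips`, the helper `find_moves`, the intermediate columns (`start.sec`, `stop.sec`, `prev.*`) and the toy values are all assumptions made for the example. The lagged columns are given their own `prev.*` names here so that the filter compares the previous trip's end station with the current trip's start station explicitly.

```r
library(data.table)

# Toy trips in the Citibike column layout (illustrative values, not real data).
trips <- data.table(
  bikeid           = c(1L, 1L, 1L, 2L, 2L),
  starttime        = c("11/1/2015 08:00:00", "11/1/2015 09:00:00",
                       "11/1/2015 12:00:00", "11/1/2015 08:30:00",
                       "11/1/2015 10:00:00"),
  stoptime         = c("11/1/2015 08:20:00", "11/1/2015 09:30:00",
                       "11/1/2015 12:15:00", "11/1/2015 08:45:00",
                       "11/1/2015 10:20:00"),
  start.station.id = c(100L, 200L, 400L, 500L, 650L),
  end.station.id   = c(200L, 300L, 100L, 600L, 700L)
)

# Hypothetical helper: returns one row per trip that starts somewhere other
# than where the same bike's previous trip ended.
find_moves <- function(dt) {
  dt <- copy(dt)  # work on a copy so the caller's table is not modified
  fmt <- "%m/%d/%Y %H:%M:%S"
  # Parse timestamps to seconds since the epoch (UTC assumed for simplicity).
  dt[, start.sec := as.numeric(as.POSIXct(starttime, format = fmt, tz = "UTC"))]
  dt[, stop.sec  := as.numeric(as.POSIXct(stoptime,  format = fmt, tz = "UTC"))]
  setorder(dt, bikeid, start.sec)

  # Per bike, lag the previous trip's end station and stop time onto this row.
  dt[, `:=`(prev.end.id   = shift(end.station.id),
            prev.stoptime = shift(stoptime),
            prev.stop.sec = shift(stop.sec)),
     by = bikeid]

  # Keep rows that have a previous trip and start at a different station.
  moves <- dt[!is.na(prev.end.id) & prev.end.id != start.station.id]

  # Select and order the output columns; diff.time is in minutes.
  moves[, .(bikeid,
            end.station.id = prev.end.id,
            start.station.id,
            diff.time = (start.sec - prev.stop.sec) / 60,
            stoptime  = prev.stoptime,
            starttime)]
}

output <- find_moves(trips)
print(output)
```

Because `shift` runs once per bikeid group rather than once per row, no explicit for loop is needed. On the real data you would pass the monthly table (for example `as.data.table(Nov2015)`) instead of `trips`, and it is worth comparing the result against the loop-based output on a small subset before relying on it.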