Filtering row values with data.table while reading from an online file?

Asked: 2016-12-05 21:00:35

Tags: r data.table fread

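I'm reading in some taxi data, but I want to filter out some rows based on their values, and I'm curious whether data.table can do this. The TLC trip data found here http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml has a column labeled "trip distance", and I want to filter out the rows where it is zero. Can this be done from within the fread call?

Here is the script I'm using: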

library(data.table)

# build the vector of file URLs to feed into the reading function ---------
site_list = NULL
for (i in 1:3) {
  print(i)
  site = paste0("https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2016-0", i, ".csv")
  # c() keeps site_list a plain character vector (rbind would build a matrix)
  site_list = c(site_list, site)
}
site_list

#read in data--------------------------------------------
#pick your features
featurez = c("tpep_dropoff_datetime", "dropoff_longitude", "dropoff_latitude")
master_taxi = NULL
for (i in site_list) {
  master = fread(i, select = featurez)
  master_taxi = rbind(master_taxi, master)
}

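So above I select only the columns I want, and trip distance is not among them. I could read in the "trip_distance" column and then remove the zeros after the fact, but I'd rather not waste memory and computing power. Is there any way to push that condition through fread?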
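For reference, the "read it, then drop the zeros" fallback mentioned above might look like the following. This is only a minimal sketch: it uses a tiny temporary CSV with invented values in place of a real monthly file, and it assumes the distance column is named `trip_distance`.

```r
library(data.table)

# Tiny stand-in for one monthly file; the values are invented for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "tpep_dropoff_datetime,trip_distance,dropoff_longitude,dropoff_latitude",
  "2016-01-01 00:10:00,1.5,-73.98,40.75",
  "2016-01-01 00:12:00,0,-73.99,40.76",
  "2016-01-01 00:20:00,3.2,-73.97,40.74"
), tmp)

featurez <- c("tpep_dropoff_datetime", "dropoff_longitude", "dropoff_latitude")

# Read the wanted columns plus trip_distance, which is only needed for the filter.
master <- fread(tmp, select = c(featurez, "trip_distance"))

# Keep rows with a positive distance, then drop the helper column by reference.
master <- master[trip_distance > 0]
master[, trip_distance := NULL]
```

The extra column only exists until the filter has run, so the memory cost is one additional numeric column per file rather than a permanent one.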

EDIT ******

Frank mentioned that it wasn't clear what I'm trying to do, so I'll elaborate below. I'd like to pass an argument along these lines:

#read in data--------------------------------------------
#pick your features
featurez = c("tpep_dropoff_datetime", "dropoff_longitude", "dropoff_latitude")
master_taxi = NULL
for (i in site_list) {
  master = fread(i, select = featurez, [***variable parameter example: trip_distance > 0])
  master_taxi = rbind(master_taxi, master)
}
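As far as I know, fread has no row-filter argument of this kind. One possible workaround is its `cmd` argument, which reads the output of a shell command, so the rows can be filtered before they reach R. The sketch below is hedged accordingly: it assumes a data.table version that supports `cmd` and an `awk` binary on the PATH, it uses a small temporary file with invented values, and the field number 2 matches this toy file only.

```r
library(data.table)

# Tiny stand-in for one monthly file; the values are invented for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "tpep_dropoff_datetime,trip_distance,dropoff_longitude,dropoff_latitude",
  "2016-01-01 00:10:00,1.5,-73.98,40.75",
  "2016-01-01 00:12:00,0,-73.99,40.76",
  "2016-01-01 00:20:00,3.2,-73.97,40.74"
), tmp)

featurez <- c("tpep_dropoff_datetime", "dropoff_longitude", "dropoff_latitude")

# awk keeps the header line (NR == 1) and every row whose 2nd field is > 0.
# The field position is an assumption tied to this toy file; check it against
# the header of the real data before reusing the command.
awk_cmd <- sprintf("awk -F, 'NR == 1 || $2 > 0' %s", shQuote(tmp))

# fread runs the command and parses its output; select still works by name
# because the header line is passed through unchanged.
filtered <- fread(cmd = awk_cmd, select = featurez)
```

For a remote file, the same idea would presumably need a download step in front of awk (for example piping from `curl`), which is also an assumption about the environment. The whole file still has to be downloaded and scanned, so this saves memory in R rather than transfer time.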

0 answers:

No answers yet