getWeekdayShort(int weekday){
DateTime date = DateTime.now();
return DateFormat('E').format(date);
}
DateTime, units
2019-04-04 13:44:48, 15
2019-04-05 13:44:49, 95
2019-04-06 13:44:50, 16
2019-04-07 13:44:51, 23
2019-04-09 13:44:53, 17
2019-04-10 13:44:53, 54
2019-04-11 13:44:53, 14
2019-04-12 13:44:53, 53
2019-04-13 13:44:53, 82
2019-04-14 13:44:53, 25
2019-04-15 13:44:53, 66
2019-04-16 13:44:53, 2
2019-04-17 13:44:53, 44
2019-04-18 13:44:53, 85
2019-04-19 13:44:53, 28
2019-04-20 13:44:53, 20
2019-04-21 13:44:53, 99
2019-04-22 13:44:53, 41
2019-04-23 13:44:53, 3
2019-04-24 13:44:53, 36
2019-04-25 13:44:53, 26
2019-04-26 13:44:53, 30
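I have a large csv file (> 5GB) and a list of start and end dates. I want to select rows in the dataframe based on that list of start and end dates. The end dates and start dates do not overlap.
For the sample above, the result would be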
Start, End
2019-04-01 00:00:00, 2019-04-06 00:00:00
2019-04-09 00:00:00, 2019-04-11 00:00:00
2019-04-18 00:00:00, 2019-04-21 00:00:00
I can do this with a for loop, but I would prefer a more efficient approach if one exists.
Answer 0 (score: 0)
I suggest you use pandas. First, you need to load the data:
import pandas as pd
import datetime as dt
df = pd.read_csv("path-to-your-csv-file/yourfile.csv") # read your file into df
start = "2019-04-07 00:00:00" # example start date string
end = "2019-04-11 00:00:00" # example end date string
to_datetime = lambda x: dt.datetime.strptime(x, "%Y-%m-%d %H:%M:%S") # parse a string into a datetime object
df['DateTime'] = df.DateTime.apply(to_datetime) # convert the column entries from strings to datetime objects
# convert the start and end date strings to datetime objects
start = to_datetime(start)
end = to_datetime(end)
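You will need the to_datetime function in almost any case, because it is really convenient to have the DateTime column hold datetime objects. In the simplest case, you do not care about the time of day and we assume the dates are valid, meaning that both the start date and the end date actually occur in the data: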
df.DateTime = df.DateTime.dt.date # drop the time of day, keeping only the date
start_index = df[df.DateTime == start.date()].index[0] # index of the first row where DateTime == start
end_index = df[df.DateTime == end.date()].index[-1] # index of the last row where DateTime == end
target = df[start_index:end_index + 1] # save the subset of df matching your criteria to target
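If the dates are not valid in that sense (for example, because you want to take the hours into account but cannot specify them exactly), you can use searchsorted to get the indices: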
start_index = df.DateTime.searchsorted(start, side="left") # position of the first row with DateTime >= start
end_index = df.DateTime.searchsorted(end, side="right") # position just past the last row with DateTime <= end
target = df.iloc[start_index:end_index] # save the subset of df matching your criteria to target
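As a minimal sketch of how searchsorted resolves positions, here are a few rows of the question's sample built by hand. It assumes a recent pandas version, in which searchsorted returns a plain integer for a scalar argument; the names lo, hi, late and early are only illustrative.

```python
import pandas as pd

# A few rows from the question's sample (note that 2019-04-08 is missing).
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2019-04-06 13:44:50",
        "2019-04-07 13:44:51",
        "2019-04-09 13:44:53",
        "2019-04-10 13:44:53",
        "2019-04-11 13:44:53",
    ]),
    "units": [16, 23, 17, 54, 14],
})

start = pd.Timestamp("2019-04-07 00:00:00")
end = pd.Timestamp("2019-04-11 00:00:00")

# side="left": position of the first row with DateTime >= start.
lo = df["DateTime"].searchsorted(start, side="left")
# side="right": position just past the last row with DateTime <= end.
hi = df["DateTime"].searchsorted(end, side="right")
target = df.iloc[lo:hi]

# The time of day changes which row a boundary resolves to.
late = df["DateTime"].searchsorted(pd.Timestamp("2019-04-07 23:55:00"))
early = df["DateTime"].searchsorted(pd.Timestamp("2019-04-07 10:55:00"))
```

Here target holds the rows for 2019-04-07, 2019-04-09 and 2019-04-10, while late points at the 2019-04-09 row and early at the 2019-04-07 row.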
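Be careful when using searchsorted, though: remember to skip the step that strips the timestamps, and make sure the DateTime column is sorted. Finally, keep in mind that a start of "2019-04-07 23:55:00" falls after that day's row and therefore resolves to the next one (around "2019-04-09 13:00:00" in the sample), while "2019-04-07 10:55:00" resolves to the row around "2019-04-07 13:00:00". In other words, the timestamps matter.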
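The answer above handles a single start/end pair. For the whole list of ranges from the question, one possible sketch is to read the file in chunks and test every row against all ranges at once with NumPy's searchsorted. This is an illustration under stated assumptions, not a tuned solution: the ranges are sorted by Start and do not overlap (as the question states), both boundaries are treated as inclusive, and the helper name select_ranges, the chunk size, and the in-memory CSV standing in for the real file are all hypothetical.

```python
import io
import pandas as pd

def select_ranges(chunks, ranges):
    """Yield the rows of each chunk whose DateTime lies in any [Start, End] range.

    Assumes `ranges` is sorted by Start and the ranges do not overlap.
    """
    starts = ranges["Start"].to_numpy()
    ends = ranges["End"].to_numpy()
    for chunk in chunks:
        ts = chunk["DateTime"].to_numpy()
        # Index of the last range whose Start is <= each timestamp (-1 if none).
        idx = starts.searchsorted(ts, side="right") - 1
        # Keep a row if such a range exists and the timestamp is not past its End.
        mask = (idx >= 0) & (ts <= ends[idx.clip(min=0)])
        yield chunk[mask]

# Stand-in for the large file: the first rows of the question's sample.
csv_text = """DateTime, units
2019-04-04 13:44:48, 15
2019-04-05 13:44:49, 95
2019-04-06 13:44:50, 16
2019-04-07 13:44:51, 23
2019-04-09 13:44:53, 17
2019-04-10 13:44:53, 54
2019-04-11 13:44:53, 14
2019-04-12 13:44:53, 53
"""

ranges = pd.DataFrame({
    "Start": pd.to_datetime(["2019-04-01 00:00:00", "2019-04-09 00:00:00"]),
    "End": pd.to_datetime(["2019-04-06 00:00:00", "2019-04-11 00:00:00"]),
})

# chunksize keeps only a few rows in memory at a time; skipinitialspace
# strips the blank after each comma so the column is named "units".
reader = pd.read_csv(io.StringIO(csv_text), parse_dates=["DateTime"],
                     skipinitialspace=True, chunksize=3)
result = pd.concat(list(select_ranges(reader, ranges)), ignore_index=True)
```

With a real file, the io.StringIO object would be replaced by the file path and the chunk size raised to whatever fits comfortably in memory. Because each chunk is filtered independently, the rows do not need to be sorted for this variant; only the list of ranges does.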