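I have a CSV file with a timestamp field, where the first row gives the start time and the last row gives the end time of a time range. How can I get both values using Python?
CSV file: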
run,a,b,2015-10-25T18:02:30.798426Z
run,c,d,2015-10-25T18:02:30.807375Z
run,e,f,2015-10-25T18:02:30.809113Z
run,g,h,2015-10-25T18:02:30.825410Z
run,i,j,2015-10-25T18:02:30.843917Z
run,k,l,2015-10-25T18:02:30.850492Z
run,m,n,2015-10-25T18:02:30.858041Z
run,o,p,2015-10-25T18:02:30.859345Z
run,q,r,2015-10-25T18:02:30.862365Z
Thanks.
Answer 0 (score: 1)
If you already know the rows are sorted by time, you can do this:
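import csv
import dateutil.parser

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    # The first row holds the start time; the timestamp is in column 3.
    row = next(reader)
    first = dateutil.parser.parse(row[3])
    # Exhaust the reader so that row ends up holding the last row.
    for row in reader:
        pass
    last = dateutil.parser.parse(row[3])

print('%s - %s' % (first, last))
# OUTPUTS:
# 2015-10-25 18:02:30.798426+00:00 - 2015-10-25 18:02:30.862365+00:00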
If you have first and last as raw ISO-format strings and want to convert them to datetime objects, you can use dateutil.parser (as described in this answer), for example:
import dateutil.parser
first = dateutil.parser.parse(first)
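If installing dateutil is not an option, the same idea can be sketched with the standard library only. This assumes every timestamp has exactly the format shown in the question (microseconds followed by a trailing Z); the helper names parse_timestamp and time_range are hypothetical, and the two-row sample stands in for the real file.

```python
import csv
import io
from datetime import datetime, timezone

# Assumed format: matches timestamps such as 2015-10-25T18:02:30.798426Z
ISO_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'


def parse_timestamp(text):
    """Parse one timestamp string and mark it as UTC (the meaning of 'Z')."""
    return datetime.strptime(text, ISO_FORMAT).replace(tzinfo=timezone.utc)


def time_range(lines, column=3):
    """Return (start, end) from the first and last CSV rows of `lines`."""
    reader = csv.reader(lines)
    # With a single-row input, the first row is also the last row.
    first_row = last_row = next(reader)
    for last_row in reader:
        pass
    return parse_timestamp(first_row[column]), parse_timestamp(last_row[column])


# Two rows from the question, standing in for an open file object.
sample = io.StringIO(
    'run,a,b,2015-10-25T18:02:30.798426Z\n'
    'run,q,r,2015-10-25T18:02:30.862365Z\n'
)
start, end = time_range(sample)
print('%s - %s' % (start, end))
print('duration:', end - start)
```

Because both values are datetime objects, subtracting them gives the length of the range as a timedelta. Note that an empty input makes next(reader) raise StopIteration, so real code may want to handle that case.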
Answer 1 (score: 1)
The answer above works, but it involves reading the entire file. If you are on a Unix system, you can call head and tail instead:
# Assume a CSV file like:
# a,b,1
# a,b,2
# a,b,3
# ...
# a,b,234934
import subprocess

# Get the first line of the CSV file with head.
how_many_lines_in_head = '1'
head_args = ['head', '-n', how_many_lines_in_head, 'input.csv']
# check_output returns bytes on Python 3, so decode before splitting.
head_str = subprocess.check_output(head_args).decode()
first_timestamp = head_str.split(',')[-1].strip()

# Do the same for the tail end of the file.
how_many_lines_in_tail = '1'
tail_args = ['tail', '-n', how_many_lines_in_tail, 'input.csv']
tail_str = subprocess.check_output(tail_args).decode()
last_timestamp = tail_str.split(',')[-1].strip()
# strip() removes the trailing line ending (\n on a Unix system).
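For systems without head and tail, a similar effect can be sketched in pure Python by seeking to the end of the file and reading backwards in chunks. This is only a rough sketch: the helper name first_and_last_line and the chunk size are assumptions, and it presumes the timestamp is the last comma-separated field and contains no embedded newlines.

```python
import os
import tempfile


def first_and_last_line(path, chunk_size=1024):
    """Return the first and last non-empty-tail lines without reading the whole file."""
    with open(path, 'rb') as f:
        first = f.readline()
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b''
        # Read backwards chunk by chunk until the last line is complete,
        # i.e. a newline appears before it (ignoring trailing line endings).
        while pos > 0:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            tail = f.read(step) + tail
            if b'\n' in tail.rstrip(b'\r\n'):
                break
        lines = tail.rstrip(b'\r\n').splitlines()
        last = lines[-1] if lines else first
    return first.decode().strip(), last.decode().strip()


# Demo on a temporary file holding two rows from the question.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('run,a,b,2015-10-25T18:02:30.798426Z\n')
    tmp.write('run,q,r,2015-10-25T18:02:30.862365Z\n')
    tmp_path = tmp.name

try:
    first_line, last_line = first_and_last_line(tmp_path)
finally:
    os.remove(tmp_path)

first_ts = first_line.split(',')[-1]
last_ts = last_line.split(',')[-1]
print('%s - %s' % (first_ts, last_ts))
```

Only the first line and the final chunk or two are read, so the cost stays roughly constant as the file grows.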