How to read the head and tail of a CSV file in Python

Date: 2015-10-28 22:48:28

Tags: python csv

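I have a CSV file with a timestamp field. The first row gives the start time and the last row gives the end time of a time range. How can I get both of them with Python?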

CSV file:

run,a,b,2015-10-25T18:02:30.798426Z
run,c,d,2015-10-25T18:02:30.807375Z
run,e,f,2015-10-25T18:02:30.809113Z
run,g,h,2015-10-25T18:02:30.825410Z
run,i,j,2015-10-25T18:02:30.843917Z
run,k,l,2015-10-25T18:02:30.850492Z
run,m,n,2015-10-25T18:02:30.858041Z
run,o,p,2015-10-25T18:02:30.859345Z
run,q,r,2015-10-25T18:02:30.862365Z

Thanks.

2 answers:

Answer 0 (score: 1)

If you already know the rows are sorted by time, you can do this:

import csv

with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    first_row = next(reader)   # first data row (the file has no header)
    last_row = first_row       # fallback if the file has only one row
    for last_row in reader:    # walk to the end; last_row ends up as the final row
        pass

first = first_row[3]
last = last_row[3]

print('%s - %s' % (first, last))
# OUTPUTS:
# 2015-10-25T18:02:30.798426Z - 2015-10-25T18:02:30.862365Z
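A slightly more compact variant of the same idea, as a reusable function (a sketch, not part of the original answer; the name first_and_last_values and the default column index 3 are illustrative assumptions). collections.deque with maxlen=1 consumes the remaining rows and keeps only the last one. Like the loop above, it still reads the whole file.

```python
import csv
from collections import deque


def first_and_last_values(lines, column=3):
    """Return (first, last) values of `column` from headerless CSV lines.

    `lines` is any iterable of text lines, e.g. a file opened with newline=''.
    Assumes the rows are already sorted by time.
    """
    reader = csv.reader(lines)
    first_row = next(reader, None)
    if first_row is None:
        raise ValueError('CSV input is empty')
    # deque(maxlen=1) consumes the remaining rows but keeps only the final one
    tail = deque(reader, maxlen=1)
    last_row = tail[0] if tail else first_row  # single-row file: first == last
    return first_row[column], last_row[column]
```

Usage: with open('file.csv', newline='') as f: first, last = first_and_last_values(f)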

If you then want to turn first and last into datetime objects (they are ISO 8601 strings), you can use dateutil.parser as in this answer, for example:

import dateutil.parser  # third-party package: pip install python-dateutil

first = dateutil.parser.parse(first)
last = dateutil.parser.parse(last)
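If you would rather avoid the third-party python-dateutil package, the standard library's datetime.fromisoformat can parse these timestamps as well. A minimal sketch; the helper name parse_utc_timestamp is hypothetical. Subtracting the two parsed values gives the length of the time range as a timedelta.

```python
from datetime import datetime


def parse_utc_timestamp(s):
    """Parse an ISO 8601 timestamp such as '2015-10-25T18:02:30.798426Z'."""
    # datetime.fromisoformat only accepts a trailing 'Z' from Python 3.11 on;
    # rewriting it as '+00:00' keeps this working on Python 3.7+.
    if s.endswith('Z'):
        s = s[:-1] + '+00:00'
    return datetime.fromisoformat(s)


first = parse_utc_timestamp('2015-10-25T18:02:30.798426Z')
last = parse_utc_timestamp('2015-10-25T18:02:30.862365Z')
duration = last - first   # a timedelta: the length of the time range
print(duration)           # 0:00:00.063939
```

Both values are timezone-aware (UTC), so the subtraction is well defined.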

Answer 1 (score: 1)

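The answer above works, but it reads the entire file. If you are on a Unix system, you can let head and tail fetch only the first and last lines: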

# assume a CSV file like
# a,b,1
# a,b,2
# a,b,3
# ...
# a,b,234934

import subprocess

# read only the first line of the CSV file
# (text=True, Python 3.7+, makes check_output return str instead of bytes)
head_args = ['head', '-n', '1', 'input.csv']
head_str = subprocess.check_output(head_args, text=True)
first_timestamp = head_str.strip().split(',')[-1]

# do the same for the tail end of the file
tail_args = ['tail', '-n', '1', 'input.csv']
tail_str = subprocess.check_output(tail_args, text=True)
last_timestamp = tail_str.strip().split(',')[-1]

# assumes a Unix system, where line endings are \n; strip() also drops a stray \r
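If head and tail are not available (for example on Windows), the same effect can be sketched in pure Python: open the file in binary mode, seek to the end, and read backwards in chunks until a newline before the last line is found. This is a minimal sketch, not taken from either answer; the helper names first_line and last_line and the default chunk size are hypothetical, and it assumes \n or \r\n line endings. Trailing blank lines are skipped.

```python
import io
import os


def first_line(f):
    """Return the first line of a seekable binary file, without its line ending."""
    f.seek(0)
    return f.readline().rstrip(b'\r\n')


def last_line(f, chunk_size=1024):
    """Return the last non-empty line of a seekable binary file, as bytes.

    Reads backwards from the end in chunks, so only the tail of the file is read.
    """
    f.seek(0, os.SEEK_END)
    pos = f.tell()
    buf = b''
    while pos > 0:
        step = min(chunk_size, pos)
        pos -= step
        f.seek(pos)
        buf = f.read(step) + buf
        # stop once a newline precedes the (trailing-newline-stripped) last line
        if b'\n' in buf.rstrip(b'\r\n'):
            break
    return buf.rstrip(b'\r\n').split(b'\n')[-1]


if __name__ == '__main__':
    # an in-memory file stands in for open('input.csv', 'rb')
    f = io.BytesIO(b'run,a,b,2015-10-25T18:02:30.798426Z\n'
                   b'run,q,r,2015-10-25T18:02:30.862365Z\n')
    print(first_line(f).decode().split(',')[-1])
    print(last_line(f, chunk_size=16).decode().split(',')[-1])
```

Reading backwards in chunks keeps the cost proportional to the length of the last line rather than the size of the file.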