Python: Parse CSV data between two dates and print it in ascending order

Time: 2018-07-20 13:54:00

Tags: python csv datetime parsing

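I'm still new to Python, so please forgive me if this question has a simple answer or rests on a mistake. As the code below shows, I'm trying to parse data from a CSV file. Specifically, I want to find the users created between two dates and print them in ascending order of creation date. My date column, row[1], is stored as Unix time. The word column, row[8], should be printed as well; the goal is that, once the rows are sorted by date, the words in row[8] spell out a particular phrase. The problem is that when I run the code as it stands in PyCharm, I get IndexError: list index out of range at line 15, creation_date = date.fromtimestamp(int(row[1])). I know pandas handles CSV files better, but I'm trying to avoid learning pandas just for this one task.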

import csv
from datetime import datetime, date
import sys

start_date = date(2014, 6, 22)
end_date = date(2014, 7, 22)

# Read csv data into memory filtering rows by the date in column 2 (row[1]).
csv_data = []
with open('sample.csv', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)
    csv_data.append(header)
    for row in reader:
        creation_date = date.fromtimestamp(int(row[1]))
        if start_date <= creation_date <= end_date:
            csv_data.append(row)

if csv_data:  # Anything found?
    # Print the results in ascending date order.
    print(" ".join(csv_data[0]))
    # Converting the timestamp to int may not be necessary (but doesn't hurt)
    for row in sorted(csv_data[1:], key=lambda r: int(r[1])):
        print(" ".join(row))

2 Answers:

Answer 0 (score: 1)

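The value you are trying to access does not seem to exist in that row (because the row contains only one value). You can wrap the crashing code in try/except and print the rows that fail: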

for row in reader: 
    try:
        creation_date = date.fromtimestamp(int(row[1]))
    except IndexError:
        print("Cannot get value for row: {}".format(row))
        continue

    if start_date <= creation_date <= end_date:
        csv_data.append(row)

This should give you a first idea of why it crashes here (perhaps your data is not tab-delimited?).
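If the delimiter is the suspect, one way to check it without opening the file in an editor is the standard library's csv.Sniffer. The following is only a rough sketch: detect_delimiter and the two sample strings are made up for illustration, and sniffing is a heuristic that can guess wrong on unusual files.

```python
import csv

def detect_delimiter(sample_text, candidates=",\t;|"):
    """Guess the delimiter of a CSV sample; fall back to a comma if sniffing fails."""
    try:
        return csv.Sniffer().sniff(sample_text, delimiters=candidates).delimiter
    except csv.Error:
        return ","

# Hypothetical samples, not taken from the asker's file.
comma_sample = "id,created,word\n1,1403500000,hello\n2,1404000000,world\n"
tab_sample = "id\tcreated\tword\n1\t1403500000\thello\n2\t1404000000\tworld\n"

print(repr(detect_delimiter(comma_sample)))
print(repr(detect_delimiter(tab_sample)))
```

With a real file you would typically pass the first couple of kilobytes, e.g. f.read(2048), call f.seek(0) afterwards, and then hand the detected delimiter to csv.reader.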

Answer 1 (score: 0)

The CSV you shared is comma-delimited. So when you write

reader = csv.reader(f, delimiter='\t')  # returns a single column

you should replace it with

reader = csv.reader(f, delimiter=',')
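To see why the wrong delimiter leads to the IndexError, here is a small sketch that parses the same line both ways. The sample line is invented for illustration (the real file's contents are not shown in the question), and parse_first_row is just a helper name chosen here.

```python
import csv
import io

# Hypothetical comma-separated line; not from the asker's file.
sample = "7,1403500000,alice\n"

def parse_first_row(text, delimiter):
    """Return the first row parsed from text using the given delimiter."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))

tab_row = parse_first_row(sample, "\t")   # whole line lands in one field
comma_row = parse_first_row(sample, ",")  # three separate fields

print(len(tab_row), len(comma_row))  # prints: 1 3
```

With the tab delimiter the row has only index 0, so row[1] raises IndexError: list index out of range.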

Full code:

import csv
from datetime import date

start_date = date(2014, 6, 22)
end_date = date(2014, 7, 22)

# Read csv data into memory filtering rows by the date in column 2 (row[1]).
csv_data = []
with open('sample_data.csv', newline='') as f:
    # The file is comma-separated, so use ',' rather than '\t'.
    reader = csv.reader(f, delimiter=',')
    header = next(reader)
    csv_data.append(header)
    for row in reader:
        creation_date = date.fromtimestamp(int(row[1]))
        if start_date <= creation_date <= end_date:
            csv_data.append(row)

if len(csv_data) > 1:  # Anything found besides the header?
    # Print the results in ascending date order.
    print(" ".join(csv_data[0]))
    # Sort numerically on the Unix timestamp in column 2.
    for row in sorted(csv_data[1:], key=lambda r: int(r[1])):
        print(" ".join(row))