Question

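I am reading a csv file and then trying to separate the header from the rest of the file. The hn variable is supposed to be the file contents without the first row, and hn_header is supposed to be the first row of the dataset. If I define only one of these two variables, the code works. If I define both, whichever one is written second ends up with no data. How is that possible?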

from csv import reader

opened_file =  open("hacker_news.csv")
read_file = reader(opened_file)
hn = list(read_file)[1:]     #this should contain all rows except the header
hn_header = list(read_file)[0] # this should be the header



print(hn[:5]) #works 
print(len(hn_header)) #empty list, does not contain the header

Answer 1

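A csv reader can only make a single pass over the file, and that pass happens the first time you convert it to a list. The second call to list(read_file) therefore has nothing left to read. To avoid iterating more than once, save the list in a variable: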

hn_list = list(read_file)
hn = hn_list[1:]
hn_header = hn_list[0]

Or you can split the file using extended iterable unpacking:

hn_header, *hn = list(read_file)
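
A related variant, in case it helps: pull the header off with next() and collect the remaining rows afterwards, so the reader is still consumed only once. This is a minimal sketch; the in-memory sample data below is a made-up stand-in for hacker_news.csv, not the real file.

```python
import csv
import io

# Hypothetical sample data standing in for hacker_news.csv
sample = io.StringIO("id,title\n1,First post\n2,Second post\n")

read_file = csv.reader(sample)
hn_header = next(read_file)  # consumes only the first row (the header)
hn = list(read_file)         # consumes whatever rows are left

print(hn_header)
print(hn)
```

With a real file you would pass the opened file object to csv.reader instead of the StringIO object.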

Answer 2

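Just change one line of your code; nothing else is needed: read_file = list(reader(opened_file)). Because read_file is then a real list rather than a reader, the later list(read_file) calls simply copy it, and both hn and hn_header get their data.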

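A reader object is an iterator, and by definition an iterator can only be consumed once. Once it has finished iterating, you get nothing more out of it.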
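
The exhaustion is easy to see in isolation. The following is only an illustrative sketch; the sample rows are invented and held in memory rather than read from hacker_news.csv.

```python
import csv
import io

# Hypothetical sample data; any csv content behaves the same way
sample = io.StringIO("id,title\n1,First post\n")

read_file = csv.reader(sample)
first_pass = list(read_file)   # walks the whole file
second_pass = list(read_file)  # the reader is already exhausted

print(first_pass)
print(second_pass)
```

The second list comes back empty, which is why indexing or slicing it yields no data.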

You can find more information in the question "Why can I only use a reader object once?", which is also the source of the explanation quoted here.

