Question

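I'm running Python 2.7 and I'm new to Python. I'm trying to read a CSV file (the values are space-separated) and group the values according to the header above each set of coordinates. The file's format isn't one I'm used to, and I can't get the values to read correctly. Even if I could get them to read correctly, I don't understand how to put them into lists.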

Here is what the CSV file looks like:

# image name
1.png
# probe locations
100 100
200 100
100 200
300 300

# another image name
2.png
100 200
200 100
300 300
135 322

# end

Here is the code I've been playing with:

class CommentedFile:
    def __init__(self, f, commentstring="#"):
        self.f = f
        self.commentstring = commentstring
    def next(self):
        line = self.f.next()
        while line.startswith(self.commentstring):
            line = self.f.next()
        return line
    def __iter__(self):
        return self

#I did this in order to ignore the comments in the CSV file

tsv_file = csv.reader(CommentedFile(open("test.exp", "rb")),
                  delimiter=' ')


for row in tsv_file:
    if row != int:
        next(tsv_file)
    if row:
        print row

The code prints:

['100', '100']
['100', '200']
['100', '200']
['300', '300']
Traceback (most recent call last):
  File "the path", line 57, in <module>
next(tsv_file)
StopIteration

So I'm trying to get the program to separate the coordinates by header and then put them into separate lists. Thanks for your help!

Answer 1

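Take a look at pandas. It has a DataFrame object that can hold your data and lets you manipulate it in an intuitive way. It also has a read_csv function that removes a lot of the hassle of working with CSV files.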

For example:

import pandas as pd

# Reads your CSV file and returns a DataFrame object, as mentioned above.
df = pd.read_csv("your_csv.csv", sep=' ', names=['co_a','co_b'], header=None, skiprows=2)

# Extracts your coordinates into separate lists
list1 = df.co_a.tolist()
list2 = df.co_b.tolist()

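You can use df or df.head() to look at your DataFrame and see how your data is organized. It's also worth mentioning that df.co_a is a Series object (think of it as a souped-up list/dictionary), and you can do your analysis or manipulation from there.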

Also, if you show me where the comments are in the CSV file, I can tell you how to ignore them with read_csv.

I know you're looking for an answer that uses the csv module, but this is a higher-level tool that may help you in the long run.
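For comparison, here is a minimal sketch of the grouping the question asks for, using only the standard csv module. It assumes that any non-comment line with a single field is an image name and that every other non-blank line holds an integer coordinate pair; the function name group_coordinates and the inline sample are illustrative, not part of the original code.

```python
import csv

def group_coordinates(lines, comment="#"):
    """Group space-separated integer pairs under the most recent image name."""
    groups = {}
    current = None  # image name that the following coordinates belong to
    # Drop comment lines before handing the rest to csv.reader.
    data_lines = (line for line in lines if not line.startswith(comment))
    for row in csv.reader(data_lines, delimiter=" "):
        if not row:
            continue  # blank line between sections
        if len(row) == 1:
            # Assumption: a single field is an image name such as "1.png".
            current = row[0]
            groups[current] = []
        elif current is not None:
            groups[current].append((int(row[0]), int(row[1])))
    return groups

# Inline copy of the sample file from the question.
sample = """# image name
1.png
# probe locations
100 100
200 100
100 200
300 300

# another image name
2.png
100 200
200 100
300 300
135 322

# end
"""

groups = group_coordinates(sample.splitlines())
print(groups["1.png"])  # [(100, 100), (200, 100), (100, 200), (300, 300)]
```

With a real file, passing the open file object (for example group_coordinates(open("test.exp"))) should behave the same way, since csv.reader accepts any iterable of lines.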

Hope it helps!

Answer 2

Your code actually works for me. I don't know why you're getting the traceback.

tmp.csv

# image name
1.png
# probe locations
100 100
200 100
100 200
300 300

# another image name
2.png
100 200
200 100
300 300
135 322

# end

tmp.py

import csv

class CommentedFile:
    def __init__(self, f, commentstring="#"):
        self.f = f
        self.commentstring = commentstring
    def next(self):
        line = self.f.next()
        while line.startswith(self.commentstring):
            line = self.f.next()
        return line
    def __iter__(self):
        return self

#I did this in order to ignore the comments in the CSV file

tsv_file = csv.reader(CommentedFile(open("tmp.csv", "rb")),
                  delimiter=' ')


for row in tsv_file:
    if row != int:
        next(tsv_file)
    if row:
        print row

Shell output

tmp$python tmp.py 
['1.png']
['200', '100']
['300', '300']
['2.png']
['200', '100']
['135', '322']
tmp$uname -mprsv
Darwin 12.4.0 Darwin Kernel Version 12.4.0: Wed May  1 17:57:12 PDT 2013; root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64 i386
tmp$python --version
Python 2.7.2
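A likely explanation for the difference between the traceback in the question and the clean run above: the test row != int compares a list with the type object int, so it is always true and next(tsv_file) discards one extra row on every pass. That is why only every other row is printed. When the reader yields an even number of rows the loop ends quietly; with an odd number, the extra next() call runs past the end and raises StopIteration. The sketch below reproduces this with a plain list standing in for the reader; the three-row input is made up for illustration.

```python
# Three rows: an odd count, standing in for what csv.reader would yield.
rows = [["1.png"], ["100", "100"], ["200", "100"]]

# A list is never equal to the type object `int`, so this is always True.
print(rows[0] != int)

it = iter(rows)
printed = []
error = None
try:
    for row in it:
        if row != int:   # always true, so one extra row is consumed per pass
            next(it)     # raises StopIteration when no row is left
        printed.append(row)
except StopIteration:
    error = "StopIteration"

print(printed)  # [['1.png']]
print(error)    # StopIteration
```

Removing the next() call, or checking the row's contents instead of comparing it with int, avoids both the skipped rows and the exception.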

Reading a CSV file with Python 2
