Question

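I have a tab-delimited text file, and I'm trying to figure out how to search it for a value in a specific column.

I think I need to use the csv module, but so far I haven't had any success. Can someone point me in the right direction?

Thanks!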

Update: Thanks, everyone, for the responses. I know I could probably use awk, but just for practice I'm trying to do this in Python.

I'm now getting the following error:

if row.split(' ')[int(searchcolumn)] == searchquery:
IndexError: list index out of range

Here is a snippet of my code:

#open the directory and find all the files
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f=open(file, 'r')
        lines=f.readlines()
        for line in lines:
            #the first 4 lines of the file are crap, skip them
            if linescounter > startfromline:
                with open(file) as infile:
                    for row in infile:
                        if row.split(' ')[int(searchcolumn)] == searchquery:
                            rfile = open(resultsfile, 'a')
                            rfile.writelines(line) 
                            rfile.write("\r\n")
                            print "Writing line -> " + line
                            resultscounter += 1
        linescounter += 1
        f.close()

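I'm taking both searchcolumn and searchquery from the user via raw_input. I'm guessing the reason I'm getting the list index out of range error is that the file isn't being parsed correctly?

Thanks again.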

Answer 1

You can also use the Sniffer. Example taken from http://docs.python.org/library/csv.html:

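import csv

# Python 2: the csv module expects the file to be opened in binary mode
csvfile = open("example.csv", "rb")
# Guess the delimiter and quoting from the first 1024 bytes
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)  # rewind so the reader starts at the first line
reader = csv.reader(csvfile, dialect)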
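
As a rough sketch of how the sniffed dialect could then be used for the column search itself: the version below is written for Python 3, so the file is opened in text mode with newline="" rather than "rb". The function name search_with_sniffer, its parameters, and the list of candidate delimiters are made up for illustration and are not part of the original answer.

```python
import csv


def search_with_sniffer(path, column, value, sample_size=1024):
    """Return the rows whose field at index `column` equals `value`.

    Hypothetical helper: the delimiter is guessed by csv.Sniffer from the
    first `sample_size` characters of the file.
    """
    matches = []
    with open(path, newline="") as csvfile:
        # Restricting the guess to tab, comma, and semicolon is an assumption.
        sample = csvfile.read(sample_size)
        dialect = csv.Sniffer().sniff(sample, delimiters="\t,;")
        csvfile.seek(0)  # rewind so the reader sees the whole file
        for row in csv.reader(csvfile, dialect):
            # Ignore rows that are too short to contain the requested column.
            if len(row) > column and row[column] == value:
                matches.append(row)
    return matches
```

Note that the Sniffer raises csv.Error when it cannot settle on a delimiter, which may happen if the sample is dominated by irregular header lines.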

Answer 2

Yes, you need to use the csv module, and you need to set the delimiter to '\t':

import csv
spamReader = csv.reader(open('spam.csv', 'rb'), delimiter='\t')

After that you should be able to iterate over the rows:

for row in spamReader:
   print row[n]
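
To tie this back to the question, here is a minimal sketch of a search built on the same csv.reader call. It is written for Python 3 (text mode with newline="" instead of 'rb'), and the names find_rows and skip_lines are hypothetical. skip_lines is meant for the junk lines at the top of the file that the question mentions.

```python
import csv
from itertools import islice


def find_rows(path, column, query, skip_lines=0):
    """Yield rows of a tab-delimited file whose `column` field equals `query`."""
    with open(path, newline="") as infile:
        reader = csv.reader(infile, delimiter="\t")
        # islice drops the first `skip_lines` rows (for example a junk header).
        for row in islice(reader, skip_lines, None):
            # The length check avoids IndexError on short or empty rows.
            if len(row) > column and row[column] == query:
                yield row
```

Since raw_input returns a string, the column would still need to be converted with int() before being passed in, as the question's code already does.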

Answer 3

This prints all lines in filename that have 'myvalue' in the fourth tab-delimited column:

with open(filename) as infile:
    for row in infile:
        if row.split('\t')[3] == 'myvalue':
            print row

Replace 3, 'myvalue', and print as appropriate.
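
Two details are easy to trip over with the plain split approach. Each line read from the file still ends with its newline, so a comparison against the last column would never match. A line with too few fields raises IndexError, which is also what happens when a tab-delimited line is split on ' ' instead of '\t', as in the question's code. A small sketch that guards against both follows; the helper name column_matches and its sep parameter are made up for illustration.

```python
def column_matches(line, column, value, sep="\t"):
    """Return True if field number `column` (0-based) of `line` equals `value`."""
    # Strip the trailing newline so the last field compares cleanly.
    fields = line.rstrip("\r\n").split(sep)
    # Guard against short lines instead of letting the index raise.
    return column < len(fields) and fields[column] == value


# Splitting a tab-delimited line on a space leaves one big field,
# so any column index above 0 is out of range.
sample = "a\tb\tc\tmyvalue\n"
fields_by_space = sample.split(" ")
fields_by_tab = sample.rstrip("\r\n").split("\t")
```

In the loop from the answer above, the condition would then read column_matches(row, 3, 'myvalue').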

Searching for a specific value in a specific column with Python

3 answers: