Question

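I need to search for something in a tab-delimited text file. The user should supply the file and the term to search for, and the program should then return the entire line containing the word the user entered. So far I have two versions, because I have been approaching the problem from different angles. The first program is as follows: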

import csv

searchfile = raw_input ('Which file do you want to search?   ')
try:
    input_file = open (searchfile, 'rU')
except:
    print "Invalid file. Please enter a correct file"

csv_file_object = csv.reader(open(searchfile, 'rb')) 
header = csv_file_object.next()   

data=[]                          
for row in csv_file_object:      
    data.append(row)             

searchA = raw_input ('which author?')

author_search = data[0::,0] == searchA

if author_search in searchfile:
    print author_search

The problem with the first program is that it raises this error:

TypeError: list indices must be integers, not tuple

So I tried this approach instead:

import csv

searchfile = raw_input ('Which file do you want to search?   ')
try:
    input_file = open (searchfile, 'rU')
except:
    print "Invalid file. Please enter a correct file"


with open(searchfile) as f:
    reader = csv.reader(f, delimiter="\t")
    d = list(reader)

searchtype = raw_input ('Search on author or journal/conference or [Q = quit]')


if searchtype == 'author':
    searchdataA = raw_input ("Input author name")
    if searchdataA in input_file:
        print line

elif searchtype == 'journal' or 'conference' or 'journal/conference':
    searchdataJ = raw_input ("input journal/conference name")
    if searchdataJ in d:
        print line

elif searchtype == 'Q':
    print "Program left"

else:
    print "please choose either author or journal/conference"

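This doesn't get past entering the search parameters.

Any help on where to go with either of these two programs would be much appreciated, or, if I'm completely on the wrong track, a link to useful material would be great.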

Answer 1

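I think you're making this far more complicated than it needs to be. Since you want to print the whole line on which the target word appears, you don't really need the CSV module. You're not doing any of the sophisticated parsing it is capable of.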

searchfile = raw_input ('Which file do you want to search?   ')
searchA = raw_input ('which author?')

with open(searchfile) as infile:
    for line in infile:
        if searchA in line:
            print('  '.join(line.split()))
            break # remove this if you want to print all matches instead of
                  # just the first one

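Note that when printing the line, I first split it (split() with no argument splits on whitespace) and then rejoin the fields with two spaces between them. I think this is a good approach, since you're printing tab-separated fields to the console. Trimming the extra whitespace makes the output easier to read, while the two spaces still make it easy to tell the columns apart.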
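
To see what the split-and-rejoin step does, here is a minimal sketch in Python 3 syntax (the code in this thread is Python 2). The sample line is made up for illustration. One thing to watch, assuming your fields can contain spaces: a bare split() also breaks on the spaces inside a field, so splitting on the tab keeps multi-word fields intact.

```python
# Hypothetical tab-separated line, invented for this example.
line = "Jane Doe\tJournal of Examples\t2012\n"

# split() with no argument breaks on any run of whitespace,
# including the spaces inside a field.
loose = "  ".join(line.split())

# Splitting on the tab keeps multi-word fields together.
strict = "  ".join(line.rstrip("\n").split("\t"))

print(loose)   # Jane  Doe  Journal  of  Examples  2012
print(strict)  # Jane Doe  Journal of Examples  2012
```

Which variant reads better depends on the data; the bare split() is fine when no field contains spaces.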

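You can generalize it by prompting your user for any search term instead of specifying "author". This may be the way to go, since your second code snippet suggests you may want to search other fields, such as "journal" or "conference":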

target_term = raw_input("Which term or phrase would you like to find?")

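Since this approach searches and prints whole lines, there's no need to deal with separate columns and different kinds of search terms. It simply looks at the whole line at once and prints the lines that match.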
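
The whole-line search can be wrapped in a small reusable function. This is only a sketch in Python 3 syntax; the name find_lines, the first_only flag, and the sample file contents are assumptions made for illustration and are not part of the code above.

```python
import os
import tempfile


def find_lines(path, term, first_only=False):
    """Return the lines of the file at `path` that contain `term`."""
    matches = []
    with open(path) as infile:
        for line in infile:
            if term in line:
                matches.append(line.rstrip("\n"))
                if first_only:
                    break  # stop after the first match
    return matches


# Build a small throwaway file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
    tmp.write("Jane Doe\tJournal of Examples\n")
    tmp.write("John Roe\tExample Conference\n")
    sample_path = tmp.name

found = find_lines(sample_path, "Conference")
none_found = find_lines(sample_path, "Nobody")
os.remove(sample_path)

print(found)
```

Because the file is read line by line, memory use stays small even for large files; only the matching lines are kept.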

Answer 2

Why not simply

fname = raw_input("Enter Filename")
author = raw_input("Enter Author Name:")
if author in open(fname,"rb").read():
   print "match found"

If you want to see the matching lines, you can do

import re
print re.findall(".*%s.*"%(author),open(fname,"rb").read())

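As people have pointed out, this is better form

with open(fname,"rb") as f:
     data = re.findall(".*%s.*"%(author),f.read())
print data

Although in CPython the file object is garbage collected immediately, so it's not really a problem....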
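
One caveat with the regex version, offered as a side note: the search text is interpolated straight into the pattern, so characters such as "." act as regex metacharacters. A minimal sketch, in Python 3 syntax and with made-up sample text, of how re.escape keeps the match literal:

```python
import re

# Made-up file contents for illustration.
text = "Smith, J.\tNature\nSmithx JQ\tScience\n"
author = "J."

# Unescaped: "." matches any character, so "JQ" matches as well.
unescaped = re.findall(".*%s.*" % author, text)

# Escaped: the dot must match literally.
escaped = re.findall(".*%s.*" % re.escape(author), text)

print(len(unescaped))  # 2
print(escaped)         # ['Smith, J.\tNature']
```

If the user is meant to type a regular expression, leave the escaping out; if they are typing a plain name, escaping is probably the safer default.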

Answer 3

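Since you aren't actually using a different search method depending on whether you're searching for an author, a journal, a conference, or a journal/conference, you can really just do a full-text search on each line. It is therefore wise to collect all the data you need from the user BEFORE processing the file, so that you can output just the matching lines. If the user passes in a fairly large CSV file, your approach would use too much memory.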

with open(searchfile, 'r') as f:
    for line in f:
        if line.find(searchA) > -1:
            print line

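This way you loop through the file as quickly as possible and print out every matching line.

The .find() function returns the index in the string at which the match was found, or -1 if the string is not found. From that value you can "estimate" where in the line the match occurred, but if you really want to distinguish between author, journal, and so on, you will have to split the line. In my example I'll assume the author field is the sixth field in the CSV line: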

with open(searchfile, 'r') as f:
    for line in f:
        fields = line.split("\t")
        if len(fields) > 5:                    # check length of fields array
            if fields[5].find(searchA) > -1:   # search straight in author field
                print line                     # return full line
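
The same column-restricted search can also be written with the csv module, which the question already imports. This is a sketch in Python 3 syntax, not the code above rewritten; the function name search_column, the sample rows, and the column positions are assumptions made for illustration.

```python
import csv
import os
import tempfile


def search_column(path, term, column):
    """Return rows whose field at index `column` contains `term`."""
    hits = []
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            # Skip short rows instead of raising IndexError.
            if len(row) > column and term in row[column]:
                hits.append(row)
    return hits


# Throwaway sample data: column 0 is the author, column 1 the venue.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False,
                                 newline="") as tmp:
    writer = csv.writer(tmp, delimiter="\t")
    writer.writerow(["Jane Doe", "Journal of Examples"])
    writer.writerow(["John Roe", "Doe Memorial Conference"])
    sample_path = tmp.name

author_hits = search_column(sample_path, "Doe", column=0)
venue_hits = search_column(sample_path, "Doe", column=1)
os.remove(sample_path)

for row in author_hits:
    print("\t".join(row))  # prints the Jane Doe row only
```

Restricting the search to one column avoids false hits such as an author's surname appearing in a venue name, at the cost of needing to know the column layout.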

Answer 4

The first thing that came to my mind is simply:

def check_file(file_name, author_name):
    with open(file_name) as f:
        content = f.readlines()
    for line in content:
        if author_name in line:
            print "Found: ", line

Hope it helps.

How do I search for text in a tab-delimited file and print the matching line?
