Python: search a file for text using input from another file

Time: 2013-11-12 15:54:55

Tags: python python-2.7

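I'm new to Python and to programming, and I need some help with a Python script. I have two files, each containing email addresses (more than 5000 lines). The input file contains the email addresses I want to search for in the data file (which also contains email addresses). I then want to write the matches to a file or display them on the console. I found a script and was able to modify it, but I'm not getting the desired result. Can you help?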

dfile1 (50K lines)
yyy@aaa.com
xxx@aaa.com
zzz@aaa.com


ifile1 (10K lines)
ccc@aaa.com
vvv@aaa.com
xxx@aaa.com
zzz@aaa.com

Output file
xxx@aaa.com
zzz@aaa.com



datafile = 'C:\\Python27\\scripts\\dfile1.txt'
inputfile = 'C:\\Python27\\scripts\\ifile1.txt'

with open(inputfile, 'r') as f:
names = f.readlines()

outputlist = []

with open(datafile, 'r') as fd:
  for line in fd:
    name = fd.readline()
    if name[1:-1] in names:
        outputlist.append(line)
    else:
        print "Nothing found"
 print outputlist

New code

with open(inputfile, 'r') as f:
    names = f.readlines()
outputlist = []

with open(datafile, 'r') as f:
    for line in f:
        name = f.readlines()
        if name in names:
            outputlist.append(line)
        else:
            print "Nothing found"
    print outputlist

5 answers:

Answer 0 (score: 2)

mitan8 has pointed out your problem, but here is what I would do:

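with open(inputfile, "r") as f:
    # A set gives fast membership tests; strip() removes the trailing newline.
    names = set(i.strip() for i in f)

with open(datafile, "r") as f:
    # Iterate over the data file line by line instead of loading it all at once.
    for name in f:
        if name.strip() in names:
            print name.strip()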

This avoids reading the larger data file into memory.

If you want to write to an output file, you can change the second with statement to this:

with open(datafile, "r") as i, open(outputfile, "w") as o:
    for name in i:
        if name.strip() in names:
            o.write(name)
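
For reference, the two steps above can be combined into one self-contained sketch. The function name `write_matches`, its three path parameters, and the temporary file names are hypothetical, chosen only for illustration; the sample addresses are the ones from the question. It is written for Python 3 and is only one possible arrangement, not the answer's exact code.

```python
import os
import tempfile


def write_matches(input_path, data_path, output_path):
    """Copy lines of data_path whose address also appears in input_path.

    Hypothetical helper for illustration; returns the number of matches.
    """
    # Load the (smaller) input file into a set for fast membership tests.
    with open(input_path, "r") as f:
        wanted = set(line.strip() for line in f)
    wanted.discard("")  # ignore blank lines

    matches = 0
    # Stream the (larger) data file one line at a time.
    with open(data_path, "r") as src, open(output_path, "w") as dst:
        for line in src:
            address = line.strip()
            if address in wanted:
                dst.write(address + "\n")
                matches += 1
    return matches


# Small demonstration using the sample addresses from the question.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "ifile1.txt")
data_path = os.path.join(workdir, "dfile1.txt")
output_path = os.path.join(workdir, "output.txt")

with open(input_path, "w") as f:
    f.write("ccc@aaa.com\nvvv@aaa.com\nxxx@aaa.com\nzzz@aaa.com\n")
with open(data_path, "w") as f:
    f.write("yyy@aaa.com\nxxx@aaa.com\nzzz@aaa.com\n")

match_count = write_matches(input_path, data_path, output_path)
with open(output_path, "r") as f:
    result = f.read().splitlines()
print(result)
```

Matches are written in the order they occur in the data file, since that is the file being streamed.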

Answer 1 (score: 2)

Maybe I'm missing something, but why not use a pair of sets?

#!/usr/local/cpython-3.3/bin/python

data_filename = 'dfile1.txt'
input_filename = 'ifile1.txt'

with open(input_filename, 'r') as input_file:
    input_addresses = set(email_address.rstrip() for email_address in input_file.readlines())

with open(data_filename, 'r') as data_file:
    data_addresses = set(email_address.rstrip() for email_address in data_file.readlines())

print(input_addresses.intersection(data_addresses))

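Answer 2 (score: 1)

I think you can remove name = fd.readline(), since the for loop already gives you the line. Calling it reads an extra line on top of the one the for loop reads on each iteration. Also, I think name[1:-1] should be name, since you don't want to drop the first and last characters when searching. with closes the opened file automatically.

PS: here is how I would do it: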
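
To see the effect of the extra readline() call, here is a small sketch that uses an in-memory `io.StringIO` object as a stand-in for the data file. The function name `pair_lines` and the addresses are made up for illustration, and the code assumes Python 3; on Python 2.7, mixing iteration and readline() on a real file object may raise an error instead.

```python
import io


def pair_lines(text):
    """Mimic the question's loop: iterate the file and also call readline()."""
    f = io.StringIO(text)  # in-memory stand-in for the data file
    pairs = []
    for line in f:           # the loop reads one line ...
        name = f.readline()  # ... and readline() consumes the next one
        pairs.append((line.strip(), name.strip()))
    return pairs


pairs = pair_lines(u"a@x.com\nb@x.com\nc@x.com\nd@x.com\n")
print(pairs)
```

Each iteration uses up two lines, so the loop body runs only twice for four lines, and `line` and `name` never refer to the same address.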

with open("dfile1") as dfile, open("ifile") as ifile:
    lines = "\n".join(set(dfile.read().splitlines()) & set(ifile.read().splitlines()))
print(lines)
with open("ofile", "w") as ofile:
    ofile.write(lines)

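In the solution above, I basically take the intersection of the two files' lines (the elements that are in both sets) to find the common lines.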
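
As a minimal illustration of the intersection, here is a sketch using the sample addresses from the question as plain lists rather than files. The variable names are assumptions made for this example. Because sets are unordered, the sketch also shows one way to keep the data file's order.

```python
# Sample lines from the question, already stripped of newlines.
data_lines = ["yyy@aaa.com", "xxx@aaa.com", "zzz@aaa.com"]
input_lines = ["ccc@aaa.com", "vvv@aaa.com", "xxx@aaa.com", "zzz@aaa.com"]

# & is the set intersection operator: elements present in both sets.
common = set(data_lines) & set(input_lines)

# Sets have no defined order, so filter the original list to keep file order.
common_in_order = [line for line in data_lines if line in common]

print("\n".join(common_in_order))
```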

Answer 3 (score: 1)

I think your problem stems from the following:

name = fd.readline()
if name[1:-1] in names:

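name[1:-1] slices each email address so that the first and last characters are skipped. While dropping the last character (the newline '\n') might be fine in general, when you load the list of names from the input file ("ifile")

with open(inputfile, 'r') as f:
    names = f.readlines()

you are including the newlines. So don't slice the names read from the data file ("dfile") at all, i.e.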

if name in names:
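
The mismatch can be reproduced with a short sketch. It uses `io.StringIO` as an in-memory stand-in for the input file; the contents and variable names are made up for illustration, and the code assumes Python 3.

```python
import io

# In-memory stand-in for the input file (contents are illustrative).
input_file = io.StringIO(u"xxx@aaa.com\nzzz@aaa.com\n")
names = input_file.readlines()  # each entry keeps its trailing "\n"

line = u"xxx@aaa.com\n"  # a line as it would be read from the data file

sliced = line[1:-1]  # drops the first character as well as the newline
found_sliced = sliced in names  # the sliced text matches no entry
found_whole = line in names     # the unsliced line, newline included, matches

# Stripping both sides is more robust, e.g. when the last line has no newline.
found_stripped = line.strip() in [n.strip() for n in names]

print(repr(sliced), found_sliced, found_whole, found_stripped)
```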

Answer 4 (score: 1)

Here is what I would do:

names = []
with open(inputfile) as f:
    for line in f:
        # Drop the trailing newline so the addresses compare cleanly.
        names.append(line.rstrip("\n"))

myEmails = set(names)

# Read the data file and write every matching address to emails.txt.
with open(datafile) as fd, open("emails.txt", "w") as output:
    for line in fd:
        c = line.rstrip("\n")
        if c in myEmails:
            print c  # for console
            output.write(c + "\n")  # for writing to file