Question

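I am new to Python and to programming, and I need some help with a Python script. I have two files, each containing email addresses (more than 5000 lines). The input file contains the email addresses I want to search for in the data file, which also contains email addresses. I then want to write the matches to a file or display them on the console. I searched for scripts and was able to modify one, but I am not getting the desired results. Can you help me?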

dfile1 (50K lines)
yyy@aaa.com
xxx@aaa.com
zzz@aaa.com


ifile1 (10K lines)
ccc@aaa.com
vvv@aaa.com
xxx@aaa.com
zzz@aaa.com

Output file
xxx@aaa.com
zzz@aaa.com



datafile = 'C:\\Python27\\scripts\\dfile1.txt'
inputfile = 'C:\\Python27\\scripts\\ifile1.txt'

with open(inputfile, 'r') as f:
names = f.readlines()

outputlist = []

with open(datafile, 'r') as fd:
  for line in fd:
    name = fd.readline()
    if name[1:-1] in names:
        outputlist.append(line)
    else:
        print "Nothing found"
 print outputlist

New code

with open(inputfile, 'r') as f:
    names = f.readlines()
outputlist = []

with open(datafile, 'r') as f:
    for line in f:
        name = f.readlines()
        if name in names:
            outputlist.append(line)
        else:
            print "Nothing found"
    print outputlist

Answer 1

mitan8 has pointed out the problem in your code, but here is what I would do instead:

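# Load the smaller input file into a set for fast membership tests
with open(inputfile, "r") as f:
    names = set(i.strip() for i in f)

# Stream the data file one line at a time
with open(datafile, "r") as f:
    for name in f:
        if name.strip() in names:
            print(name.strip())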

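This avoids reading the larger data file into memory.

If you want to write to an output file instead, you could replace the second with statement with this: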

with open(datafile, "r") as i, open(outputfile, "w") as o:
    for name in i:
        if name.strip() in names:
            o.write(name)
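
For reference, the two snippets above can be combined into one self-contained Python 3 sketch. The function name filter_matches, the temporary directory, and the output.txt file name are assumptions made for the illustration; the sample addresses come from the question.

```python
import os
import tempfile


def filter_matches(input_path, data_path, output_path):
    """Write every address in data_path that also appears in input_path."""
    # Load the smaller file into a set for fast membership tests.
    with open(input_path, "r") as f:
        wanted = {line.strip() for line in f if line.strip()}

    found = 0
    # Stream the larger file line by line instead of reading it all at once.
    with open(data_path, "r") as src, open(output_path, "w") as dst:
        for line in src:
            address = line.strip()
            if address in wanted:
                dst.write(address + "\n")
                found += 1
    return found


# Tiny demonstration using the sample addresses from the question.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "ifile1.txt")
data_path = os.path.join(workdir, "dfile1.txt")
output_path = os.path.join(workdir, "output.txt")  # hypothetical name

with open(input_path, "w") as f:
    f.write("ccc@aaa.com\nvvv@aaa.com\nxxx@aaa.com\nzzz@aaa.com\n")
with open(data_path, "w") as f:
    f.write("yyy@aaa.com\nxxx@aaa.com\nzzz@aaa.com\n")

count = filter_matches(input_path, data_path, output_path)
with open(output_path) as f:
    result = f.read().splitlines()
print(count, result)  # 2 ['xxx@aaa.com', 'zzz@aaa.com']
```

Matches come out in the order they appear in the data file, and only the set of input addresses is held in memory.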

Answer 2

Maybe I'm missing something, but why not use a pair of sets?

#!/usr/local/cpython-3.3/bin/python

data_filename = 'dfile1.txt'
input_filename = 'ifile1.txt'

with open(input_filename, 'r') as input_file:
    input_addresses = set(email_address.rstrip() for email_address in input_file.readlines())

with open(data_filename, 'r') as data_file:
    data_addresses = set(email_address.rstrip() for email_address in data_file.readlines())

print(input_addresses.intersection(data_addresses))

Answer 3

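I think you can remove name = fd.readline(), because the for loop already gives you each line. The extra call reads one more line on top of the one the loop has just read, so half of the lines are never assigned to line. Also, I think name[1:-1] should simply be the whole line (line, once the readline() call is gone), because you don't want to drop the first and last characters when searching. Finally, with closes the files it opened automatically.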
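
To see the effect of the extra readline() call, here is a small Python 3 sketch. It assumes io.StringIO as a stand-in for an open text file, and the four addresses are made up for the demonstration.

```python
import io

# io.StringIO stands in for an open text file (an assumption for this demo).
fd = io.StringIO("a@x.com\nb@x.com\nc@x.com\nd@x.com\n")

seen_by_loop = []
seen_by_readline = []
for line in fd:
    seen_by_loop.append(line.strip())
    # This extra read consumes the line the loop would have yielded next.
    seen_by_readline.append(fd.readline().strip())

print(seen_by_loop)      # ['a@x.com', 'c@x.com']
print(seen_by_readline)  # ['b@x.com', 'd@x.com']
```

The loop variable only ever sees every other line, so code that appends line while testing the value returned by readline() ends up pairing the wrong lines.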

PS: here is how I would do it:

with open("dfile1") as dfile, open("ifile") as ifile:
    lines = "\n".join(set(dfile.read().splitlines()) & set(ifile.read().splitlines()))
print(lines)
with open("ofile", "w") as ofile:
    ofile.write(lines)

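In the solution above, I basically take the intersection of the lines of the two files (the elements that belong to both sets) to find the common lines.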
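
As a quick check of what the & operator does on sets, here is a minimal Python 3 sketch. The two lists are hypothetical stand-ins for the lines of each file, based on the sample data in the question with one duplicate added.

```python
data_lines = ["yyy@aaa.com", "xxx@aaa.com", "zzz@aaa.com", "xxx@aaa.com"]
input_lines = ["ccc@aaa.com", "vvv@aaa.com", "xxx@aaa.com", "zzz@aaa.com"]

# & keeps only the elements present in both sets (intersection).
common = set(data_lines) & set(input_lines)
# | keeps the elements present in either set (union), shown for contrast.
either = set(data_lines) | set(input_lines)

# Sets are unordered, so sort before printing to get a stable result.
common_sorted = sorted(common)
print(common_sorted)  # ['xxx@aaa.com', 'zzz@aaa.com']
print(len(either))    # 5
```

Note that converting to a set drops duplicate lines and the original line order, which is usually acceptable for a list of email addresses.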

Answer 4

I think your problem stems from the following:

name = fd.readline()
if name[1:-1] in names:

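name[1:-1] slices each email address so that the first and last characters are skipped. Skipping the last character (the newline '\n') might seem like a good idea, but when you load the list of names from the input file with

with open(inputfile, 'r') as f:
    names = f.readlines()

you keep the newline on every entry. So do not slice the lines read from the data file at all, i.e.

if name in names: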
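
The mismatch can be reproduced in a few lines of Python 3. The values below are hypothetical; names imitates what readlines() returns for the input file.

```python
line = "xxx@aaa.com\n"                      # a line as read from the data file
names = ["ccc@aaa.com\n", "xxx@aaa.com\n"]  # what readlines() would return

sliced = line[1:-1]
print(repr(sliced))     # 'xx@aaa.com' (first character and newline removed)
print(sliced in names)  # False
print(line in names)    # True

# Stripping both sides is more robust, for example when the last line
# of a file has no trailing newline.
stripped_names = set(n.strip() for n in names)
print(line.strip() in stripped_names)  # True
```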

Answer 5

Here is what I would do:

names = []
with open(inputfile) as f:
    for line in f:
        names.append(line.rstrip("\n"))

# A set makes each membership test fast
myEmails = set(names)

with open(datafile) as fd, open("emails.txt", "w") as output:
    for line in fd:
        c = line.rstrip("\n")
        if c in myEmails:
            print(c)  # for console
            output.write(c + "\n")  # for writing to file

Python: search a file for text using input from another file
