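I am new to Python and to programming, and I need some help with a Python script. I have two files, each containing email addresses (more than 5000 lines). The input file contains the email addresses I want to search for in the data file (which also contains email addresses). I then want to write the output to a file or display it on the console. I found a script and was able to modify it, but I am not getting the desired result. Can you help me?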
dfile1 (50K lines)
yyy@aaa.com
xxx@aaa.com
zzz@aaa.com
ifile1 (10K lines)
ccc@aaa.com
vvv@aaa.com
xxx@aaa.com
zzz@aaa.com
Output file
xxx@aaa.com
zzz@aaa.com
datafile = 'C:\\Python27\\scripts\\dfile1.txt'
inputfile = 'C:\\Python27\\scripts\\ifile1.txt'
with open(inputfile, 'r') as f:
    names = f.readlines()
outputlist = []
with open(datafile, 'r') as fd:
    for line in fd:
        name = fd.readline()
        if name[1:-1] in names:
            outputlist.append(line)
        else:
            print "Nothing found"
print outputlist
New code:
with open(inputfile, 'r') as f:
    names = f.readlines()
outputlist = []
with open(datafile, 'r') as f:
    for line in f:
        name = f.readlines()
        if name in names:
            outputlist.append(line)
        else:
            print "Nothing found"
print outputlist
Answer 0 (score: 2)
mitan8 has pointed out your problem, but here is what I would do instead:
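with open(inputfile, "r") as f:
    # Strip whitespace and newlines so that lookups compare bare addresses
    names = set(i.strip() for i in f)
with open(datafile, "r") as f:
    for name in f:
        if name.strip() in names:
            # Print the stripped address to avoid a blank line after each match
            print name.strip()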
This avoids reading the larger data file into memory.
If you want to write to an output file, you can do this for the second with statement:
with open(datafile, "r") as i, open(outputfile, "w") as o:
    for name in i:
        if name.strip() in names:
            o.write(name)
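The streaming approach above can be wrapped in a small function so that it is easy to try on sample data. This is only a minimal sketch, written in Python 3 syntax (the code above is Python 2). The function name filter_emails and the temporary file names are hypothetical, chosen for illustration; the addresses are the sample data from the question.

```python
import os
import tempfile


def filter_emails(input_path, data_path, output_path):
    """Copy every address in data_path that also appears in input_path.

    Hypothetical helper: only the smaller input file is held in memory,
    while the data file is streamed line by line.
    """
    with open(input_path, "r") as f:
        wanted = set(line.strip() for line in f)
    matches = 0
    with open(data_path, "r") as src, open(output_path, "w") as dst:
        for line in src:
            address = line.strip()
            if address and address in wanted:
                dst.write(address + "\n")
                matches += 1
    return matches


# Demo with the sample addresses from the question, in a temporary directory.
tmp_dir = tempfile.mkdtemp()
input_path = os.path.join(tmp_dir, "ifile1.txt")
data_path = os.path.join(tmp_dir, "dfile1.txt")
output_path = os.path.join(tmp_dir, "ofile1.txt")

with open(input_path, "w") as f:
    f.write("ccc@aaa.com\nvvv@aaa.com\nxxx@aaa.com\nzzz@aaa.com\n")
with open(data_path, "w") as f:
    f.write("yyy@aaa.com\nxxx@aaa.com\nzzz@aaa.com\n")

match_count = filter_emails(input_path, data_path, output_path)
with open(output_path) as f:
    result = f.read().splitlines()
print(match_count, result)
```

Because the output follows the order of the data file, matches appear in the same order as they do there.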
Answer 1 (score: 2)
Maybe I am missing something, but why not use a pair of sets?
#!/usr/local/cpython-3.3/bin/python
data_filename = 'dfile1.txt'
input_filename = 'ifile1.txt'
with open(input_filename, 'r') as input_file:
    input_addresses = set(email_address.rstrip() for email_address in input_file.readlines())
with open(data_filename, 'r') as data_file:
    data_addresses = set(email_address.rstrip() for email_address in data_file.readlines())
print(input_addresses.intersection(data_addresses))
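One thing to keep in mind with the set approach is that a set has no defined order, and printing it directly shows the set's repr rather than one address per line. A small sketch of one way to get stable, line-by-line output, assuming Python 3 and using the sample addresses from the question in place of the real files:

```python
# Sample data from the question, standing in for the two files.
input_addresses = {"ccc@aaa.com", "vvv@aaa.com", "xxx@aaa.com", "zzz@aaa.com"}
data_addresses = {"yyy@aaa.com", "xxx@aaa.com", "zzz@aaa.com"}

# sorted() turns the unordered intersection into a list in a fixed order.
common = sorted(input_addresses.intersection(data_addresses))
for address in common:
    print(address)
```

Sorting is only one option here; if the original file order matters, streaming the data file and testing membership in the input set keeps that order instead.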
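Answer 2 (score: 1)
I think you can remove name = fd.readline(), because the for loop already gives you the line. That call reads another line in addition to the one the for loop reads on each iteration. Also, I think name[1:-1] should be name, because you do not want to strip the first and last characters when searching. Finally, with automatically closes the opened files.
PS: Here is how I would do it: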
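To see the effect of calling readline() inside the for loop, here is a small sketch. It assumes Python 3 and uses an in-memory io.StringIO stream in place of the data file; the fourth address, www@aaa.com, is made up for the demo. File objects in Python 2 may behave differently when iteration and readline() are mixed.

```python
import io

# In-memory stand-in for the data file (www@aaa.com is a made-up extra line).
fd = io.StringIO("yyy@aaa.com\nxxx@aaa.com\nzzz@aaa.com\nwww@aaa.com\n")

pairs = []
for line in fd:
    # readline() consumes the NEXT line, so the loop never sees it as `line`.
    name = fd.readline()
    pairs.append((line.strip(), name.strip()))

print(pairs)
```

The loop body runs only twice for four lines: `line` and `name` each see a different half of the file, which is why the comparison and the appended value in the question's code do not refer to the same address.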
with open("dfile1") as dfile, open("ifile") as ifile:
    # The intersection (&) of the two sets of lines gives the common lines
    lines = "\n".join(set(dfile.read().splitlines()) & set(ifile.read().splitlines()))
print(lines)
with open("ofile", "w") as ofile:
    ofile.write(lines)
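In the solution above, I basically take the intersection of the lines of the two files (the elements that are part of both sets) to find the common lines.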
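The & operator used in that solution computes a set intersection, while | would compute a union. A short sketch of the difference, assuming Python 3 and using the sample addresses from the question (the variable names are illustrative only):

```python
data_lines = {"yyy@aaa.com", "xxx@aaa.com", "zzz@aaa.com"}
input_lines = {"ccc@aaa.com", "vvv@aaa.com", "xxx@aaa.com", "zzz@aaa.com"}

common = data_lines & input_lines    # intersection: lines present in BOTH sets
combined = data_lines | input_lines  # union: lines present in EITHER set

print(sorted(common))
print(len(combined))
```

Only the intersection answers the question asked here, since the goal is to find addresses that occur in both files.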
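Answer 3 (score: 1)
I think your problem stems from the following:
name = fd.readline()
if name[1:-1] in names:
name[1:-1] slices each email address so that the first and last characters are skipped. While skipping the last character (the newline '\n') might seem like a good idea, when you load the list of names from the input file ("ifile")
with open(inputfile, 'r') as f:
    names = f.readlines()
you include the newlines. So do not slice the names read from the data file ("dfile") at all, i.e.
if name in names: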
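The mismatch described above can be reproduced without any files. The following is a minimal sketch, assuming Python 3; the list literal stands in for what readlines() returns for the sample input file in the question.

```python
# What f.readlines() returns for the sample input file: newlines are kept.
names = ["ccc@aaa.com\n", "vvv@aaa.com\n", "xxx@aaa.com\n", "zzz@aaa.com\n"]

# A line as read from the data file.
name = "xxx@aaa.com\n"

# Slicing drops the newline AND the first character of the address.
sliced = name[1:-1]
print(repr(sliced))

found_sliced = sliced in names  # compares "xx@aaa.com" with entries ending in "\n"
found_raw = name in names       # both sides still carry the newline

# Stripping both sides compares bare addresses instead.
cleaned = set(n.strip() for n in names)
found_stripped = name.strip() in cleaned

print(found_sliced, found_raw, found_stripped)
```

Comparing the raw lines works in this demo, but it can still fail when the last line of one file has no trailing newline, so stripping both sides is the more robust choice.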
Answer 4 (score: 1)
Here is what I would do:
names = []
with open(inputfile) as f:
    for line in f:
        names.append(line.rstrip("\n"))
# A set makes each membership test fast
myEmails = set(names)
# Read from the data file and write the matches to emails.txt
with open(datafile) as fd, open("emails.txt", "w") as output:
    for line in fd:
        c = line.rstrip("\n")
        if c in myEmails:
            print c  # for console
            output.write(c + "\n")  # for writing to file