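For some reason I can't write anything to the blank text file. I call file.close() at the end, but it still doesn't work. Can anyone point out where I might be going wrong?
Here is the complete code. Basically, I retrieve the unique email addresses from a text file, match each unique email to a unique five-digit number, and finally write a new file in which the emails are replaced by those numbers.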
import re
import random

email_list = []
anon = {}
number_list = []

## There are 54 unique emails, so I set len(number_list) = 54 here
while len(number_list) < 54:
    rand = random.randint(10000, 99999)
    rand = '%%' + str(rand) + '%%'
    if rand not in number_list:
        number_list.append(rand)

i = 0
a = open('mbox.txt', 'r')
for line in a:
    if re.findall(r'[A-Za-z\.-]+\S@[\w\.-]+\.[\w]+', line):
        email = re.findall(r'[A-Za-z\.-]+\S@[\w\.-]+\.[\w]+', line)[0]
        if email not in email_list:
            email_list.append(email)
            anon[email] = number_list[i]
            i += 1
    else:
        email = "NA"

b = open('mbox-anon.txt', 'wt', encoding='utf-8')
for line in a:
    for email in anon:
        try:
            linereplace = line.replace(email, anon[email])
            b.write(linereplace)
        except:
            pass
a.close()
b.close()
Answer 0 (score: 1)
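Assuming you intend to replace the emails in the first file's contents and put the result into the second file: after the first loop, the file object a is already at end-of-file, so the second for line in a loop never runs and nothing is written. Rewind the file before the second loop by replacing its header with
a.seek(0)
for line in a:
or open b before the first loop and add
b.write(line.replace(email, anon[email]))
to each iteration.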
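To see why the second loop writes nothing, here is a minimal sketch, assuming a throwaway scratch file (the file name and its two lines are made up for illustration and are not from the question). A file object is its own iterator, so once one loop has reached end-of-file, a second loop yields no lines until seek(0) rewinds it.

```python
import os
import tempfile

# Hypothetical scratch file standing in for mbox.txt.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('line one\nline two\n')

with open(path) as a:
    first_pass = [line for line in a]    # consumes the file up to end-of-file
    second_pass = [line for line in a]   # nothing left to read
    a.seek(0)                            # rewind to the start of the file
    third_pass = [line for line in a]    # lines are available again

print(len(first_pass), len(second_pass), len(third_pass))
```

The middle count is zero, which is the situation the question's second loop is in.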
Answer 1 (score: 0)
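I think this code does what you need. It reads the file mbox.txt, extracts all the emails from it, and maps each unique email address to a 5-digit value following your approach. It then writes the same data to mbox-anon.txt, with each email address replaced by its corresponding 5-digit value.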
import random
import re


def generate_crypto_value():
    # Build a masked token such as %%61286%%.
    # Note: values are drawn independently, so uniqueness is not guaranteed.
    return '%%{}%%'.format(random.randint(10000, 99999))


def obscure_emails(file_in, file_out, email_masker):
    with open(file_in) as f_in, open(file_out, 'w') as f_out:
        # Read the whole file at once so it only has to be scanned one time.
        data = f_in.read()
        email_pattern = r'[A-Za-z\.-]+\S@[\w\.-]+\.[\w]+'
        # set() keeps one entry per address, so each address gets a single token.
        for email in set(re.findall(email_pattern, data)):
            data = data.replace(email, email_masker())
        f_out.write(data)


if __name__ == '__main__':
    obscure_emails(
        file_in='mbox.txt',
        file_out='mbox-anon.txt',
        email_masker=generate_crypto_value)
Example mbox.txt before running:
Here's one address: foo.bar@email.com
Another address: baz@hotmail.org
And the first address again: foo.bar@email.com with some text after it
Example mbox-anon.txt after running:
Here's one address: %%61286%%
Another address: %%51955%%
And the first address again: %%61286%% with some text after it
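The question asks for unique five-digit numbers, whereas drawing each number independently could in principle hand the same number to two different addresses. The following is a sketch of one possible variation, not the answerer's method: it draws the numbers without replacement using random.sample and substitutes them in a single re.sub pass. The names anonymize, EMAIL_PATTERN and rng are hypothetical and introduced only for illustration; the pattern itself is copied from the question.

```python
import random
import re

# Pattern copied from the question; the constant name is an assumption.
EMAIL_PATTERN = r'[A-Za-z\.-]+\S@[\w\.-]+\.[\w]+'


def anonymize(text, rng=random):
    """Replace each distinct email in text with a distinct %%NNNNN%% token."""
    emails = sorted(set(re.findall(EMAIL_PATTERN, text)))
    # sample() draws without replacement, so no two emails share a number.
    numbers = rng.sample(range(10000, 100000), len(emails))
    mapping = {e: '%%{}%%'.format(n) for e, n in zip(emails, numbers)}
    # One pass over the text; every match is looked up in the mapping.
    masked = re.sub(EMAIL_PATTERN, lambda m: mapping[m.group(0)], text)
    return masked, mapping


sample = (
    "Here's one address: foo.bar@email.com\n"
    "Another address: baz@hotmail.org\n"
    "And the first address again: foo.bar@email.com with some text after it\n"
)
masked, mapping = anonymize(sample)
print(masked)
```

Returning the mapping alongside the masked text makes it possible to check afterwards that every address received its own token.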