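I get this error when reading a CSV file. Is there a solution?
I want to extract the email addresses from a CSV file, but I keep getting this error.
This is the error: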
Traceback (most recent call last):
  File "email-extractor.py", line 7, in <module>
    content = f.read()
MemoryError
This is my Python code:
import re
fileInput = 'owner-emails.csv'
fileOutput = 'email-gen-'+fileInput+'.txt'
f = open(fileInput,encoding='utf-8')
content = f.read()
# email regex
regex = re.compile(("([a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`"
"{|}~-]+)*(@|\sat\s)(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(\.|"
"\sdot\s))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)"))
# set makes them unique
results = set(regex.findall(content))
emails = ""
count = len(results)
for x in results:
    emails += str(x[0])+"\n"
# function to write file
def writefile():
    f = open(fileOutput, 'w')
    f.write(emails)
    f.close()
    print("File written: " + fileOutput)
writefile()
This is my CSV file:
Answer 0 (score: 0)
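Iterate over the file line by line instead of reading it all at once with file.read(). A single read() call loads the entire file into memory as one string, which is most likely what raises the MemoryError on a large file.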
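import re

# No ^/$ anchors, so an address can match anywhere inside a CSV row
regex = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")
with open('/path/to/FILE', 'r') as f:
    # Iterating over f yields one line at a time instead of loading the whole file
    results = [regex.findall(l) for l in f]
# Drop the lines that contained no match
results = [r for r in results if len(r) > 0]
print(results)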
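If you also want the de-duplication and the output file from the question, the same line-by-line idea could be sketched roughly as follows. This is only a sketch under some assumptions: the function names `extract_emails` and `extract_to_file` are made up for illustration, the simplified pattern stands in for the longer regex in the question (it does not handle the " at " / " dot " forms), and the input is assumed to be UTF-8 as in the question's `open()` call.

```python
import re

# Simplified email pattern (an assumption, not the regex from the question).
# It has no ^/$ anchors, so it can match inside a comma-separated row.
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+")


def extract_emails(lines):
    """Return the set of unique addresses found in an iterable of text lines."""
    found = set()
    for line in lines:
        # Only the current line is scanned; the set keeps unique matches.
        found.update(EMAIL_RE.findall(line))
    return found


def extract_to_file(input_path, output_path):
    """Stream input_path line by line and write one address per line."""
    with open(input_path, encoding="utf-8", errors="replace") as src:
        # A file object yields one line per iteration, so the whole
        # file is never held in memory as a single string.
        emails = extract_emails(src)
    with open(output_path, "w", encoding="utf-8") as dst:
        for email in sorted(emails):
            dst.write(email + "\n")
    return len(emails)
```

You would call it with the file names from the question, for example `extract_to_file('owner-emails.csv', 'email-gen-owner-emails.csv.txt')`. Note that the set of unique addresses still lives in memory, and a file that has no line breaks at all would still be read as one huge line, so this only helps when the rows themselves are of reasonable size.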