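I am trying to read the contents of an `mbox` file and compare them with a list of words read from another file. I believe the problem is that I am reading the files wrongly, because the output does not match what I know the files contain.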
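I tried opening them in `rb` and `r` mode, with no luck. Then I tried putting the contents of the txt file into a `list`. The mbox file, however, cannot be put into a list. As a further test, I tried reading the email content with `get_payload()`, but it returned bytes, which were not useful to me.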
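On the bytes from `get_payload()`: passing `decode=True` undoes the transfer encoding (base64, quoted-printable) and returns `bytes`, which can then be decoded with the charset declared in the part's `Content-Type` header. The sketch below is a minimal example under stated assumptions: only `text/plain` parts are kept, UTF-8 is used when no charset is declared, and `decoded_text` is a hypothetical helper name, not part of the question or the `email` library.

```python
from email.message import Message


def decoded_text(msg: Message) -> str:
    """Return the text/plain body of a message as str, decoding bytes payloads."""
    parts = []
    for part in msg.walk():  # yields the message itself, then every sub-part
        if part.get_content_type() != "text/plain":
            continue  # skip multipart containers, HTML bodies, attachments
        payload = part.get_payload(decode=True)  # bytes after base64/QP decoding
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"  # assumption: UTF-8 fallback
        try:
            parts.append(payload.decode(charset, errors="replace"))
        except LookupError:  # the header names a charset Python does not know
            parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)
```

Messages from `mailbox.mbox(...)` are `mboxMessage` objects, a subclass of `email.message.Message`, so something like `for message in mailbox.mbox('Andishe.mbox'): body = decoded_text(message)` should work with the question's file.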
```python
import mailbox

# Open the file that contains the blacklisted words and print it
with open("blacklist.txt", 'r') as afile:
    buf = afile.read()
    print(buf)

# Open the mbox file
mbox = mailbox.mbox('Andishe.mbox')

# Read the content of the mbox file when it contains multiple messages
for message in mbox:
    if message.is_multipart():
        print("from :", message['from'])
        print("to :", message['to'])
        content = message.as_string()
        # print(content)
    else:
        print("from :", message['from'])
        print("to :", message['to'])
        content = message.as_string()
        # print(content)

# Check whether the blacklisted words are inside the content of the email
for file in content:
    if file in buf:
        print("file contains blacklisted words" + file)
    else:
        print("file does not contain blacklisted words")
```
I expect the result to look like this:

```
some black listed word
file contains blacklisted words + the black listed word
```
Instead I get stuck in a loop that keeps printing. Here is part of the output:

```
file contains blacklisted wordsr
file contains blacklisted wordso
file contains blacklisted wordsm
file contains blacklisted words
```
I don't know what these `r`, `o`, `m` stand for or where they come from.
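A small reproduction of the question's last loop shows where the single letters come from: iterating over a `str` yields one character per step, and `x in some_str` is a substring test, so any character that also appears somewhere in `buf` counts as a "match". The function name `question_loop` and the sample strings are hypothetical stand-ins for the real `content` and `buf`.

```python
def question_loop(content: str, buf: str) -> list:
    """Mimic the question's loop and collect what it appends to the message."""
    hits = []
    for file in content:  # one character per iteration, not one word
        if file in buf:  # substring test: any shared character matches
            hits.append(file)
    return hits


# Hypothetical data: "romance" stands in for a blacklist entry.
print(question_loop("from", "romance\n"))  # ['r', 'o', 'm']  ("f" is not in "romance")
print(question_loop("a\nb", "spam\n"))     # ['a', '\n']  (the newline matches too)
```

A newline in the email matching a newline in `buf` would also explain the output line that ends in nothing visible.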
Answer
I figured out where I was going wrong:

1. I was reading the contents of the txt file incorrectly. I should have used this:
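```python
# Read one blacklisted word per line, dropping the trailing newline
blacklist = []
with open("blacklist.txt", 'r') as afile:
    for line in afile:
        blacklist.append(line.strip('\n'))
```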
This way I got rid of the newline character at the end of each line and kept each line as a single word.
2. I was also going wrong in the `for` loop, because I was not appending the contents of the mbox file. This fixed the problem:
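```python
# Join the collected content into one string and lowercase it for a case-insensitive match
content_string = ''.join(content)
content_string = content_string.lower()
# Check each blacklisted word (not each character) against the content
for word in blacklist:
    if word.lower() in content_string:
        print("This black listed word exists in content : ", word)
```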
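A minimal sketch that combines both fixes and checks each message on its own, rather than only the content left over from the last loop iteration. Several details are assumptions, not part of the original answer: `blacklist.txt` is read as UTF-8, blank lines are skipped (an empty entry `""` is `in` every string and would flag every message), and matching is whole-word with `re` word boundaries instead of plain substring search. The names `clean_blacklist`, `blacklisted_words_in` and `scan_mbox` are hypothetical.

```python
import mailbox
import re


def clean_blacklist(lines) -> list:
    """Strip whitespace and drop blank lines from an iterable of lines (e.g. an open file)."""
    # A blank entry would become "", and "" in some_text is always True.
    return [line.strip().lower() for line in lines if line.strip()]


def blacklisted_words_in(text: str, blacklist: list) -> list:
    """Return the blacklist entries that occur in text as whole words, ignoring case."""
    found = []
    for word in blacklist:
        # \b...\b keeps "ham" from matching inside "shampoo"; re.escape treats the
        # word literally. Assumes entries start and end with a letter, digit or "_".
        if re.search(r"\b" + re.escape(word) + r"\b", text, flags=re.IGNORECASE):
            found.append(word)
    return found


def scan_mbox(mbox_path: str, blacklist_path: str) -> list:
    """Check each message separately and return (sender, words) for every hit."""
    with open(blacklist_path, "r", encoding="utf-8") as afile:
        blacklist = clean_blacklist(afile)
    results = []
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            # as_string() includes the headers, as in the question's code
            hits = blacklisted_words_in(message.as_string(), blacklist)
            if hits:
                results.append((message["from"], hits))
    finally:
        box.close()
    return results
```

Usage would be along the lines of `for sender, words in scan_mbox("Andishe.mbox", "blacklist.txt"): print(sender, words)`. Whole-word matching trades some recall (it will not catch a blacklisted word glued to other letters) for far fewer false positives than the substring test in the answer.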