Question

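I am trying to read the contents of an mbox file and compare them with a list of words read from another file. I believe the problem is that I am reading them incorrectly, because the output does not match what I know the files contain.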

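I tried opening them in both "rb" and "r" mode, with no luck. Then I tried putting the contents of the txt file into a list, but the mbox file cannot be put into a list the same way. As a further test, I tried reading the email content with the get_payload() function, but it returns bytes, which are not useful to me.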

import mailbox

# Open the file that contains the blacklisted words and print it
with open("blacklist.txt",'r') as afile:
    buf=afile.read()
    print(buf)

# Opening the mbox files
mbox = mailbox.mbox('Andishe.mbox')

# Read the content of each message in the mbox file
for message in mbox:
    if message.is_multipart():
        print ("from   :",message['from'])
        print ("to   :",message['to'])
        content = message.as_string()
        # print(content)
    else:
        print ("from   :",message['from'])
        print ("to   :",message['to'])
        content = message.as_string()
        # print(content)


# Check whether the blacklisted words appear in the content of the email
for file in content:
    if file in buf:
        print("file contains blacklisted words" + file)
    else:
        print("file does not contain blacklisted words")

I expected the result to look like this:

some black listed word
file contains blacklisted words + the black listed word

Instead, I get stuck in a loop that keeps printing. Here is part of the output:

file contains blacklisted wordsr
file contains blacklisted wordso
file contains blacklisted wordsm
file contains blacklisted words

I don't know what these r, o, m characters mean or where they come from.
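The stray letters are consistent with how Python treats strings: iterating over a string yields one character at a time, and `x in buf` between two strings is a substring test. The sketch below reproduces the pattern with made-up values; `content` and `buf` are hypothetical stand-ins for the real email text and blacklist file, and `matched_characters` is an illustrative helper, not part of the original code. The empty-looking line in the output above is likely a space or newline character that also occurs in the blacklist text.

```python
def matched_characters(content, buf):
    """Mimic the question's loop: collect each character of `content` found in `buf`."""
    matches = []
    # Iterating over a str visits single characters, not words
    for file in content:
        # str-in-str is a substring test, so any letter that appears
        # anywhere in the blacklist text counts as a "match"
        if file in buf:
            matches.append(file)
    return matches


# Hypothetical stand-ins for the email text and the blacklist.txt contents
content = "room"
buf = "forbidden\nromance\n"

for ch in matched_characters(content, buf):
    # Prints four lines ending in r, o, o, m
    print("file contains blacklisted words" + ch)
```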

Answer 1

I figured out where I was going wrong.

1- I was reading the contents of the txt file incorrectly. I should have used this:

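    blacklist = []
    with open("blacklist.txt", 'r') as afile:
        for line in afile:
            # Strip the trailing newline so each entry is just the word
            blacklist.append(line.strip('\n'))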

This way I got rid of the newline character at the end of each line and kept each line as a single word.
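A slightly more defensive version of the same idea is sketched below as a hypothetical `load_blacklist` helper, assuming one word per line in a UTF-8 file. It uses `strip()` instead of `strip('\n')`, which also removes `\r` from Windows line endings and any stray spaces, and it skips blank lines, because `'' in text` is always `True` and an empty entry would match every email. The temporary file only stands in for `blacklist.txt`.

```python
import os
import tempfile


def load_blacklist(path):
    """Read one blacklisted word per line, dropping newlines and blank lines."""
    blacklist = []
    with open(path, "r", encoding="utf-8") as afile:
        for line in afile:
            word = line.strip()  # removes '\n', '\r' and surrounding spaces
            if word:             # an empty string would match any text
                blacklist.append(word)
    return blacklist


# Demo with a throwaway file standing in for blacklist.txt
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("spam\nlottery\n\n")
    demo_path = tmp.name

print(load_blacklist(demo_path))  # ['spam', 'lottery']
os.remove(demo_path)
```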

2- I was also doing the for loop wrong, because I was not joining the content of the mbox file into one string. This solved the problem:

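    # Combine the email content into one lowercase string
    content_string = ''.join(content)
    content_string = content_string.lower()
    # Check each blacklisted word against the whole text, case-insensitively
    for word in blacklist:
        if word.lower() in content_string:
            print("This blacklisted word exists in content: ", word)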
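For reference, here is a fuller sketch that checks every message rather than only one. In the question's loop, `content` is overwritten on each iteration, so only the last message would be checked. It also uses `get_payload(decode=True)` together with the part's declared charset to turn the bytes mentioned in the question into text. The function names and the temporary demo mailbox are hypothetical; only `text/plain` parts are examined, so HTML-only emails would need extra handling.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def message_text(message):
    """Return the decoded text/plain parts of one message as a str."""
    parts = []
    for part in message.walk():  # yields the message itself if not multipart
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes with transfer encoding undone
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            parts.append(payload.decode(charset, errors="replace"))
        except LookupError:  # unknown charset name in the header
            parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def find_blacklisted(mbox_path, blacklist):
    """Return (From header, [matched words]) for every message with a match."""
    hits = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            text = message_text(message).lower()
            found = [w for w in blacklist if w.lower() in text]
            if found:
                hits.append((message["from"], found))
    finally:
        box.close()
    return hits


# Demo: build a one-message mbox in a temporary directory and scan it
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.mbox")
    box = mailbox.mbox(path)
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "hello"
    msg.set_content("You have won the LOTTERY!")
    box.add(msg)
    box.close()  # close() also flushes the new message to disk
    result = find_blacklisted(path, ["spam", "lottery"])
    print(result)
```

Collecting results per message keeps the sender attached to each match, which is closer to the "file contains blacklisted words + the word" output the question was aiming for.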
