IndexError when iterating over a generator

Time: 2015-11-04 19:30:30

Tags: python generator

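I am trying to solve a problem for my programming class. I am given a folder containing emails and special files. The names of special files always start with "!". I am supposed to add a method emails() to the Corpus class. The method should be a generator. Here is an example of its use: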

corpus = Corpus('/path/to/directory/with/emails')
count = 0
# Go through all emails and print the filename and the message body
for fname, body in corpus.emails():
    print(fname)
    print(body)
    print('-------------------------')
    count += 1
print('Finished: ', count, 'files processed.')

This is the class and the method I wrote:

import os


class Corpus:
    def __init__(self, path_to_mails_directory):
        self.path_to_mails_directory = path_to_mails_directory

    def emails(self):
        iterator = 0
        mail_body = None
        mails_folder = os.listdir(self.path_to_mails_directory)
        lenght = len(mails_folder)
        while iterator <= lenght:
            if not mails_folder[iterator].startswith("!"):
                with open(self.path_to_mails_directory+"/"+mails_folder[iterator]) as an_e_mail:
                    mail_body = an_e_mail.read()
                yield mails_folder[iterator], mail_body
            iterator += 1

I tried to run the example code like this:

if __name__ == "__main__":
    my_corpus = Corpus("data/1")
    my_gen = my_corpus.emails()
    count = 0
    for fname, body in my_gen:
        print(fname)
        print(body)
        print("------------------------------")
        count += 1
    print("finished: " + str(count))

Python prints quite a lot of emails as expected (the folder contains about a thousand files) and then ends with:

Traceback (most recent call last):
  File "C:/Users/tvavr/PycharmProjects/spamfilter/corpus.py", line 26, in <module>
    for fname, body in my_gen:
  File "C:/Users/tvavr/PycharmProjects/spamfilter/corpus.py", line 15, in emails
    if not mails_folder[iterator].startswith("!"):
IndexError: list index out of range

I don't know what the problem is and would appreciate any help. Thanks.

Edit: I updated the code a bit based on your suggestions.

1 answer:

Answer 0: (score: 0)

A good way to do this is the following:

def emails(self):
    # Needs `import os` at the top of the module.
    mails_folder = os.listdir(self.path_to_mails_directory)
    for mail in mails_folder:
        # Special files start with "!" and are skipped.
        if not mail.startswith("!"):
            with open(os.path.join(self.path_to_mails_directory, mail)) as an_e_mail:
                mail_body = an_e_mail.read()
            yield mail, mail_body
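
To try the method without a real mail folder, something like the following sketch may help. It is only an illustration: the temporary directory, the file names and their contents are made up, and os.path.join and sorted() are choices made for this example rather than part of the assignment.

```python
import os
import tempfile


class Corpus:
    def __init__(self, path_to_mails_directory):
        self.path_to_mails_directory = path_to_mails_directory

    def emails(self):
        """Yield (filename, body) for every file whose name does not start with '!'."""
        for mail in os.listdir(self.path_to_mails_directory):
            if mail.startswith("!"):
                continue  # special file, not an email
            path = os.path.join(self.path_to_mails_directory, mail)
            with open(path) as an_e_mail:
                yield mail, an_e_mail.read()


# Hypothetical sample data: two emails and one special file in a temp folder.
with tempfile.TemporaryDirectory() as folder:
    samples = [("mail1", "hello"), ("mail2", "world"), ("!truth.txt", "mail1 OK")]
    for name, text in samples:
        with open(os.path.join(folder, name), "w") as handle:
            handle.write(text)

    # os.listdir returns names in arbitrary order, so sort before printing.
    found = sorted(Corpus(folder).emails())
    print(found)
```

The special file is skipped, and the loop ends on its own once every name has been visited.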

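Index-based iteration is not considered Pythonic. You should prefer the "for mail in mails_folder:" syntax. It also removes the cause of the IndexError: with "while iterator <= lenght", the last pass of the loop uses iterator == lenght, but the valid indices only go from 0 to lenght - 1.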
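
The off-by-one in the question's while loop can be reproduced without any files. The sketch below uses a made-up list of three names in place of the real folder listing; the function names are hypothetical and exist only for this comparison.

```python
def visible_names_buggy(names):
    """Reproduces the question's loop: `<=` lets the index reach len(names)."""
    index = 0
    while index <= len(names):  # bug: the last valid index is len(names) - 1
        if not names[index].startswith("!"):
            yield names[index]
        index += 1


def visible_names(names):
    """Iterates over the list directly, so no index can run past the end."""
    for name in names:
        if not name.startswith("!"):
            yield name


sample = ["a.txt", "!truth.txt", "b.txt"]  # hypothetical file names

print(list(visible_names(sample)))

collected = []
try:
    for name in visible_names_buggy(sample):
        collected.append(name)
except IndexError as exc:
    # Every real item was already yielded; the failure comes one step later.
    print("yielded", collected, "before failing:", exc)
```

This matches what the question describes: all the emails are printed first, and the exception only appears on the extra pass after the last file. Keeping the index loop but changing the condition to "iterator < lenght" would also stop the error.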