Python IMAP scraper hangs indefinitely

Time: 2016-01-21 18:48:48

Tags: python gmail imap

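I am trying to scrape data from a specific folder in a Gmail account that I have access to.

I recently tried running this code on Windows 7 with Python 2.7 while logged in to the Gmail account in question. For some reason it appears to run for a very long time (I have left it for as long as 40 minutes) without finishing or raising an error.

As it stands, the folder I am targeting in the Gmail account contains only about 50 simple text emails, with no attachments, images, or anything else that would suggest the process should take this long. Has anyone run into a problem like this when doing something similar with IMAP?

The code, for completeness: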

#!/usr/bin/env python
#
# Very simple Python script to dump all emails in an IMAP folder to files.  
# This code is released into the public domain.
#
# RKI Nov 2013
#
import sys
import imaplib
import getpass

IMAP_SERVER = 'imap.gmail.com'
EMAIL_ACCOUNT = "notatallawhistleblowerIswear@gmail.com"
EMAIL_FOLDER = "Top Secret/PRISM Documents"
OUTPUT_DIRECTORY = 'C:/src/tmp'

PASSWORD = getpass.getpass()


def process_mailbox(M):
    """
    Dump all emails in the folder to files in output directory.
    """

    rv, data = M.search(None, "ALL")
    if rv != 'OK':
        print "No messages found!"
        return

    for num in data[0].split():
        rv, data = M.fetch(num, '(RFC822)')
        if rv != 'OK':
            print "ERROR getting message", num
            return
        print "Writing message ", num
        f = open('%s/%s.eml' %(OUTPUT_DIRECTORY, num), 'wb')
        f.write(data[0][1])
        f.close()

def main():
    M = imaplib.IMAP4_SSL(IMAP_SERVER)
    M.login(EMAIL_ACCOUNT, PASSWORD)
    rv, data = M.select(EMAIL_FOLDER)
    if rv == 'OK':
        print "Processing mailbox: ", EMAIL_FOLDER
        process_mailbox(M)
        M.close()
    else:
        print "ERROR: Unable to open mailbox ", rv
    M.logout()

if __name__ == "__main__":
    main()

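1 Answer:

Answer 0 (score: 1)

The code works fine for me. Below, I added some debugging prints (using pprint) to your code to inspect the attributes of the IMAP4_SSL object M. My Gmail uses two-factor authentication, so I needed to set up a gmail app password.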

from pprint import pprint 

# ....

M = imaplib.IMAP4_SSL(IMAP_SERVER)
print('---- Attributes of the IMAP4_SSL connection before login ----')
pprint(vars(M))

M.login(EMAIL_ACCOUNT, PASSWORD)
print('\n \n')
print('---- Attributes of the IMAP4_SSL connection after login ----')
pprint(vars(M))

# open specific folder
rv, data = M.select(EMAIL_FOLDER)
print('\n \n')
print('---- Data returned from select of folder = {}'.format(data))
  • Check the first pprint(vars(M)):
    1. 'welcome': '* OK Gimap ready for requests from ...
    2. 'port': 993,
  • Check the second pprint(vars(M)):
    1. _cmd_log shows a successful login: 6: ('< PJIL1 OK **@gmail.com authenticated (Success)
  • The data returned by M.select(EMAIL_FOLDER) should be the number of emails available for download.
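
The checks in the list above can also be done in code rather than by reading the pprint output. The following is only a rough sketch, not part of the original answer: the helper names summarize_connection and _to_text are made up for illustration, and it is written against Python 3, where imaplib returns bytes (the original script targets Python 2.7). It relies only on the welcome, port and state attributes of the connection object and on the (rv, data) pair returned by M.select().

```python
def _to_text(value):
    """Decode bytes (as returned by Python 3's imaplib) to str."""
    if isinstance(value, bytes):
        return value.decode('ascii', 'replace')
    return value


def summarize_connection(conn, select_result=None):
    """Collect the values worth checking when the script seems to hang.

    conn          -- an imaplib.IMAP4_SSL instance (hypothetical helper:
                     any object with welcome, port and state works)
    select_result -- the (rv, data) tuple returned by conn.select(...)
    """
    greeting = _to_text(getattr(conn, 'welcome', None)) or ''
    summary = {
        # The server greeting should start with "* OK".
        'greeting_ok': greeting.startswith('* OK'),
        # 993 is the port seen in the answer's pprint output.
        'port': getattr(conn, 'port', None),
        # imaplib tracks NONAUTH / AUTH / SELECTED / LOGOUT here.
        'state': getattr(conn, 'state', None),
        'message_count': None,
    }
    if select_result is not None:
        rv, data = select_result
        # On success, data[0] holds the number of messages in the folder.
        if rv == 'OK' and data and data[0] is not None:
            summary['message_count'] = int(_to_text(data[0]))
    return summary
```

With a real connection this would be called as summarize_connection(M, M.select(EMAIL_FOLDER)). A message_count of None means the select did not succeed, which is worth ruling out before looking at the fetch loop.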