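Download a CSV file from Gmail using Python

Time: 2017-01-19 18:35:25

Tags: python csv gmail

I have tried several Python scripts to download CSV attachments from Gmail, but none of them worked. Is this possible? If so, which Python script should I use? Thanks.

4 answers:

Answer 0 (score: 1)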

TL;DR

  • If you want to skip all the details in this answer, I put together a GitHub repository that makes getting CSV data from Gmail very simple:

    from gmail import *
    
    # connect to Gmail (helper provided by the repository)
    service = get_gmail_service()
    
    # get all CSV attachments from e-mails containing 'test'
    search_query = "test"
    csv_dfs = query_for_csv_attachments(service, search_query)
    print(csv_dfs)
    
  • Here is the repository: https://github.com/robertdavidwest/google_api

  • Just follow the instructions in the README and have fun. Feel free to contribute!

The long answer: using google-api-python-client and oauth2client directly

  • Follow this link and click the "Enable the Gmail API" button:

    https://developers.google.com/gmail/api/quickstart/python

    When setup is complete, you will download a file named credentials.json.

  • Install the required Python packages:

    pip install --upgrade google-api-python-client oauth2client
    
  • The following snippet lets you connect to your Gmail account from Python:

    from googleapiclient.discovery import build
    from httplib2 import Http
    from oauth2client import file, client, tools
    
    # read-only access is enough for downloading attachments
    SCOPES = 'https://www.googleapis.com/auth/gmail.readonly'
    GMAIL_CREDENTIALS_PATH = 'credentials.json' # downloaded
    GMAIL_TOKEN_PATH = 'token.json' # this will be created
    
    # reuse a stored token if there is one, otherwise run the OAuth flow
    store = file.Storage(GMAIL_TOKEN_PATH)
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets(GMAIL_CREDENTIALS_PATH, SCOPES)
        creds = tools.run_flow(flow, store)
    service = build('gmail', 'v1', http=creds.authorize(Http()))
    
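  • Now, with this service, you can read your e-mails and any attachments they contain.

  • First, query your e-mails with a search string to find the IDs of the messages whose attachments you need:

    search_query = "ABCD"
    results = service.users().messages().list(userId='me', q=search_query).execute()
    msgs = results.get('messages', [])  # the key is absent when nothing matches
    msg_ids = [msg['id'] for msg in msgs]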
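
The query above returns only the first page of results. A minimal sketch of collecting every matching ID by following `nextPageToken` could look like this; `list_message_ids` is a hypothetical helper name (not part of the Gmail client library), and `service` is assumed to be the object built in the authentication step.

```python
def list_message_ids(service, query, user_id="me"):
    """Collect the IDs of all messages matching `query`, page by page."""
    ids = []
    page_token = None
    while True:
        # Only send pageToken once the API has handed us one.
        kwargs = {"userId": user_id, "q": query}
        if page_token:
            kwargs["pageToken"] = page_token
        response = service.users().messages().list(**kwargs).execute()
        # 'messages' is missing from the response when nothing matches.
        ids.extend(m["id"] for m in response.get("messages", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            return ids
```

For example, `list_message_ids(service, "has:attachment filename:csv")` should narrow the search to messages carrying CSV files, assuming Gmail's usual search operators.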
    
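  • Now, for each message ID, you can find the associated attachments in the e-mail.

  • This part is a little messy, so bear with me. First we get a list of the "attachment parts" (and attachment filenames) of the e-mail. These are the parts of the e-mail that contain an attachment:

    messageId = 'XYZ'
    msg = service.users().messages().get(userId='me', id=messageId).execute()
    parts = msg.get('payload', {}).get('parts') or []
    # flatten one level of nesting (multipart messages wrap their parts)
    all_parts = []
    for p in parts:
        if p.get('parts'):
            all_parts.extend(p.get('parts'))
        else:
            all_parts.append(p)
    
    # keep only the parts that hold a CSV attachment
    att_parts = [p for p in all_parts if p['mimeType']=='text/csv']
    filenames = [p['filename'] for p in att_parts]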
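
The snippet above flattens a single level of nesting, while real messages can nest multipart containers more deeply. The following is a sketch of a recursive alternative, assuming the payload is the plain dictionary returned by `messages().get`; `iter_parts` and `find_csv_parts` are hypothetical helper names. It also accepts parts whose filename ends in `.csv`, on the assumption that some mail clients label CSV files with another MIME type such as `application/octet-stream`.

```python
def iter_parts(payload):
    """Yield every leaf part of a message payload, at any nesting depth."""
    children = payload.get("parts")
    if not children:
        yield payload
        return
    for child in children:
        yield from iter_parts(child)


def find_csv_parts(payload):
    """Return the leaf parts that look like CSV attachments."""
    return [
        p for p in iter_parts(payload)
        if p.get("filename")
        and (p.get("mimeType") == "text/csv"
             or p["filename"].lower().endswith(".csv"))
    ]
```

With this helper, `att_parts = find_csv_parts(msg['payload'])` could stand in for the flattening loop and the MIME-type filter.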
    
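  • Now we can get the attached CSV from each part:

    messageId = 'XYZ'
    part = att_parts[0]  # repeat for every part in att_parts
    data = part['body'].get('data')
    attachmentId = part['body'].get('attachmentId')
    # larger attachments are not inlined and must be fetched by ID
    if not data:
        att = service.users().messages().attachments().get(
                userId='me', id=attachmentId, messageId=messageId).execute()
        data = att['data']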
    
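  • Now you have the CSV data, but it is base64-encoded, so finally we decode it and convert the result into a pandas DataFrame:

    import base64
    import pandas as pd
    from io import StringIO
    
    # Gmail returns attachment data as URL-safe base64; the CSV is assumed to be UTF-8
    str_csv = base64.urlsafe_b64decode(data.encode('UTF-8')).decode('UTF-8')
    df = pd.read_csv(StringIO(str_csv))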
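
If pandas is not available, or the base64 string arrives without its trailing `=` padding, the decoding step can be sketched with the standard library alone. `decode_gmail_data` and `csv_rows` are hypothetical helper names, and the UTF-8 default is an assumption about the attachment.

```python
import base64
import csv
import io


def decode_gmail_data(data):
    """Decode the URL-safe base64 string from a Gmail body or attachment."""
    # Restore any '=' padding that was stripped; a no-op if already padded.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded)


def csv_rows(data, encoding="utf-8"):
    """Decode the attachment and return its rows as lists of strings."""
    text = decode_gmail_data(data).decode(encoding)
    return list(csv.reader(io.StringIO(text)))
```

The raw bytes returned by `decode_gmail_data` can also be written straight to disk when only the file itself is needed.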
    
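  • That's it! You now have a pandas DataFrame with the contents of the CSV attachment. You can work with this DataFrame directly, or, if you just want to download the CSV, write it to disk with pd.DataFrame.to_csv. If you want to keep the original file names, use the filenames list obtained earlier.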
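
When writing attachments to disk under their original names, two details are worth handling: a sender-supplied name may contain directory components, and two e-mails may carry files with the same name. A minimal sketch, assuming the decoded attachment is available as bytes; `save_attachment` and the `attachment.csv` fallback name are hypothetical.

```python
import os


def save_attachment(folder, filename, content):
    """Write `content` (bytes) into `folder` and return the path used."""
    os.makedirs(folder, exist_ok=True)
    # Keep only the last path component so the name cannot escape `folder`.
    safe_name = os.path.basename(filename) or "attachment.csv"
    stem, ext = os.path.splitext(safe_name)
    path = os.path.join(folder, safe_name)
    counter = 1
    # Pick a numbered name rather than overwrite an existing file.
    while os.path.exists(path):
        path = os.path.join(folder, f"{stem}_{counter}{ext}")
        counter += 1
    with open(path, "wb") as fh:
        fh.write(content)
    return path
```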

Answer 1 (score: 1)

An up-to-date answer is available at "Download attachment from mail using Python":

import os
from imbox import Imbox # pip install imbox
import traceback

# enable less secure apps on your google account
# https://myaccount.google.com/lesssecureapps

host = "imap.gmail.com"
username = "username"
password = 'password'
download_folder = "/path/to/download/folder"

if not os.path.isdir(download_folder):
    os.makedirs(download_folder, exist_ok=True)

mail = Imbox(host, username=username, password=password, ssl=True, ssl_context=None, starttls=False)
messages = mail.messages() # defaults to inbox

for (uid, message) in messages:
    mail.mark_seen(uid) # optional, mark message as read

    for idx, attachment in enumerate(message.attachments):
        try:
            att_fn = attachment.get('filename')
            download_path = f"{download_folder}/{att_fn}"
            print(download_path)
            with open(download_path, "wb") as fp:
                fp.write(attachment.get('content').read())
        except Exception:
            # report the failure and continue with the next attachment
            traceback.print_exc()

mail.logout()


"""
Available Message filters: 

# Gets all messages from the inbox
messages = mail.messages()

# Unread messages
messages = mail.messages(unread=True)

# Flagged messages
messages = mail.messages(flagged=True)

# Un-flagged messages
messages = mail.messages(unflagged=True)

# Messages sent FROM
messages = mail.messages(sent_from='sender@example.org')

# Messages sent TO
messages = mail.messages(sent_to='receiver@example.org')

# Messages received before specific date
messages = mail.messages(date__lt=datetime.date(2018, 7, 31))

# Messages received after specific date
messages = mail.messages(date__gt=datetime.date(2018, 7, 30))

# Messages received on a specific date
messages = mail.messages(date__on=datetime.date(2018, 7, 30))

# Messages whose subjects contain a string
messages = mail.messages(subject='Christmas')

# Messages from a specific folder
messages = mail.messages(folder='Social')
"""

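Answer 2 (score: 0)

I got it working. This is not entirely my own work: I took a few pieces of code, combined them, and modified them into the script below. In the end, it worked.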

print('Proceeding')

import email
import imaplib
import os

userName = 'yourgmail@gmail.com'
passwd = 'yourpassword'

# attachments are saved to ./DataFiles
detach_dir = '.'
os.makedirs(os.path.join(detach_dir, 'DataFiles'), exist_ok=True)

try:
    imapSession = imaplib.IMAP4_SSL('imap.gmail.com')
    typ, accountDetails = imapSession.login(userName, passwd)
    if typ != 'OK':
        raise RuntimeError('Not able to sign in!')

    # mailbox names containing spaces must be quoted
    imapSession.select('"[Gmail]/All Mail"')
    typ, data = imapSession.search(None, 'ALL')
    if typ != 'OK':
        raise RuntimeError('Error searching Inbox.')

    for msgId in data[0].split():
        typ, messageParts = imapSession.fetch(msgId, '(RFC822)')
        if typ != 'OK':
            raise RuntimeError('Error fetching mail.')

        # the raw message arrives as bytes
        emailBody = messageParts[0][1]
        mail = email.message_from_bytes(emailBody)
        for part in mail.walk():
            # skip containers and parts that are not attachments
            if part.get_content_maintype() == 'multipart':
                continue
            if part.get('Content-Disposition') is None:
                continue
            fileName = part.get_filename()

            if fileName:
                filePath = os.path.join(detach_dir, 'DataFiles', fileName)
                if not os.path.isfile(filePath):
                    print(fileName)
                    with open(filePath, 'wb') as fp:
                        fp.write(part.get_payload(decode=True))
    imapSession.close()
    imapSession.logout()

    print('Done')

except Exception as exc:
    print('Not able to download all attachments:', exc)
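
The script above saves every attachment it finds. Since the question asks for CSV files specifically, the parsing step can be isolated into a small function that needs no network access and keeps only CSV attachments. This is a sketch using the standard `email` package; `extract_csv_attachments` is a hypothetical helper name, and matching on the `.csv` extension is an assumption about how the files are named.

```python
import email
import os


def extract_csv_attachments(raw_message):
    """Return (filename, content_bytes) for each CSV attachment in a raw message."""
    msg = email.message_from_bytes(raw_message)
    found = []
    for part in msg.walk():
        # Multipart containers hold other parts, not file data.
        if part.get_content_maintype() == "multipart":
            continue
        filename = part.get_filename()
        if not filename or not filename.lower().endswith(".csv"):
            continue
        # basename() drops directory components supplied by the sender.
        found.append((os.path.basename(filename), part.get_payload(decode=True)))
    return found
```

In the loop above, `extract_csv_attachments(messageParts[0][1])` would give the files to write for each fetched message.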

Answer 3 (score: 0)

from imap_tools import MailBox

# get all .csv attachments from INBOX and save them to files
with MailBox('imap.my.ru').login('acc', 'pwd', 'INBOX') as mailbox:
    for msg in mailbox.fetch():
        for att in msg.attachments:
            if att.filename.lower().endswith('.csv'):
                with open('C:/1/{}'.format(att.filename), 'wb') as f:
                    f.write(att.payload)

https://github.com/ikvk/imap_tools