Gmail API only returns 1 MB of data

Date: 2019-06-25 00:48:53

Tags: python gmail-api

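I have filtered all the mail I want to request into a Gmail label, and I can successfully retrieve messages by using the following code in their quickstart.py script: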

# My Code
results = service.users().messages().list(userId='me',labelIds = '{Label_id}', maxResults='10000000').execute()
messages = results.get('messages', [])

for message in messages:
    msg = service.users().messages().get(userId='me', id=message['id'], format='metadata', metadataHeaders=['subject']).execute()
    print(msg['snippet'].encode('utf-8').strip())

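I first listed all labels and their IDs in an earlier request and substituted the relevant ID for {Label_id}. I then request only the subject metadata field. The problem is that the response returns exactly 1 MB of data. I know this because I redirected the output to a file and ran ls -latr --block-size=MB. In addition, judging by the dates, the label contains more (older) messages than the ones returned. The request always stops at exactly the same message. None of the messages have attachments.

According to their API reference, I should be allowed: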

Daily Usage 1,000,000,000 quota units per day

Per User Rate Limit 250 quota units per user per second

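I don't think that is the limit I'm hitting, but maybe I'm wrong: each message has 1-3 replies that I can see, so perhaps each email counts as 5 quota units? I'm not sure. I have tried playing with the maxResults parameter, but that does not seem to change anything.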
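As a rough check on the quota worry, here is a back-of-the-envelope calculation. It assumes the per-method costs listed on Google's Gmail API usage-limits page (5 quota units for messages.list and 5 for messages.get); those figures should be verified against the current documentation. The constant and function names are illustrative only.

```python
# Assumed per-method costs (taken from the Gmail API usage-limits page;
# verify against the current documentation before relying on them).
LIST_COST = 5   # quota units per messages.list call
GET_COST = 5    # quota units per messages.get call

PER_USER_LIMIT = 250       # quota units per user per second (quoted above)
DAILY_LIMIT = 1000000000   # quota units per day (quoted above)


def quota_units(num_messages, page_size=500):
    """Estimate the units needed to list and then fetch num_messages messages."""
    pages = -(-num_messages // page_size)  # ceiling division
    return pages * LIST_COST + num_messages * GET_COST


max_gets_per_second = PER_USER_LIMIT // GET_COST
print(max_gets_per_second)   # messages.get calls allowed per second
print(quota_units(10000))    # units to list and fetch 10,000 messages
```

Under these assumptions, a sequential script stays far below the daily quota, so truncated output more likely points to the request logic than to a quota cap.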

Am I hitting a cap here, or is the problem in my request logic?

Edit 1

from __future__ import print_function
import pickle
import os.path
import base64
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

## If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://mail.google.com/']

def main():
    """Fetches subject metadata for every message under a Gmail label,
    following nextPageToken until all pages have been read.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    messageArray = []
    pageToken = None
    while True:
        results = service.users().messages().list(userId='me',labelIds = '{Label_ID}', maxResults=500, pageToken=pageToken).execute()
        messages = results.get('messages', [])
        for message in messages:
            msg = service.users().messages().get(userId='me', id=message['id'], format='metadata', metadataHeaders=['subject']).execute()
            messageArray.append(msg)
        pageToken = results.get('nextPageToken', None)
        if not pageToken:
            print('[%s]' % ', '.join(map(str, messageArray)))
            break


if __name__ == '__main__':
    main()

Edit 2

This is the final script I used. It produces a nicer, cleaner format that I redirect to a file and can parse easily.

from __future__ import print_function
import pickle
import os.path
import base64
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

## If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://mail.google.com/']

def main():
    """Prints the snippet of every message under a Gmail label,
    following nextPageToken until all pages have been read.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    pageToken = None
    while True:
        results = service.users().messages().list(userId='me',labelIds = '{Label_ID}', maxResults=500, pageToken=pageToken).execute()
        messages = results.get('messages', [])
        for message in messages:
            msg = service.users().messages().get(userId='me', id=message['id'], format='metadata', metadataHeaders=['subject']).execute()
            print(msg['snippet'].encode('utf-8').strip())
        pageToken = results.get('nextPageToken', None)
        if not pageToken:
            break


if __name__ == '__main__':
    main()

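1 Answer:

Answer 0 (score: 1)

The maximum value of maxResults is 500. If you set it higher, you will still receive at most 500 messages per response. You can confirm this by checking the length of messages.

You need to implement pagination: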
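A minimal sketch of checking the length of messages on a single response. The dictionary here is a hand-built stand-in for what execute() returns (real responses come from the API), and the helper name describe_page is hypothetical.

```python
def describe_page(results):
    """Return (message count, whether more pages exist) for one list response."""
    messages = results.get('messages', [])
    return len(messages), 'nextPageToken' in results


# Stand-in for a full first page: 500 message references plus a page token.
fake_results = {
    'messages': [{'id': str(i), 'threadId': str(i)} for i in range(500)],
    'nextPageToken': 'token-for-page-2',
}

count, has_more = describe_page(fake_results)
print(count, has_more)
```

A count of 500 together with a nextPageToken suggests that more messages are waiting on later pages.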

messages = []
pageToken = None
while True:
  # Request one page of up to 500 message references.
  results = service.users().messages().list(userId='me', labelIds='{Label_id}', maxResults=500, pageToken=pageToken).execute()
  # Use the quoted key 'messages', and extend (not append) to keep the list flat.
  messages.extend(results.get('messages', []))
  pageToken = results.get('nextPageToken', None)
  if not pageToken:
    break
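The same loop can be packaged as a generator so that callers never handle page tokens directly. This is a sketch, not part of the client library: iter_messages and list_page are hypothetical names, and list_page is assumed to be any callable that takes a page token (or None) and returns one response dictionary.

```python
def iter_messages(list_page):
    """Yield every message reference, following nextPageToken until it runs out.

    list_page is a caller-supplied function: page token (or None) -> response dict.
    """
    page_token = None
    while True:
        results = list_page(page_token)
        for message in results.get('messages', []):
            yield message
        page_token = results.get('nextPageToken')
        if not page_token:
            break


# Fake two-page backend, used only to exercise the generator offline.
FAKE_PAGES = {
    None: {'messages': [{'id': 'a'}, {'id': 'b'}], 'nextPageToken': 'p2'},
    'p2': {'messages': [{'id': 'c'}]},
}

ids = [m['id'] for m in iter_messages(FAKE_PAGES.get)]
print(ids)
```

With the real client, list_page could be a small lambda that wraps the messages().list(...).execute() call shown in the loop and passes its argument as pageToken.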

If you just want the raw, unparsed email, try using:

# at top of file
from base64 import urlsafe_b64decode

msg = service.users().messages().get(userId='me', id=message['id'], format='raw').execute()
print(urlsafe_b64decode(msg['raw']))
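Once decoded, the raw bytes are an RFC 2822 message that Python's standard email package can parse. The sketch below builds a small sample payload locally instead of calling the API, so sample_raw is an assumption standing in for msg['raw'].

```python
from base64 import urlsafe_b64decode, urlsafe_b64encode
from email import message_from_bytes

# Sample payload standing in for msg['raw'] (a base64url-encoded message).
sample_bytes = b"Subject: Hello\r\nFrom: a@example.com\r\n\r\nBody text\r\n"
sample_raw = urlsafe_b64encode(sample_bytes).decode('ascii')

# Decode the base64url string, then parse the headers and body.
parsed = message_from_bytes(urlsafe_b64decode(sample_raw))
subject = parsed['Subject']
body = parsed.get_payload()
print(subject)
print(body)
```

Parsing the raw form locally gives access to every header and the body from a single messages.get call, at the cost of downloading the whole message.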