Retrieving all emails from Gmail returns only about 3000 messages, not all of them

Posted: 2018-08-22 13:54:26

Tags: python google-api gmail-api google-api-python-client

How can I pull all of my emails out of Gmail?

I did a full sync, but it didn't return all of my email: only about 3000 messages, although I know I have more. The documentation doesn't mention anything about this.

My code snippet:

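    # `service` is an authorized Gmail API client and `start_history_id` a saved
    # history ID (both defined elsewhere).
    def handle_message(request_id, response, exception):
        if exception is not None:
            print('messages.get failed for {}: {}'.format(request_id, exception))
        # otherwise process `response` (the full message resource) here

    history_api = service.users().history()
    request = history_api.list(
        userId='me',
        startHistoryId=start_history_id,
        maxResults=100,  # Gmail accepts at most 100 calls per batch request
        labelId='INBOX'
    )
    while request is not None:
        history = request.execute()
        if 'history' in history:
            batch = service.new_batch_http_request(callback=handle_message)
            for record in history['history']:
                batch.add(
                    service.users().messages().get(userId='me', id=record['messages'][0]['id']),
                    request_id=record['id']  # request IDs must be unique within a batch
                )
            batch.execute()
        # follow nextPageToken; list_next() returns None after the last page
        request = history_api.list_next(request, history)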

2 answers:

Answer 0 (score: 3)

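If you want to do a full sync, you should follow the Gmail API synchronization guide, which recommends two steps:

1. Call users.messages.list to retrieve the message IDs, one page at a time.
2. Fetch the messages listed on each page with batched users.messages.get requests.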

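So you don't need users.history.list for this, and you would have a hard time finding a startHistoryId to start from anyway: history.list only returns the changes made after that ID, which makes it a tool for incremental sync rather than for fetching the whole mailbox.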
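To make the difference concrete, here is a small sketch of what comes back from users.history.list: change records (such as messagesAdded) that happened after startHistoryId, not a listing of the mailbox. The helper name added_message_ids is hypothetical, and the sample response is hand-written for illustration, assuming the documented response shape (a "history" list whose records carry "messagesAdded" entries).

```python
from typing import Any, Dict, List


def added_message_ids(history_response: Dict[str, Any]) -> List[str]:
    """Collect IDs of messages added after startHistoryId (hypothetical helper).

    Every entry in a record's 'messagesAdded' list is inspected, not just the
    first one, and each ID is returned once even if it appears in several records.
    """
    seen = set()
    ids = []
    for record in history_response.get('history', []):
        for added in record.get('messagesAdded', []):
            msg_id = added['message']['id']
            if msg_id not in seen:
                seen.add(msg_id)
                ids.append(msg_id)
    return ids


# Hand-written sample shaped like a history.list response (not real data)
sample = {
    'history': [
        {'id': '101', 'messagesAdded': [{'message': {'id': 'a1'}}, {'message': {'id': 'a2'}}]},
        {'id': '102', 'labelsAdded': [{'message': {'id': 'a1'}, 'labelIds': ['STARRED']}]},
        {'id': '103', 'messagesAdded': [{'message': {'id': 'a2'}}]},
    ],
    'historyId': '103',
}
print(added_message_ids(sample))  # ['a1', 'a2']
```

Messages that already existed before startHistoryId never show up in such a response, which is why a full sync starts from messages.list instead.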

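You can do this with something like the following (tested and working in my Python 3.x console). As others have suggested, I used the Python client's pagination and batch request features.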

    from httplib2 import Http
    from googleapiclient.discovery import build
    from oauth2client import client, tools, file


    # callback for the batch request (see below)
    def print_gmail_message(request_id, response, exception):
        if exception is not None:
            print('messages.get failed for message id {}: {}'.format(request_id, exception))
        else:
            print(response)


    # Scopes
    SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']

    # where do we store our credentials?
    creds_store = file.Storage('gmail-list.json')
    start_creds = creds_store.get()

    # standard OAuth 2.0 authorization flow
    if not start_creds or start_creds.invalid:
        # client_id.json is exported from your GCP project
        start_flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
        start_creds = tools.run_flow(start_flow, creds_store)

    # Gmail API client
    http = Http()
    gmail_sdk = build('gmail', 'v1', http=start_creds.authorize(http))

    # messages.list parameters (each page holds up to 100 IDs by default;
    # add 'includeSpamTrash': True to also list SPAM and TRASH)
    msg_list_params = {
        'userId': 'me'
    }
    # messages.list API
    message_list_api = gmail_sdk.users().messages()
    # first request
    message_list_req = message_list_api.list(**msg_list_params)

    while message_list_req is not None:
        gmail_msg_list = message_list_req.execute()

        # we build the batch request
        batch = gmail_sdk.new_batch_http_request(callback=print_gmail_message)
        # 'messages' is missing from the response when there is nothing to list
        for gmail_message in gmail_msg_list.get('messages', []):
            msg_get_params = {
                'userId': 'me',
                'id': gmail_message['id'],
                'format': 'full',
            }
            batch.add(gmail_sdk.users().messages().get(**msg_get_params), request_id=gmail_message['id'])

        batch.execute(http=http)

        # pagination handling: list_next() returns None after the last page
        message_list_req = message_list_api.list_next(message_list_req, gmail_msg_list)
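The callback above only prints each message. If you want to keep the messages (and retry the failures later), one option is a callback that stores them. This is a sketch: MessageCollector is a hypothetical name, and it relies only on the (request_id, response, exception) arguments the client passes to batch callbacks.

```python
from typing import Any, Dict, Optional


class MessageCollector:
    """Batch callback that keeps results instead of printing them (hypothetical)."""

    def __init__(self) -> None:
        self.messages: Dict[str, Any] = {}
        self.failed: Dict[str, Exception] = {}

    def __call__(self, request_id: str, response: Any,
                 exception: Optional[Exception]) -> None:
        if exception is not None:
            # remember failures (e.g. rate-limit errors) so they can be retried
            self.failed[request_id] = exception
        else:
            self.messages[request_id] = response


# Simulated callback invocations with made-up data
collector = MessageCollector()
collector('m1', {'id': 'm1', 'snippet': 'hello'}, None)
collector('m2', None, RuntimeError('rate limited'))
print(len(collector.messages), len(collector.failed))  # 1 1
```

To use it, create one collector before the loop and pass callback=collector to gmail_sdk.new_batch_http_request(...) in place of print_gmail_message; the IDs in collector.failed can then be re-requested in a later batch.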

Answer 1 (score: 1)

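As suggested in this link, you can use batch requests:

"Use batching and request 100 messages at a time. You will need to make 1000 requests, but the good news is that's quite fine and it'll be easier for everyone (no downloading a 1GB response in a single request!)"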
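The arithmetic behind that quote: at 100 messages per batch, 1000 requests cover 100 × 1000 = 100,000 messages. A minimal sketch of splitting a list of message IDs into batches of 100 (the helper name chunks is hypothetical; 100 is the per-batch call limit the Gmail API documents, and the IDs below are made up).

```python
from typing import Iterator, List, Sequence


def chunks(ids: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs (hypothetical helper)."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# 100,000 made-up message IDs -> 1000 batches of 100
message_ids = ['id{}'.format(n) for n in range(100000)]
batches = list(chunks(message_ids))
print(len(batches), len(batches[-1]))  # 1000 100
```

Each chunk would then become one batch request containing one messages.get call per ID.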

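Also, based on this thread, you can save the next page token from each response and pass it in the next request. If a response contains no next page token, you have received all the messages.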
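The page-token loop described above can be sketched independently of the client library. Here list_page is a hypothetical stand-in for one users.messages.list call: it takes a page token (None for the first page) and returns the parsed JSON response. The fake pages are made up for the example; with the real Python client, list_next() (as used in the first answer) does this bookkeeping for you.

```python
from typing import Any, Callable, Dict, List, Optional

ListPage = Callable[[Optional[str]], Dict[str, Any]]


def all_message_ids(list_page: ListPage) -> List[str]:
    """Follow nextPageToken until a response comes back without one."""
    ids: List[str] = []
    token: Optional[str] = None
    while True:
        response = list_page(token)
        # 'messages' is absent when a page (or the mailbox) is empty
        ids.extend(m['id'] for m in response.get('messages', []))
        token = response.get('nextPageToken')
        if token is None:  # no token: this was the last page
            return ids


# Fake paginated API: each page is keyed by the token that requests it
pages = {
    None: {'messages': [{'id': '1'}, {'id': '2'}], 'nextPageToken': 'p2'},
    'p2': {'messages': [{'id': '3'}], 'nextPageToken': 'p3'},
    'p3': {'messages': [{'id': '4'}]},
}
print(all_message_ids(lambda token: pages[token]))  # ['1', '2', '3', '4']
```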