How can I retrieve all emails from Gmail?
I ran a full_sync, but it didn't return all of my email: only about 3000 messages, although I know I have more. The documentation doesn't mention this.
My code snippet:
history = service.users().history().list(
    userId='me',
    startHistoryId=start_history_id,
    maxResults=500,
    labelId='INBOX'
).execute()
if "history" in history:
    try:
        for message in history["history"]:
            batch.add(
                service.users().messages().get(userId='me', id=message["messages"][0]["id"]),
                callback="somecallbak",
                request_id=request_id
            )
        batch.execute()
        while 'nextPageToken' in history:
Answer 0 (score: 3)
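If you want to perform a full sync, you should refer to the Gmail API documentation on full synchronization, which recommends two steps: list the message IDs with users.messages.list, then fetch each listed message with users.messages.get in a batch request.
So you don't need users.history.list, because you will have a hard time finding a startHistoryId to start from.
You can achieve this with something like the following (tested and working in my Python 3.x console). As others have suggested, I used the Python client's pagination and batch request features.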
from httplib2 import Http
from googleapiclient.discovery import build
from oauth2client import client, tools, file

# callback for the batch request (see below)
def print_gmail_message(request_id, response, exception):
    if exception is not None:
        print('messages.get failed for message id {}: {}'.format(request_id, exception))
    else:
        print(response)

# Scopes
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly', ]

# where do we store our credentials?
creds_store = file.Storage('gmail-list.json')
start_creds = creds_store.get()

# standard oauth2 authentication flow
if not start_creds or start_creds.invalid:
    # client_id.json is exported from your gcp project
    start_flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
    start_creds = tools.run_flow(start_flow, creds_store)

# Gmail SDK
http = Http()
gmail_sdk = build('gmail', 'v1', http=start_creds.authorize(http))

# messages.list parameters
msg_list_params = {
    'userId': 'me'
}

# messages.list API
message_list_api = gmail_sdk.users().messages()

# first request
message_list_req = message_list_api.list(**msg_list_params)

while message_list_req is not None:
    gmail_msg_list = message_list_req.execute()

    # we build the batch request
    batch = gmail_sdk.new_batch_http_request(callback=print_gmail_message)
    # a page without results has no 'messages' key, so default to an empty list
    for gmail_message in gmail_msg_list.get('messages', []):
        msg_get_params = {
            'userId': 'me',
            'id': gmail_message['id'],
            'format': 'full',
        }
        batch.add(gmail_sdk.users().messages().get(**msg_get_params), request_id=gmail_message['id'])

    batch.execute(http=http)

    # pagination handling
    message_list_req = message_list_api.list_next(message_list_req, gmail_msg_list)
Answer 1 (score: 1)
As suggested in this link, you can use batch requests.
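Use batching and request 100 messages at a time. You will need to make 1000 requests, but the good news is that this is perfectly fine, and it is easier on everyone (no 1 GB response to download in a single request!).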
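To illustrate the "100 messages at a time" idea, here is a minimal sketch that splits a list of message IDs into groups of at most 100. The helper name `chunked` and the placeholder IDs are assumptions made for illustration; they are not part of the Gmail client library.

```python
from itertools import islice


def chunked(items, size=100):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Placeholder IDs standing in for the ids returned by messages.list.
message_ids = ['id{}'.format(n) for n in range(250)]

batches = list(chunked(message_ids, 100))
print([len(b) for b in batches])  # prints [100, 100, 50]
```

Each chunk would then go into its own batch request, for example one `new_batch_http_request()` per chunk as in the answer above.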
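Also, based on this thread, you can save the next page token from each response and use it in the next request. If a response contains no next page token, you have received all the messages.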
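The page-token loop described here can be sketched as follows. This is only an illustration under stated assumptions: `list_all_message_ids` and `fetch_page` are hypothetical names, and `FAKE_PAGES` is made-up data standing in for real `messages.list` responses so the loop can run without credentials.

```python
def list_all_message_ids(fetch_page):
    """Collect message ids across all pages.

    `fetch_page(page_token)` is a hypothetical callable returning one
    messages.list-style response dict; None requests the first page.
    """
    ids = []
    page_token = None
    while True:
        response = fetch_page(page_token)
        # A page without results may have no 'messages' key at all.
        ids.extend(m['id'] for m in response.get('messages', []))
        # Save the token for the next request; stop when it is absent.
        page_token = response.get('nextPageToken')
        if page_token is None:
            return ids


# Made-up two-page result set keyed by page token (assumption for the demo).
FAKE_PAGES = {
    None: {'messages': [{'id': 'a'}, {'id': 'b'}], 'nextPageToken': 't1'},
    't1': {'messages': [{'id': 'c'}]},
}

all_ids = list_all_message_ids(FAKE_PAGES.get)
print(all_ids)  # prints ['a', 'b', 'c']
```

With the real client, `fetch_page` could wrap a call such as `service.users().messages().list(userId='me', pageToken=token).execute()`, passing `pageToken` only when a token is present.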