Download a spreadsheet from Google Docs using Python

Asked: 2010-07-20 06:52:10

Tags: python google-docs google-docs-api gdata-python-client

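Can you produce a Python example of how to download a Google Docs spreadsheet given its key and worksheet ID (gid)? I can't.

I've scoured versions 1, 2 and 3 of the API. I'm having no luck: I can't figure out their complicated ATOM-like feeds API, the gdata.docs.service.DocsService._DownloadFile private method says that I'm unauthorized, and I don't want to write an entire Google Login authentication system myself. I'm about to stab myself in the face out of frustration.

I have a few spreadsheets and I want to access them like so: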

username = 'mygooglelogin@gmail.com'
password = getpass.getpass()

def get_spreadsheet(key, gid=0):
    ... (help!) ...

for row in get_spreadsheet('5a3c7f7dcee4b4f'):
    cell1, cell2, cell3 = row
    ...

Please save my face.


Update 1: I've tried the following, but no combination of Download() or Export() seems to work. (See the DocsService documentation.)

import gdata.docs.service
import getpass
import os
import tempfile
import csv

def get_csv(file_path):
  return csv.reader(file(file_path).readlines())

def get_spreadsheet(key, gid=0):
  gd_client = gdata.docs.service.DocsService()
  gd_client.email = 'xxxxxxxxx@gmail.com'
  gd_client.password = getpass.getpass()
  gd_client.ssl = False
  gd_client.source = "My Fancy Spreadsheet Downloader"
  gd_client.ProgrammaticLogin()

  file_path = tempfile.mktemp(suffix='.csv')
  uri = 'http://docs.google.com/feeds/documents/private/full/%s' % key
  try:
    entry = gd_client.GetDocumentListEntry(uri)

    # XXXX - The following dies with RequestError "Unauthorized"
    gd_client.Download(entry, file_path)

    return get_csv(file_path)
  finally:
    try:
      os.remove(file_path)
    except OSError:
      pass

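13 answers:

Answer 0 (score: 30)

The https://github.com/burnash/gspread library is a newer, simpler way to interact with Google Spreadsheets than the older answers here that suggest the gdata library, which is not only too low-level but also overly complicated.

You will also need to create and download (in JSON format) a service account key: https://console.developers.google.com/apis/credentials/serviceaccountkey

Here's an example of how to use it: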

import csv
import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name('credentials.json', scope)

docid = "0zjVQXjJixf-SdGpLKnJtcmQhNjVUTk1hNTRpc0x5b9c"

client = gspread.authorize(credentials)
spreadsheet = client.open_by_key(docid)
for i, worksheet in enumerate(spreadsheet.worksheets()):
    filename = docid + '-worksheet' + str(i) + '.csv'
    with open(filename, 'w', newline='') as f:  # Python 3 text mode; on Python 2 use open(filename, 'wb')
        writer = csv.writer(f)
        writer.writerows(worksheet.get_all_values())

Answer 1 (score: 20)

In case anyone comes across this looking for a quick fix, here's another (currently) working solution that doesn't rely on the gdata client library:

#!/usr/bin/python

import re, urllib, urllib2

class Spreadsheet(object):
    def __init__(self, key):
        super(Spreadsheet, self).__init__()
        self.key = key

class Client(object):
    def __init__(self, email, password):
        super(Client, self).__init__()
        self.email = email
        self.password = password

    def _get_auth_token(self, email, password, source, service):
        url = "https://www.google.com/accounts/ClientLogin"
        params = {
            "Email": email, "Passwd": password,
            "service": service,
            "accountType": "HOSTED_OR_GOOGLE",
            "source": source
        }
        req = urllib2.Request(url, urllib.urlencode(params))
        return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

    def get_auth_token(self):
        source = type(self).__name__
        return self._get_auth_token(self.email, self.password, source, service="wise")

    def download(self, spreadsheet, gid=0, format="csv"):
        url_format = "https://spreadsheets.google.com/feeds/download/spreadsheets/Export?key=%s&exportFormat=%s&gid=%i"
        headers = {
            "Authorization": "GoogleLogin auth=" + self.get_auth_token(),
            "GData-Version": "3.0"
        }
        req = urllib2.Request(url_format % (spreadsheet.key, format, gid), headers=headers)
        return urllib2.urlopen(req)

if __name__ == "__main__":
    import getpass
    import csv

    email = "" # (your email here)
    password = getpass.getpass()
    spreadsheet_id = "" # (spreadsheet id here)

    # Create client and spreadsheet objects
    gs = Client(email, password)
    ss = Spreadsheet(spreadsheet_id)

    # Request a file-like object containing the spreadsheet's contents
    csv_file = gs.download(ss)

    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)

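Answer 2 (score: 17)

You could try using the AuthSub method described in the Exporting Spreadsheets section of the documentation.

Get a separate login token for the spreadsheets service and substitute it for the export. Adding this to the get_spreadsheet code worked for me: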

import gdata.spreadsheet.service

def get_spreadsheet(key, gid=0):
    # ...
    spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
    spreadsheets_client.email = gd_client.email
    spreadsheets_client.password = gd_client.password
    spreadsheets_client.source = "My Fancy Spreadsheet Downloader"
    spreadsheets_client.ProgrammaticLogin()

    # ...
    entry = gd_client.GetDocumentListEntry(uri)
    docs_auth_token = gd_client.GetClientLoginToken()
    gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
    gd_client.Export(entry, file_path)
    gd_client.SetClientLoginToken(docs_auth_token) # reset the DocList auth token

Note that I also used Export, because Download seems to give only PDF files.

Answer 3 (score: 3)

This no longer works as of gdata 2.0.1.4:

gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())

Instead, you have to do:

gd_client.SetClientLoginToken(gdata.gauth.ClientLoginToken(spreadsheets_client.GetClientLoginToken()))

Answer 4 (score: 3)

(Jul 2016) Rephrasing with current terminology: "How do I download a Google Sheet in CSV format from Google Drive using Python?". (Google Docs now only refers to the cloud-based word processor/text editor which doesn't provide access to Google Sheets spreadsheets.)

First, all other answers are pretty much outdated or will be, either because they use the old GData ("Google Data") Protocol, ClientLogin, or AuthSub, all of which have been deprecated. The same is true for all code or libraries that use the Google Sheets API v3 or older.

Modern Google API access occurs using API keys (public data) or OAuth2 authorization (authorized data), primarily with the Google APIs Client Libraries, including the one for Python. (And no, you don't have to build an entire auth system just to access the APIs... see the blogpost below.)

To perform the task requested in/by the OP, you need authorized access to the Google Drive API, perhaps to query for specific Sheets to download, and then to perform the actual export(s). Since this is likely a common operation, I wrote a blogpost sharing a code snippet that does this for you. If you wish to pursue this even more, I've got another pair of posts along with a video that outlines how to upload files to and download files from Google Drive.

Note that there is also a newer Google Sheets API v4, but it's primarily for spreadsheet-oriented operations, i.e., inserting data, reading spreadsheet rows, cell formatting, creating charts, adding pivot tables, etc., not file-based requests like exporting, where the Drive API is the correct one to use.
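
For the row-reading case mentioned above, a minimal Python 3 sketch might look like the following. It assumes `service` was built with the Google APIs Client Library for Python (for example `googleapiclient.discovery.build('sheets', 'v4', credentials=creds)`) and that the tab is named `Sheet1`; the helper name `read_rows` is made up for illustration and is not part of any library.

```python
def read_rows(service, spreadsheet_id, cell_range="Sheet1"):
    """Return the rows of a range as a list of lists.

    `service` is assumed to be a Sheets API v4 service object, e.g. from
    googleapiclient.discovery.build('sheets', 'v4', credentials=creds).
    `cell_range` is in A1 notation; a bare tab name selects the whole tab.
    """
    response = (
        service.spreadsheets()
        .values()
        .get(spreadsheetId=spreadsheet_id, range=cell_range)
        .execute()
    )
    # The response omits the 'values' key entirely when the range is empty.
    return response.get("values", [])
```

Rows come back as plain lists, so they can be unpacked in a loop much like the `for row in get_spreadsheet(...)` usage the question asks for.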

To see an example of exporting a Google Sheet as CSV from Drive, check out this blog post I wrote; to learn more about using Google Sheets with Python, see this answer I wrote for a similar question.
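
As a rough illustration of the Drive-based export (this is a sketch, not the code from the blog post), the following Python 3 helper assumes `drive_service` was created with `googleapiclient.discovery.build('drive', 'v3', credentials=creds)` and that the caller is already authorized. The function name `export_sheet_as_csv` is hypothetical. As far as I know, a CSV export through Drive covers only the first sheet of the spreadsheet.

```python
import csv
import io


def export_sheet_as_csv(drive_service, file_id):
    """Export a Google Sheet through the Drive API and return its rows.

    `drive_service` is assumed to be a Drive API v3 service object, e.g. from
    googleapiclient.discovery.build('drive', 'v3', credentials=creds).
    """
    # files().export() asks Drive to convert a Google-native file to the
    # requested MIME type; execute() is assumed to return the raw bytes.
    data = drive_service.files().export(
        fileId=file_id, mimeType="text/csv"
    ).execute()
    text = data.decode("utf-8")
    # newline="" lets the csv module handle line endings inside quoted cells.
    return list(csv.reader(io.StringIO(text, newline="")))
```

Passing the service object in as a parameter keeps the authorization step separate from the export itself.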

If you're completely new to Google APIs, then you need to take a further step back and review these videos first:

Answer 5 (score: 2)

The following code works in my case (Ubuntu 10.4, python 2.6.5, gdata 2.0.14):

import gdata.docs.service
import gdata.spreadsheet.service
gd_client = gdata.docs.service.DocsService()
gd_client.ClientLogin(email,password)
spreadsheets_client = gdata.spreadsheet.service.SpreadsheetsService()
spreadsheets_client.ClientLogin(email,password)
#...
file_path = file_path.strip()+".xls"
docs_token = gd_client.auth_token
gd_client.SetClientLoginToken(spreadsheets_client.GetClientLoginToken())
gd_client.Export(entry, file_path)  
gd_client.auth_token = docs_token

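Answer 6 (score: 1)

I've simplified @Cameron's answer even further by removing the unnecessary object orientation. This makes the code smaller and easier to understand. I also edited the URL, which might work better.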

#!/usr/bin/python
import re, urllib, urllib2

def get_auth_token(email, password):
    url = "https://www.google.com/accounts/ClientLogin"
    params = {
        "Email": email, "Passwd": password,
        "service": 'wise',
        "accountType": "HOSTED_OR_GOOGLE",
        "source": 'Client'
    }
    req = urllib2.Request(url, urllib.urlencode(params))
    return re.findall(r"Auth=(.*)", urllib2.urlopen(req).read())[0]

def download(spreadsheet, worksheet, email, password, format="csv"):
    # gid is passed as a query parameter; a '#gid=...' fragment would not be sent to the server
    url_format = 'https://docs.google.com/spreadsheets/d/%s/export?exportFormat=%s&gid=%s'

    headers = {
        "Authorization": "GoogleLogin auth=" + get_auth_token(email, password),
        "GData-Version": "3.0"
    }
    req = urllib2.Request(url_format % (spreadsheet, format, worksheet), headers=headers)
    return urllib2.urlopen(req)


if __name__ == "__main__":
    import getpass
    import csv

    spreadsheet_id = ""             # (spreadsheet id here)
    worksheet_id = ''               # (gid here)
    email = ""                      # (your email here)
    password = getpass.getpass()

    # Request a file-like object containing the spreadsheet's contents
    csv_file = download(spreadsheet_id, worksheet_id, email, password)

    # Parse as CSV and print the rows
    for row in csv.reader(csv_file):
        print ", ".join(row)

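Answer 7 (score: 1)

I'm using this: curl 'https://docs.google.com/spreadsheets/d/1-lqLuYJyHAKix-T8NR8wV8ZUUbVOJrZTysccid2-ycs/gviz/tq?tqx=out:csv' on a sheet that is set to publicly readable.

So if you can work with public sheets, all you need is a Python version of curl.

If your sheet has tabs that you don't want to expose, create a new sheet and import the ranges you want to publish into a tab on it.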
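
A "Python version of curl" for this case can be sketched with the standard library alone (Python 3). The helper names `gviz_csv_url`, `parse_csv` and `fetch_public_sheet` are invented for this example, and the code assumes the sheet really is publicly readable, so no authentication is attempted.

```python
import csv
import io
import urllib.request


def gviz_csv_url(spreadsheet_id):
    """Build the same CSV endpoint URL that the curl command requests."""
    return (
        "https://docs.google.com/spreadsheets/d/%s/gviz/tq?tqx=out:csv"
        % spreadsheet_id
    )


def parse_csv(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text, newline="")))


def fetch_public_sheet(spreadsheet_id, timeout=30):
    """Download a publicly readable sheet and return its rows."""
    with urllib.request.urlopen(gviz_csv_url(spreadsheet_id), timeout=timeout) as resp:
        return parse_csv(resp.read().decode("utf-8"))
```

Calling `fetch_public_sheet("<your spreadsheet id>")` should then give rows you can loop over; if the sheet is not public, expect an HTTP error or a login page instead of CSV.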

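Answer 8 (score: 1)

Downloading a spreadsheet from Google Docs is pretty simple using the gsheets library.

You can follow the detailed documentation at

https://pypi.org/project/gsheets/

or follow the steps below. I recommend reading through the documentation for a better understanding.

  1. pip install gsheets

  2. Log into the Google Developers Console with the Google account whose spreadsheets you want to access. Create (or select) a project and enable the Drive API and Sheets API (under Google Apps APIs).

  3. Go to the Credentials for your project and create New credentials > OAuth client ID > of type Other. In the list of your OAuth 2.0 client IDs, click Download JSON for the client ID you just created. Save the file as client_secrets.json in your home directory (user directory).

  4. Use the following code snippet.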

    from gsheets import Sheets
    sheets = Sheets.from_files('client_secrets.json')  # the file saved in step 3; adjust the path if needed
    print(sheets)  # confirms that authentication succeeded

    s = sheets.get("{SPREADSHEET_URL}")
    print(s)  # confirms that the spreadsheet is accessible

    # sheets[1] is the second worksheet; use sheets[0] for the first one
    s.sheets[1].to_csv('Spam.csv', encoding='utf-8', dialect='excel')  # saves the worksheet as CSV

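Answer 9 (score: 0)

This isn't a complete answer, but Andreas Kahler wrote up an interesting CMS solution using Google Docs + Google App Engine + Python. Not having any experience in the area, I can't see exactly which portion of the code may be of use to you, but check it out. I know it interfaces with a Google Docs account and plays with files, so I have a feeling you'll recognize what's going on. It should at least point you in the right direction.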

Google AppEngine + Google Docs + Some Python = Simple CMS

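Answer 10 (score: 0)

Gspread is indeed a big improvement over GoogleCL and Gdata (both of which I've used and have thankfully phased out in favor of Gspread). I think this code is even quicker than the earlier answer for getting the contents of a sheet: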

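import gspread

username = 'sdfsdfsds@gmail.com'
password = 'sdfsdfsadfsdw'
sheetname = "Sheety Sheet"

# Note: gspread.login() relied on ClientLogin and is not available in newer gspread releases
client = gspread.login(username, password)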
spreadsheet = client.open(sheetname)

worksheet = spreadsheet.sheet1
contents = []
for rows in worksheet.get_all_values():
    contents.append(rows)

Answer 11 (score: 0)

(Dec 16) Try another library I wrote: pygsheets. It's similar to gspread, but uses Google API v4. It has an export method for exporting a spreadsheet.

import pygsheets

gc = pygsheets.authorize()

# Open spreadsheet and then worksheet
sh = gc.open('my new ssheet')
wks = sh.sheet1

#export as csv
wks.export(pygsheets.ExportType.CSV)

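Answer 12 (score: 0)

(Mar 2019, Python 3) My data is usually not sensitive, and I usually use a table format similar to CSV.

In such a case, you can simply publish the sheet to the web and then use it as a CSV file on the server.

(You publish it using File -> Publish to the web... -> Sheet 1 -> Comma separated values (.csv) -> Publish.)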

import csv
import io
import requests

url = "https://docs.google.com/spreadsheets/d/e/<GOOGLE_ID>/pub?gid=0&single=true&output=csv"  # you can get the whole link in the 'Publish to the web' dialog
r = requests.get(url)
r.encoding = 'utf-8'
csvio = io.StringIO(r.text, newline="")
data = []
for row in csv.DictReader(csvio):
    data.append(row)