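I have uploaded some data files to my Google Drive, and I want to import them into Google Colab.
The REST API and PyDrive examples show how to create a new file and upload it to Drive and Colab. From those examples I cannot work out how to read data files that already exist on my Drive from my Python code.
I am new to this. Can someone help me?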
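Answer 0 (score: 12)
(Updated April 15, 2018: gspread is updated frequently, so to keep the workflow stable I pin the versions.)
For spreadsheet files, the basic idea is to use the gspread and pandas packages to read spreadsheets from Drive and convert them to pandas DataFrames.
In a Colab notebook: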
#install packages
!pip install gspread==2.1.1
!pip install gspread-dataframe==2.1.0
!pip install pandas==0.22.0
#import packages and authorize connection to Google account:
import pandas as pd
import gspread
from gspread_dataframe import get_as_dataframe, set_with_dataframe
from google.colab import auth
auth.authenticate_user() # verify your account to read files which you have access to. Make sure you have permission to read the file!
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
After that, I know of three ways to read a Google spreadsheet.
By file name:
spreadsheet = gc.open("goal.csv") # Open file using its name. Use this if the file is already anywhere in your drive
sheet = spreadsheet.get_worksheet(0) # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()
By URL:
spreadsheet = gc.open_by_url('https://docs.google.com/spreadsheets/d/1LCCzsUTqBEq5pemRNA9EGy62aaeIgye4XxwReYg1Pe4/edit#gid=509368585') # use this when you have the complete URL (the gid in edit#gid=... identifies a worksheet tab, not a permission)
sheet = spreadsheet.get_worksheet(0) # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()
By file key / ID:
spreadsheet = gc.open_by_key('1vpukIbGZfK1IhCLFalBI3JT3aobySanJysv0k5A4oMg') # use this when you have the key (the string in the URL following spreadsheets/d/)
sheet = spreadsheet.get_worksheet(0) # 0 means the first sheet in the file
df2 = pd.DataFrame(sheet.get_all_records())
df2.head()
I have shared the code above in a Colab notebook.
Source: https://drive.google.com/file/d/1cvur-jpIpoEN3vAO8Fd_yVAT5Qgbr4GV/view?usp=sharing
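The URL and key routes above refer to the same spreadsheet: the key is the path segment after `/spreadsheets/d/`. As a minimal sketch, assuming the usual `docs.google.com/spreadsheets/d/<key>/...` link shape, the helpers below pull the key and the optional `gid` out of a link. Both function names are hypothetical and are not part of gspread.

```python
import re

# Hypothetical helpers (not part of gspread). They assume the usual
# https://docs.google.com/spreadsheets/d/<key>/edit#gid=<gid> link shape.
_KEY_PATTERN = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")
_GID_PATTERN = re.compile(r"[#&?]gid=(\d+)")


def extract_spreadsheet_key(url):
    """Return the key found after '/spreadsheets/d/' in a Google Sheets URL."""
    match = _KEY_PATTERN.search(url)
    if match is None:
        raise ValueError("no spreadsheet key found in URL: %r" % url)
    return match.group(1)


def extract_gid(url):
    """Return the worksheet gid as an int, or None if the URL has no gid."""
    match = _GID_PATTERN.search(url)
    return int(match.group(1)) if match else None


url = ("https://docs.google.com/spreadsheets/d/"
       "1LCCzsUTqBEq5pemRNA9EGy62aaeIgye4XxwReYg1Pe4/edit#gid=509368585")
key = extract_spreadsheet_key(url)
gid = extract_gid(url)
print(key, gid)
```

The extracted key can then be passed to `gc.open_by_key(key)`, where `gc` is the authorized client created earlier.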
Answer 1 (score: 1)
1) Make your data public. Then, for a public spreadsheet:
from io import StringIO # Python 3; in Python 2 this was `from StringIO import StringIO`
import pandas as pd
import requests
r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
data = r.text # decoded text; r.content is bytes in Python 3 and cannot be passed to StringIO
df = pd.read_csv(StringIO(data), index_col=0, parse_dates=['Quradate'])
df.head()
More info: Getting Google Spreadsheet CSV into A Pandas Dataframe
For private data it works much the same way, but you have to do some auth gymnastics...
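For the public-spreadsheet route, here is a small offline sketch of the two steps involved: building a CSV link from a spreadsheet key, then turning the downloaded bytes into text before parsing. The `/export?format=csv&gid=...` URL pattern is an assumption about how Google Sheets serves CSV exports, not something stated in this answer. `csv_export_url` is a hypothetical helper, and the sample bytes are made up apart from the `Quradate` column name used above.

```python
import csv
import io
from urllib.parse import urlencode


def csv_export_url(key, gid=0):
    """Build a CSV export link for a publicly shared sheet.

    Assumption: the sheet is served at .../spreadsheets/d/<key>/export.
    """
    query = urlencode({"format": "csv", "gid": gid})
    return "https://docs.google.com/spreadsheets/d/%s/export?%s" % (key, query)


# Made-up bytes standing in for the body of an HTTP response (r.content).
sample_bytes = b"Quradate,value\n2018-01-01,1\n2018-01-02,2\n"

# Decode to str first: io.StringIO accepts text, not bytes, in Python 3.
text = sample_bytes.decode("utf-8")
parsed_rows = list(csv.DictReader(io.StringIO(text)))

print(csv_export_url("YOUR_SPREADSHEET_KEY"))
print(parsed_rows[0]["Quradate"])
```

Since `pd.read_csv` accepts a URL as well as a file-like object, passing the generated link to it directly should also work for a sheet that is shared publicly.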
Answer 2 (score: 0)
From the Google Colab snippets:
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())
worksheet = gc.open('Your spreadsheet name').sheet1
# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)
# Convert to a DataFrame and render.
import pandas as pd
pd.DataFrame.from_records(rows)
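Note that `get_all_values` returns the header row as the first data row, so the DataFrame built above gets numeric column labels. A minimal sketch of promoting the first row to column names, using made-up sample rows in the same list-of-lists shape; `rows_to_records` is a hypothetical helper, not a gspread function:

```python
def rows_to_records(rows):
    """Treat the first row as the header and return one dict per remaining row."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample in the list-of-lists shape that get_all_values produces.
rows = [["name", "score"], ["a", "1"], ["b", "2"]]
records = rows_to_records(rows)
print(records)
```

The resulting list of dicts can be passed to `pd.DataFrame(records)`. An equivalent pandas-only form is `pd.DataFrame(rows[1:], columns=rows[0])`. The cell values here are strings, so numeric columns may still need converting.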