I am trying to get the following code to import multiple CSV files (ALLOWANCE1.csv and ALLOWANCE2.csv) from a Google Cloud Bucket into Datalab in Python 2.x:
import numpy as np
import pandas as pd
from google.datalab import Context
import google.datalab.bigquery as bq
import google.datalab.storage as storage
from io import BytesIO

myBucket = storage.Bucket('Bucket Name')
object_list = myBucket.objects(prefix='ALLOWANCE')

df_list = []
for obj in object_list:
    %gcs read --object $obj.uri --variable data
    df_list.append(pd.read_csv(BytesIO(data)))

concatenated_df = pd.concat(df_list, ignore_index=True)
concatenated_df.head()
I get the following error right at the start of the for loop:
RequestException                          Traceback (most recent call last)
<ipython-input-5-3188aab389b8> in <module>()
----> 1 for obj in object_list:
      2     get_ipython().magic(u'gcs read --object $obj.uri --variable data')
      3     df_list.append(pd.read_csv(BytesIO(data)))

/usr/local/envs/py2env/lib/python2.7/site-packages/google/datalab/utils/_iterator.pyc in __iter__(self)
     34     """Provides iterator functionality."""
     35     while self._first_page or (self._page_token is not None):
---> 36       items, next_page_token = self._retriever(self._page_token, self._count)
     37
     38       self._page_token = next_page_token

/usr/local/envs/py2env/lib/python2.7/site-packages/google/datalab/storage/_object.pyc in _retrieve_objects(self, page_token, _)
    319                                          page_token=page_token)
    320       except Exception as e:
--> 321         raise e
    322
    323       objects = list_info.get('items', [])

RequestException: HTTP request failed: Not Found
I have spent some time trying to solve this, with no luck! Any help would be greatly appreciated!
Answer 0 (score: 0)
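I don't think you can mix notebook shell commands with Python variables. Maybe try using the subprocess Python library and calling the command-line command from Python.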
import numpy as np
import pandas as pd
from google.datalab import Context
import google.datalab.bigquery as bq
import google.datalab.storage as storage
from io import BytesIO
# new line
from subprocess import call
# new lines
# NOTE: google.colab is only available in Colab; in Datalab these two lines can likely be dropped.
from google.colab import auth
auth.authenticate_user()

myBucket = storage.Bucket('Bucket Name')
object_list = myBucket.objects(prefix='ALLOWANCE')

df_list = []
for obj in object_list:
    call(['gsutil', 'cp', obj.uri, '/tmp/'])  # first copy the file locally
    filename = obj.uri.split('/')[-1]  # get the file name
    df_list.append(pd.read_csv('/tmp/' + filename))

concatenated_df = pd.concat(df_list, ignore_index=True)
concatenated_df.head()
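Note that I have not run this code, but I have successfully run `call` on my own files. Another suggestion is to run the file-copy calls in a loop of their own first, before reading the files. That way, if you iterate on your data a lot, you won't have to re-download the files every time.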
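The "copy first, read later" suggestion could be sketched roughly as follows. This is only a minimal illustration, not tested against Datalab: `download_once` and `gsutil_fetch` are hypothetical helper names, and the sketch assumes `gsutil` is on the PATH and that each URI ends in the file name.

```python
import os
import subprocess


def gsutil_fetch(uri, dest_path):
    """Copy one object from Cloud Storage to a local path (assumes gsutil is installed)."""
    subprocess.check_call(['gsutil', 'cp', uri, dest_path])


def download_once(uris, cache_dir, fetch=gsutil_fetch):
    """Return local paths for uris, fetching only the files not already in cache_dir."""
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)
    local_paths = []
    for uri in uris:
        # Hypothetical convention: the cached file is named after the last URI segment.
        local_path = os.path.join(cache_dir, uri.split('/')[-1])
        if not os.path.exists(local_path):
            fetch(uri, local_path)  # only hit the bucket on a cache miss
        local_paths.append(local_path)
    return local_paths
```

With something like this in place, the download loop becomes `paths = download_once([obj.uri for obj in object_list], '/tmp/allowance')`, and the paths can then be passed to `pd.read_csv` and `pd.concat` as in the code above. Re-running the cell reuses the local copies instead of downloading again.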