How do I download a CSV file from a bucket and then use it in a Cloud Function?

Asked: 2019-05-23 20:42:07

Tags: python google-cloud-storage

Hello, I have a CSV file stored in a bucket and I want to use it in a Cloud Function, so I need to download it and then use it in the following procedure:

def plot(event, context):
    client = storage.Client()
    df = pd.read_csv('call_conversations.csv', index_col=0)
    objects = df['filepart']
    y_pos = np.arange(len(objects))
    performance = df['confidence']
    plt.bar(y_pos, performance, align='center', alpha=0.99,color='blue')
    plt.xticks(y_pos, objects,rotation=90)
    plt.ylabel('Confianza') 
    plt.title('')
    plt.savefig('cloud.png')
    print('successfull')

I tried:

def plot(event, context):
    client = storage.Client()

    # Here I successfully get the CSV file as a string:

    csv = client.bucket(event['bucket']).blob(event['name']).download_as_string()
    df = pd.read_csv(csv, index_col=0)
    objects = df['filepart']
    y_pos = np.arange(len(objects))
    performance = df['confidence']
    plt.bar(y_pos, performance, align='center', alpha=0.99,color='blue')
    plt.xticks(y_pos, objects,rotation=90)
    plt.ylabel('Confianza') 
    plt.title('Nivel de Confianza Transcripciones')
    plt.savefig('cloud.png')
    print('successfull')

But I got:

  File "local.py", line 67, in <module>
    trigger()
  File "local.py", line 64, in trigger
    plot(event,None)
  File "local.py", line 49, in plot
    df = pd.read_csv(csv, index_col=0)
  File "/home/adolfo/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/adolfo/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 429, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/adolfo/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 895, in __init__
    self._make_engine(self.engine)
  File "/home/adolfo/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1122, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/adolfo/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1853, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 387, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 725, in pandas._libs.parsers.TextReader._setup_parser_source
OSError: Expected file path name or file-like object, got <class 'bytes'> type

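Since I need to turn this code into a Cloud Function, I want to find a way to download the CSV from the bucket, keep it in memory, and then use it with pandas.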

I also tried StringIO:

def plot(event, context):

    client = storage.Client()
    csv = client.bucket(event['bucket']).blob(event['name']).download_as_string()

    df = pd.read_csv(StringIO(csv), index_col=0)
    objects = df['filepart']
    y_pos = np.arange(len(objects))
    performance = df['confidence']
    plt.bar(y_pos, performance, align='center', alpha=0.99,color='blue')
    plt.xticks(y_pos, objects,rotation=90)
    plt.ylabel('Confianza') 
    plt.title('Nivel de Confianza Transcripciones')
    plt.savefig('cloud.png')
    print('successfull')

But I got:

Traceback (most recent call last):
  File "local.py", line 67, in <module>
    trigger()
  File "local.py", line 64, in trigger
    plot(event,None)
  File "local.py", line 49, in plot
    df = pd.read_csv(StringIO(csv), index_col=0)
TypeError: initial_value must be str or None, not bytes

1 Answer:

Answer 0 (score: 2)

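The problem is that the pandas read_csv() API expects a file name or a file-like object. In your call, you are passing in the content that was read from the object found in the bucket. That means you have already read the content and now want to parse it into a DataFrame. I searched for a way to do that and found the following recipe: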

Create Pandas DataFrame from a string

Using StringIO looks like a good solution. Read that link; hopefully it will be simple to integrate with your own solution.
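
A minimal sketch of the StringIO approach follows. StringIO accepts only str, so bytes downloaded from a bucket have to be decoded first, which is what the TypeError in the question points to. The sample data and the UTF-8 encoding here are assumptions for illustration, not taken from the actual file.

```python
from io import StringIO

import pandas as pd

# Made-up bytes standing in for what download_as_string() returns.
raw = b"id,filepart,confidence\n0,part_a,0.5\n1,part_b,0.25\n"

# StringIO needs str, so decode first (UTF-8 is an assumption about the file).
text = raw.decode("utf-8")

# read_csv() accepts the file-like StringIO object.
df = pd.read_csv(StringIO(text), index_col=0)
print(df)
```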

If the data is bytes, we can use io.BytesIO as the data source for read_csv(). See, for example:

StringIO replacement that works with bytes instead of strings?
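
The BytesIO route could be sketched roughly as below. The helper name read_csv_from_blob and the FakeBlob class are hypothetical, introduced only so the example runs without a real bucket. The only blob method it relies on is download_as_string(), which the question already calls and which returns bytes.

```python
from io import BytesIO

import pandas as pd


def read_csv_from_blob(blob):
    """Download a blob into memory and parse it as CSV (hypothetical helper)."""
    data = blob.download_as_string()  # returns bytes despite the name
    # BytesIO wraps the bytes in a file-like object that read_csv() accepts.
    return pd.read_csv(BytesIO(data), index_col=0)


class FakeBlob:
    """Hypothetical stand-in for a storage blob, with made-up sample data."""

    def download_as_string(self):
        return b"id,filepart,confidence\n0,part_a,0.5\n1,part_b,0.25\n"


df = read_csv_from_blob(FakeBlob())
print(df)
```

In the function from the question, the blob would presumably be client.bucket(event['bucket']).blob(event['name']), and nothing has to be written to disk before pandas reads it.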