ETL: loading from Google Cloud Storage into BigQuery

Time: 2018-05-20 18:41:46

Tags: python google-cloud-platform google-bigquery google-cloud-storage dataflow

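I want to load data from hundreds of CSV files on Google Cloud Storage and append them daily to a single table in BigQuery using Cloud Dataflow (preferably with the Python SDK). Can you tell me how to accomplish this?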

Thanks

1 answer:

Answer 0 (score: 0)

This can be done in Python as well. Please see the code snippet below.

def format_output_json(element):
    """
    Convert one CSV line into a row dictionary.

    :param element: one row of the CSV file, as a comma-separated string
    :return: a one-item list holding a dictionary that maps each column name
             to the value found in that row

    The column names in row_indices are hard-coded here, but they could be
    supplied at run time.
    """
    row_indices = ['time_stamp', 'product_name', 'units_sold', 'retail_price']
    row_data = element.split(',')
    # zip() pairs names with values and stops at the shorter list, so a row
    # with extra fields does not raise IndexError.
    return [dict(zip(row_indices, row_data))]
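
The function above only converts a single line, so here is a rough sketch of how it might be wired into a full pipeline. It is not a tested production job. It assumes the Apache Beam Python SDK is installed, that every column can be loaded as STRING, and that the files have no header row unless you say otherwise. The names parse_csv_line, run, COLUMNS and TABLE_SCHEMA are my own, and the parser uses the standard csv module instead of split(',') so that quoted fields containing commas survive.

```python
import csv

# Column names taken from the snippet above; their order is assumed to match the CSV files.
COLUMNS = ['time_stamp', 'product_name', 'units_sold', 'retail_price']

# Assumed schema: every column is loaded as STRING because the parser keeps the raw text.
TABLE_SCHEMA = ','.join('{}:STRING'.format(name) for name in COLUMNS)


def parse_csv_line(line):
    """Turn one CSV line into a list with one {column: value} dict (empty list for blank lines)."""
    values = next(csv.reader([line]))
    if not values:
        return []  # FlatMap drops blank lines because nothing is emitted
    return [dict(zip(COLUMNS, values))]


def run(input_pattern, output_table, skip_header_lines=0, pipeline_args=None):
    """Read every file matching input_pattern and append its rows to output_table."""
    # Imported here so that parse_csv_line can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(pipeline_args)
    with beam.Pipeline(options=options) as p:
        (p
         | 'ReadCsv' >> beam.io.ReadFromText(
             input_pattern, skip_header_lines=skip_header_lines)
         | 'ToDict' >> beam.FlatMap(parse_csv_line)
         | 'AppendToBigQuery' >> beam.io.WriteToBigQuery(
             output_table,
             schema=TABLE_SCHEMA,
             create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
             write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```

A call would look something like run('gs://my-bucket/daily/*.csv', 'my-project:my_dataset.sales'), where the bucket, project, dataset and table names are placeholders. WRITE_APPEND is what makes each run add to the existing table instead of replacing it. The pipeline itself does not schedule anything, so the daily run has to be triggered from outside, for example by a cron job.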