Google Cloud Composer with the google-cloud-bigquery Python client library

Asked: 2018-07-30 23:21:41

Tags: google-cloud-python google-cloud-composer

I am trying to run a DAG in Google Cloud Composer whose first component calls an API with an HTTP GET request and then inserts the resulting JSON into a BigQuery table using the Python client library. I am trying to use this method: https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.client.Client.insert_rows_json.html

import requests
import datetime
import ast
import numpy as np
from airflow import models
from airflow.contrib.operators import bigquery_operator
from airflow.operators import python_operator
import google.cloud.bigquery as bigquery

client = bigquery.Client(project = 'is-flagship-data-api-sand')
dataset_id = 'Mobile_Data_Test'
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table('sample_wed')
table = client.get_table(table_ref)

def get_localytics_data():
    # api_key and api_secret are defined elsewhere
    profiles_requests_command = "https://%s:%s@api.localytics.com/v1/exports/profiles/%d/profile"%(api_key,api_secret,28761)
    res_profiles = requests.get(profiles_requests_command)
    if res_profiles.status_code == 200:
        data = res_profiles.text  # .text is str; .content is bytes in Python 3 and cannot be split with a str
        data_split = data.split('\n')[:-1]
        data_split_ast = [ast.literal_eval(x) for x in data_split]

        #take out characters from the beginning to have neat columns
        data_split_ast_pretty = [dict(zip(map(lambda x: x[4:], item.keys()), item.values())) for item in data_split_ast]


        #add current date
        current_time = datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S")
        for item in data_split_ast_pretty:
            item['DateCreated'] = current_time


        random_sample = list(np.random.choice(data_split_ast_pretty,5))  
        print(random_sample)
        client.insert_rows_json(table = table, json_rows = random_sample)
    else:
        pass




run_api = python_operator.PythonOperator(task_id='call_api',
        python_callable=get_localytics_data)

The PyPI packages I added:

requests===2.19.1

numpy===1.12.0

google-cloud-bigquery===1.4.0

I get the following error in the Airflow UI console: Broken DAG: [/home/airflow/gcs/dags/composer_test_july30_v2.py] 'Client' object has no attribute 'get_table'.

All of the code shown runs locally but fails on Cloud Composer.

1 Answer:

Answer 0 (score: 0)

It sounds like you have an outdated google-cloud-bigquery package, even though it seems like you shouldn't.

To confirm, you will need to SSH into the Google Kubernetes Engine (GKE) cluster backing your Composer environment and run pip freeze | grep bigquery to determine which version is actually installed.

  1. Go to https://console.cloud.google.com/kubernetes/list
  2. Find the corresponding GKE cluster and click on it.
  3. Click Connect at the top.
  4. In the console, type kubectl get pods. A list of pods will be displayed.
  5. Run kubectl exec -it <AIRFLOW_WORKER> /bin/bash, where <AIRFLOW_WORKER> is one of the pods whose name starts with airflow-worker-.
  6. Inside the pod, run pip freeze | grep bigquery; it should show the version of the module.
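As a cross-check, the same information can be read from Python in whatever environment parses the DAG. This is a sketch, not part of the original answer; it uses setuptools' pkg_resources, which ships with most Python installs, and the package name google-cloud-bigquery from the question:

```python
import pkg_resources

def installed_bigquery_version():
    """Return the installed google-cloud-bigquery version string, or None if absent."""
    try:
        return pkg_resources.get_distribution("google-cloud-bigquery").version
    except pkg_resources.DistributionNotFound:
        return None

version = installed_bigquery_version()
```

If the reported version predates the 0.28.0 API overhaul (which, to the best of my knowledge, is where Client.get_table was introduced), that would explain the AttributeError in the Broken DAG message.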