Specifying the region in Dataflow to read a BigQuery dataset (Python SDK)

Time: 2019-05-23 14:31:51

Tags: python google-cloud-dataflow apache-beam

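I am trying to read a BigQuery dataset in Dataflow. It cannot find the BigQuery dataset/table I specified.

The job_name is preprocess-ga360-190523-130005.

My Datalab VM, GCS bucket, and BigQuery dataset are all located in europe-west2.

For some reason, it is searching for the dataset in location "US".

Module versions are apache-beam 2.5.0, google-cloud-dataflow 2.0.0, google-cloud-bigquery 0.25.0.

I searched the documentation but could not find an answer.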

OUTPUT_DIR = "gs://some-bucket/some-folder/"

#dictionary of pipeline options
options = {
    "staging_location": "gs://some-bucket/some-folder/stage/",
    "temp_location": "gs://some-bucket/some-folder/tmp/",
    "job_name": job_name,
    "project": PROJECT,
    "runner": "DirectRunner",
    "location": "europe-west2",
    "region": "europe-west2",
}

#instantiate PipelineOptions object using options dictionary
opts = beam.pipeline.PipelineOptions(flags = [], **options)

#instantiate Pipeline object using PipelineOptions
with beam.Pipeline(options=opts) as p:
    outfile = "gs://some-bucket/some-folder/train.csv"
    (
      p | "read_train" >> beam.io.Read(beam.io.BigQuerySource(query=my_query, use_standard_sql=True))
        | "tocsv_train" >> beam.Map(to_csv)
        | "write_train" >> beam.io.Write(beam.io.WriteToText(outfile))
    )
print("Done")

Response:

HttpError: HttpError accessing https://www.googleapis.com/bigquery/v2/projects/projects/queries/querystring:
response: <{'status': '404', 'content-length': '342',
'x-xss-protection': '0', 'x-content-type-options': 'nosniff',
'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer',
'server': 'ESF', '-content-encoding': 'gzip', 'cache-control':
'private', 'date': 'Thu, 23 May 2019 13:00:08 GMT', 'x-frame-options':
'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>,
content <{
  "error": {
    "code": 404,
    "message": "Not found: Dataset my_dataset:views was not found in location US",
    "errors": [
      {
        "message": "Not found: Dataset my_dataset:views was not found in location US",
        "domain": "global",
        "reason": "notFound"
      }
    ],
    "status": "NOT_FOUND"
  }
}>

1 answer:

Answer 0 (score: 0)

In the Apache Beam 2.5.0 Python SDK, query sources outside the US location were not yet supported.

Support appears to have been added in the Apache Beam 2.8.0 Python SDK [Release Notes, PR, JIRA].
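
If you want to guard against running the pipeline on an SDK that is too old, a small version check along these lines may help. This is only a sketch: `MIN_BEAM_VERSION`, `parse_version`, and `supports_non_us_queries` are hypothetical names introduced here, not part of Beam, and the 2.8.0 threshold is taken from the answer above rather than verified independently.

```python
import re

# Assumption taken from the answer above: non-US query sources are
# supported starting with Apache Beam 2.8.0 (Python SDK).
MIN_BEAM_VERSION = (2, 8, 0)


def parse_version(version):
    """Return the leading numeric components of a version string as a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    # A missing patch component (e.g. "2.8") is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def supports_non_us_queries(version):
    """True if the given Beam version is at least MIN_BEAM_VERSION."""
    return parse_version(version) >= MIN_BEAM_VERSION
```

With Beam installed, you could call `supports_non_us_queries(beam.__version__)` before building the pipeline and fail early with a clear message instead of hitting the 404 at query time.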