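I am trying to read a BigQuery dataset in Dataflow. It cannot find the BigQuery dataset/table I specified.
The job_name is preprocess-ga360-190523-130005.
My Datalab VM, GCS bucket, and BigQuery dataset are all located in europe-west2.
For some reason, it is searching for the dataset in location "US".
Module versions are apache-beam 2.5.0, google-cloud-dataflow 2.0.0, google-cloud-bigquery 0.25.0.
I searched the documentation but could not find an answer.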
import apache_beam as beam

# job_name, PROJECT, my_query and to_csv are defined elsewhere (not shown)
OUTPUT_DIR = "gs://some-bucket/some-folder/"

# Dictionary of pipeline options
options = {
    "staging_location": "gs://some-bucket/some-folder/stage/",
    "temp_location": "gs://some-bucket/some-folder/tmp/",
    "job_name": job_name,
    "project": PROJECT,
    "runner": "DirectRunner",
    "location": "europe-west2",
    "region": "europe-west2",
}

# Instantiate a PipelineOptions object using the options dictionary
opts = beam.pipeline.PipelineOptions(flags=[], **options)

# Instantiate a Pipeline object using the PipelineOptions
with beam.Pipeline(options=opts) as p:
    outfile = "gs://some-bucket/some-folder/train.csv"
    (
        p
        | "read_train" >> beam.io.Read(
            beam.io.BigQuerySource(query=my_query, use_standard_sql=True))
        | "tocsv_train" >> beam.Map(to_csv)
        | "write_train" >> beam.io.Write(beam.io.WriteToText(outfile))
    )
print("Done")
Response:
HttpError: HttpError accessing https://www.googleapis.com/bigquery/v2/projects/projects/queries/querystring: response: <{'status': '404', 'content-length': '342', 'x-xss-protection': '0', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Thu, 23 May 2019 13:00:08 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Dataset my_dataset:views was not found in location US",
    "errors": [
      {
        "message": "Not found: Dataset my_dataset:views was not found in location US",
        "domain": "global",
        "reason": "notFound"
      }
    ],
    "status": "NOT_FOUND"
  }
}>
Answer 0 (score: 0)
In the Apache Beam 2.5.0 Python SDK, non-US query sources weren't yet supported.
It appears that support was added in the Apache Beam 2.8.0 Python SDK [Release Notes, PR, JIRA].
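
If you want to fail fast before launching the pipeline, a version check along these lines may help. This is only a minimal sketch: the 2.8.0 threshold is taken from the statement above rather than verified independently, and `version_tuple` and `supports_non_us_query_sources` are hypothetical helper names, not part of Apache Beam.

```python
import re

# Assumption: the minimum version comes from the answer above ("support appears
# to have been added in 2.8.0"); adjust it if the release notes say otherwise.
MIN_NON_US_QUERY_VERSION = "2.8.0"


def version_tuple(version):
    """Convert '2.10.0' to (2, 10, 0), ignoring suffixes such as '.dev' or 'rc1'."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev'
        parts.append(int(match.group()))
    return tuple(parts)


def supports_non_us_query_sources(version, minimum=MIN_NON_US_QUERY_VERSION):
    """Hypothetical helper: True if `version` is at least `minimum`."""
    return version_tuple(version) >= version_tuple(minimum)


# Typical use, assuming apache_beam is installed:
#   import apache_beam as beam
#   if not supports_non_us_query_sources(beam.__version__):
#       raise RuntimeError("Upgrade apache-beam to query a europe-west2 dataset")
```

Comparing tuples of integers rather than raw strings avoids the usual pitfall where "2.10.0" sorts before "2.8.0" lexicographically.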