Google Dataflow Python pipeline fails when writing output

Time: 2016-08-07 05:21:09

Tags: python-2.7 google-cloud-dataflow

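I'm running a simple Dataflow pipeline with the Python SDK that counts keys. The job preprocesses the input data without problems, but the group/output step fails with the error below.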

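My guess is that the log means the workers cannot access the temp folder, but the bucket in our project exists and has the right permissions. What could be causing this?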

 "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcsio.py", line 606, in write
     raise self.upload_thread.last_error  # pylint: disable=raising-bad-type
 HttpError: HttpError accessing <https://www.googleapis.com/resumable/upload/storage/v1/b/[PROJECT-NAME-REDACTED]-temp-2016-08-07_04-42-52/o?uploadType=resumable&alt=json&name=0015bf8d-fa87-4c9a-82d6-8ffcd742d770>:
 response: <{'status': '404', 'alternate-protocol': '443:quic',
   'content-length': '165', 'vary': 'Origin, X-Origin', 'server': 'UploadServer',
   'x-guploader-uploadid': 'AEnB2UoYRPUwhz-OXlJ437k0J8Uxd1lJvTsFbfVJF_YMP2GQEvmdDpo7e-3DVhuqNd9b1A_RFPbfIcK6hCsFcar-hdI94rqJZUvATcDmGRRIvHecAt5CTrg',
   'date': 'Sun, 07 Aug 2016 04:43:23 GMT',
   'alt-svc': 'quic=":443"; ma=2592000; v="36,35,34,33,32,31,30"',
   'content-type': 'application/json; charset=UTF-8'}>,
 content <{ "error": { "errors": [ { "domain": "global", "reason": "notFound", "message": "Not Found" } ], "code": 404, "message": "Not Found" } } >
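The bucket name in the 404 URL ends in `-temp-2016-08-07_04-42-52`, which suggests the temp location was built by appending a `-temp-<timestamp>` suffix to the output path. The sketch below assumes that naming scheme. The helpers `temp_dir_for` and `bucket_of` and the bucket name `my-project` are hypothetical, not Beam APIs. It shows how an output at a bucket root could turn into a request against a bucket that does not exist:

```python
import time


def temp_dir_for(output_prefix, when):
    """Hypothetical reconstruction: append a '-temp-<timestamp>' suffix to the output prefix."""
    return output_prefix + time.strftime('-temp-%Y-%m-%d_%H-%M-%S', when)


def bucket_of(gcs_path):
    """Return the bucket component of a gs://bucket/object path."""
    if not gcs_path.startswith('gs://'):
        raise ValueError('not a GCS path: %r' % gcs_path)
    return gcs_path[len('gs://'):].split('/', 1)[0]


when = time.strptime('2016-08-07 04:42:52', '%Y-%m-%d %H:%M:%S')

# Output at the bucket root: the suffix is glued onto the bucket name itself,
# so the temp files would go to a different (nonexistent) bucket.
root_tmp = temp_dir_for('gs://my-project', when)
print(bucket_of(root_tmp))  # my-project-temp-2016-08-07_04-42-52

# Output under a subdirectory: the suffix lands on the object prefix instead,
# so the temp files stay inside the existing bucket.
sub_tmp = temp_dir_for('gs://my-project/counts', when)
print(bucket_of(sub_tmp))  # my-project
```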

1 answer:

Answer 0 (score: 0)

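This is https://issues.apache.org/jira/browse/BEAM-539: TextFileSink does not accept the root of a bucket as its output location. To work around it, use a path inside a subdirectory (for example gs://foo/bar) as the output location.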
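One way to apply this workaround is to check the output path before building the pipeline, so a bucket-root path fails fast instead of failing in the workers. The sketch below is a hypothetical pre-flight guard. `require_subdirectory` is not part of the Beam SDK, and the check only looks at the string form of the path:

```python
def require_subdirectory(output_path):
    """Hypothetical pre-flight check: reject gs:// outputs that point at a bucket root."""
    if not output_path.startswith('gs://'):
        # Local and other filesystems are not covered by this check.
        return output_path
    bucket, _, rest = output_path[len('gs://'):].partition('/')
    if not bucket:
        raise ValueError('missing bucket name in %r' % output_path)
    if not rest.strip('/'):
        # 'gs://foo' and 'gs://foo/' both name only the bucket.
        raise ValueError(
            '%r is a bucket root; use a subdirectory such as gs://%s/output'
            % (output_path, bucket))
    return output_path


print(require_subdirectory('gs://foo/bar'))  # gs://foo/bar
```

The checked path would then be passed to the text sink. In current Beam releases that is `beam.io.WriteToText(...)`. The 2016 SDK in the question exposed it as `TextFileSink`, so the exact transform depends on the SDK version in use.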