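I'm running a simple Dataflow pipeline with the Python SDK to count keywords. The job preprocesses the input data without problems, but fails at the grouping/output step with the error below.
From the logs, my guess is that the workers have trouble accessing the temp folder, but the bucket exists in our project with the appropriate permissions. What could the problem be?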
"/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcsio.py", line
606, in write raise self.upload_thread.last_error # pylint:
disable=raising-bad-type HttpError: HttpError accessing
<https://www.googleapis.com/resumable/upload/storage/v1/b/[PROJECT-NAME-REDACTED]-temp-2016-08-07_04-42-52/o?uploadType=resumable&alt=json&name=0015bf8d-fa87-4c9a-82d6-8ffcd742d770>:
response: <{'status': '404', 'alternate-protocol': '443:quic',
'content-length': '165', 'vary': 'Origin, X-Origin', 'server':
'UploadServer', 'x-guploader-uploadid':
'AEnB2UoYRPUwhz-OXlJ437k0J8Uxd1lJvTsFbfVJF_YMP2GQEvmdDpo7e-3DVhuqNd9b1A_RFPbfIcK6hCsFcar-hdI94rqJZUvATcDmGRRIvHecAt5CTrg',
'date': 'Sun, 07 Aug 2016 04:43:23 GMT', 'alt-svc': 'quic=":443";
ma=2592000; v="36,35,34,33,32,31,30"', 'content-type':
'application/json; charset=UTF-8'}>, content <{ "error": { "errors": [
{ "domain": "global", "reason": "notFound", "message": "Not Found" }
], "code": 404, "message": "Not Found" } } >
Answer 0 (score: 0)
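This is https://issues.apache.org/jira/browse/BEAM-539: a root bucket is not accepted as the output location for TextFileSink. To work around it, use a subdirectory path (for example, gs://foo/bar) as the output location.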
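As a rough illustration of the workaround, the sketch below checks whether an output location is a bare bucket and, if so, appends a subdirectory before the path is handed to the sink. The helper names `is_bucket_root` and `safe_output_location` and the default subdirectory `output` are hypothetical, made up for this example; they are not part of the Beam SDK.

```python
def is_bucket_root(gcs_path):
    """Return True if gcs_path names a bucket with no object prefix.

    Hypothetical helper for illustration; not part of apache_beam.
    """
    if not gcs_path.startswith("gs://"):
        raise ValueError("not a GCS path: %r" % gcs_path)
    remainder = gcs_path[len("gs://"):]
    # Split "bucket/some/prefix" into the bucket and everything after it.
    bucket, _, object_prefix = remainder.partition("/")
    if not bucket:
        raise ValueError("missing bucket name: %r" % gcs_path)
    return object_prefix.strip("/") == ""


def safe_output_location(gcs_path, subdir="output"):
    """Append a subdirectory when the path is a bucket root.

    The default subdir name is an assumption chosen for this example.
    """
    if is_bucket_root(gcs_path):
        return gcs_path.rstrip("/") + "/" + subdir
    return gcs_path


if __name__ == "__main__":
    # gs://foo is a bucket root, so a subdirectory is appended.
    print(safe_output_location("gs://foo"))
    # gs://foo/bar already has a prefix and is returned unchanged.
    print(safe_output_location("gs://foo/bar"))
```

The returned string would then be used as the output location of the pipeline's text sink in place of the bare bucket path.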