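I get this error when running a simple pipeline that reads from and writes to a BigQuery table, defined with the latest Apache Beam SDK for Python (2.2.0).
The read fails because a few rows have timestamps with a year before 1900. How can I patch the dataflow_worker package?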
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
(4d31192aa4aec063): Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 582, in do_work
work_executor.execute()
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 167, in execute
op.start()
File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
def start(self):
File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
with self.scoped_start_state:
File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
with self.spec.source.reader() as reader:
File "dataflow_worker/native_operations.py", line 48, in dataflow_worker.native_operations.NativeReadOperation.start
for value in reader:
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativefileio.py", line 198, in __iter__
for record in self.read_next_block():
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativeavroio.py", line 95, in read_next_block
yield self.decode_record(record)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativebigqueryavroio.py", line 110, in decode_record
record, self.source.table_schema)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativebigqueryavroio.py", line 104, in _fix_field_values
record[field.name], field)
File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativebigqueryavroio.py", line 83, in _fix_field_value
return dt.strftime('%Y-%m-%d %H:%M:%S.%f UTC')
ValueError: year=200 is before 1900; the datetime strftime() methods require year >= 1900
Answer 0 (score: 0)
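Unfortunately, you cannot patch it to handle these timestamps, because this code is an internal part of Dataflow, Google's runner for Apache Beam. You will have to wait until Google fixes it, assuming it is recognized as a bug. Please report it anyway, even though it is more a limitation of the Python version in use than a bug.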
The problem comes from strftime, as you can see in the error. The documentation explicitly states that it does not work for any year before 1900.
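As a last-resort workaround, though, you can cast the timestamp to a string (you can do this in BigQuery, as described in the documentation). Then, in your Beam pipeline, you can convert it back to a timestamp or to whatever representation suits you best.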
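A minimal sketch of that workaround, under stated assumptions: the table `my_dataset.my_table` and the column `ts` are placeholder names, and the parser assumes the cast produces strings shaped like `YYYY-MM-DD HH:MM:SS[.ffffff]` followed by an optional UTC offset suffix, which it ignores. Check the actual strings your query returns before relying on it.

```python
import re
from datetime import datetime

# Hypothetical query: `my_dataset.my_table` and `ts` are placeholder names.
# Casting in BigQuery means the worker receives a plain string column and
# never has to format the old timestamp itself.
QUERY = 'SELECT CAST(ts AS STRING) AS ts_str FROM my_dataset.my_table'

# Assumed string shape: date, space or 'T', time, optional fractional seconds.
# Anything after that (such as a '+00' offset) is ignored; UTC is assumed.
_TS_RE = re.compile(
    r'^(\d{1,4})-(\d{2})-(\d{2})[ T](\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,6}))?')


def parse_bq_timestamp(text):
    """Rebuild a naive datetime from the string, without strptime/strftime."""
    match = _TS_RE.match(text)
    if match is None:
        raise ValueError('unrecognized timestamp string: %r' % (text,))
    year, month, day, hour, minute, second = (
        int(group) for group in match.groups()[:6])
    fraction = match.group(7) or ''
    # Right-pad the fraction so that '.5' becomes 500000 microseconds.
    microsecond = int(fraction.ljust(6, '0')) if fraction else 0
    return datetime(year, month, day, hour, minute, second, microsecond)
```

In the pipeline, the query would be passed to the BigQuery source and `parse_bq_timestamp` applied to the `ts_str` field of each row in an ordinary map step. Only the `datetime` constructor is involved, so years before 1900 go through.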
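An answer to a related question shows an example of converting a datetime object to a string that triggers the same error. Another answer to the same question explains what is going on with this error, how it was addressed in Python, and what you can do about it. Unfortunately, the solution seems to be to avoid strftime altogether and use an alternative.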
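One possible alternative to strftime, for use in your own pipeline code, is to build the string from the datetime fields directly. This is only a sketch: the function name `format_timestamp` is made up here, and the output layout simply mirrors the `'%Y-%m-%d %H:%M:%S.%f UTC'` pattern visible in the traceback. It does not change the Dataflow worker itself.

```python
from datetime import datetime


def format_timestamp(dt):
    """Format a datetime like '%Y-%m-%d %H:%M:%S.%f UTC' without strftime.

    str.format on the individual fields has no lower bound on the year,
    so dates before 1900 are handled the same way as any other date.
    """
    return '{:04d}-{:02d}-{:02d} {:02d}:{:02d}:{:02d}.{:06d} UTC'.format(
        dt.year, dt.month, dt.day,
        dt.hour, dt.minute, dt.second, dt.microsecond)


# The year from the error message (200) formats without raising.
print(format_timestamp(datetime(200, 1, 1)))
```

`datetime.isoformat()` is another option if the ISO 8601 layout is acceptable, since it also does not go through strftime.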