In Python Apache Beam, is it possible to write elements in a specific order?

Time: 2016-08-30 19:05:57

Tags: python google-cloud-dataflow apache-beam

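I am using Beam to process time-series data over overlapping windows. At the end of my pipeline, I write each element to a file. Each element represents a CSV row, and one of its fields is the timestamp of the associated window. I would like to write the elements in order of that timestamp. Is there a way to do this with the Python Beam library?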

1 answer:

Answer 0: (score: 1)

While this isn't part of the base distribution, you could implement it yourself: collect the elements into a global window, sort them there, and then write them out to a file. Two caveats apply:

  • The entire contents of the window would need to fit in memory; otherwise you would need to chunk up the file into smaller windows.
  • If you take the second option, you'd need a strategy for writing the smaller windows to the file in order.
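As a rough illustration of the first approach (everything fits in memory), here is a minimal sketch rather than a definitive implementation. It assumes a batch pipeline, that each element is a single CSV line, and that the window timestamp is the first field in a format that sorts correctly as a string (for example ISO-8601). The names `timestamp_of`, `sort_rows`, `write_sorted` and `output_prefix` are made up for this example. The sorted rows are joined into one string so that the write step receives a single element and does not depend on any ordering between elements.

```python
import csv


def timestamp_of(row, field_index=0):
    """Return the timestamp field of one CSV line.

    Assumption: the timestamp lives in column `field_index` and sorts
    correctly as a plain string (e.g. ISO-8601).
    """
    return next(csv.reader([row]))[field_index]


def sort_rows(rows, field_index=0):
    """Sort CSV lines by timestamp and join them into one block of text."""
    ordered = sorted(rows, key=lambda r: timestamp_of(r, field_index))
    return "\n".join(ordered)


def write_sorted(rows_pcoll, output_prefix):
    """Attach the collect, sort and write steps to a PCollection of CSV lines."""
    # Imported here so the two helpers above can be used without Beam installed.
    import apache_beam as beam

    return (
        rows_pcoll
        # Drop the overlapping windows so all rows land in one global window.
        | "ToGlobalWindow" >> beam.WindowInto(beam.window.GlobalWindows())
        # Gather every row into a single in-memory list (must fit in memory).
        | "CollectAll" >> beam.combiners.ToList()
        # Sort the list and emit it as one multi-line string.
        | "SortAndJoin" >> beam.Map(sort_rows)
        # One element, one shard: the file contents follow the sorted order.
        | "Write" >> beam.io.WriteToText(output_prefix, num_shards=1)
    )
```

You would call `write_sorted(csv_lines, "output/sorted")` at the end of the pipeline in place of the existing write step. For the chunked variant from the second caveat, one option is to apply `sort_rows` per chunk and write each chunk to its own file, named so that the files can be concatenated in time order afterwards.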