
I use PySpark window functions extensively in my code, but they do not seem to be working correctly.

Within each partition, I'm getting the correct result only for the last record as sorted by the order-by column.

The documentation says this API is experimental. Can we use it in production systems? http://spark.apache.org/docs/2.3.0/api/python/pyspark.sql.html#pyspark.sql.Window

Sample code:

invWindow = Window.partitionBy(masterDrDF["ResId"], masterDrDF["vrsn_strt_dts"]).orderBy(masterDrDF["vrsn_strt_dts"]).rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

max(when(invDetDF["InvoiceItemType"].like('ABD%'), 1).otherwise(0)).over(invWindow).alias("ABD_PKG_IN")
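
For reference, here is a minimal pure-Python sketch (not Spark itself) of how the window frame changes the result of a windowed max. It emulates two frames: the growing frame that Spark uses by default when an ordering is defined and no frame is given (unboundedPreceding to currentRow), and a whole-partition frame (unboundedPreceding to unboundedFollowing). The rows, the helper names `abd_flag` and `windowed_max`, and partitioning by ResId alone are assumptions made for illustration; they are not taken from the real data, and this may or may not be the cause of the behaviour described above.

```python
from collections import defaultdict

# Hypothetical rows: (ResId, vrsn_strt_dts, InvoiceItemType)
rows = [
    ("R1", "2018-01-01", "XYZ1"),
    ("R1", "2018-02-01", "ABD7"),
    ("R1", "2018-03-01", "XYZ2"),
    ("R2", "2018-01-01", "XYZ3"),
]

def abd_flag(item_type):
    # Mirrors when(col.like('ABD%'), 1).otherwise(0)
    return 1 if item_type.startswith("ABD") else 0

def windowed_max(rows, full_partition):
    # Assumption: partition by ResId only, order by vrsn_strt_dts
    parts = defaultdict(list)
    for res_id, start, item in rows:
        parts[res_id].append((start, abd_flag(item)))
    out = []
    for res_id, start, _ in rows:
        if full_partition:
            # Frame: unboundedPreceding .. unboundedFollowing
            frame = [flag for _, flag in parts[res_id]]
        else:
            # Frame: unboundedPreceding .. currentRow (range semantics)
            frame = [flag for s, flag in parts[res_id] if s <= start]
        out.append((res_id, start, max(frame)))
    return out

running = windowed_max(rows, full_partition=False)
whole = windowed_max(rows, full_partition=True)
```

With the growing frame, the first R1 row gets 0 because the 'ABD%' row has not entered the frame yet, and only the last row of a partition is guaranteed to match the whole-partition value. With the whole-partition frame, every R1 row gets 1.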

Window functions in PySpark: strange behavior

0 answers: