I create a column in a DataFrame with monotonically_increasing_id and use it across 2-3 transformations, but the IDs of a few records are changing. For example:

import org.apache.spark.sql.functions.monotonically_increasing_id
val newDf = df.withColumn("rowId", monotonically_increasing_id())
newDf.show()

+------+---------+-----+-----+
|userId|     area| flag|rowId|
+------+---------+-----+-----+
|   123|[Blah1...| true|    0|
|   234|[Blah2...| true|    1|
|   216|[Blah3...| true|    2|
|   123|[blah4...|false|    3|
|   345|[Blah5...| true|    4|
|   677|[Blah6...|false|    5|
+------+---------+-----+-----+

After a few more transformations on newDf, the rowId of some rows has changed.

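Suppose the DataFrame is cached in memory after the monotonically_increasing_id column is generated. What happens if it is evicted from memory? Future operations would then try to recompute the DataFrame (or some of its partitions).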
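
To illustrate why recomputation could matter: as far as I understand from the Spark documentation, the current implementation puts the partition ID in the upper 31 bits of the generated value and the record number within each partition in the lower 33 bits. The plain-Scala sketch below does not use Spark at all; `rowId` and `assignIds` are hypothetical helpers written only to mimic that layout, and the two partition layouts are made-up examples. It shows that the same record gets a different ID if it lands in a different partition or position when the data is recomputed.

```scala
object RowIdSketch {
  // Hypothetical helper (not Spark API): partition index in the upper bits,
  // position within the partition in the lower 33 bits.
  def rowId(partitionIndex: Int, positionInPartition: Long): Long =
    (partitionIndex.toLong << 33) + positionInPartition

  // Assign an ID to every row, given rows that are already split into partitions.
  def assignIds[A](partitions: Seq[Seq[A]]): Map[A, Long] =
    (for {
      (rows, p) <- partitions.zipWithIndex
      (row, i)  <- rows.zipWithIndex
    } yield row -> rowId(p, i.toLong)).toMap

  def main(args: Array[String]): Unit = {
    // Same five userIds, split into partitions in two different ways.
    val firstRun  = assignIds(Seq(Seq(123, 234, 216), Seq(345, 677)))
    val secondRun = assignIds(Seq(Seq(123, 234), Seq(216, 345, 677)))

    println(firstRun(216))  // 2          (partition 0, position 2)
    println(secondRun(216)) // 8589934592 (partition 1, position 0)
  }
}
```

If this is what happens in my job, then any recomputation in which the row order or partitioning differs (for example after a shuffle upstream of the column) could hand out new IDs, unless the DataFrame with the IDs is materialized first.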

Can anyone help me?

Is monotonically_increasing_id generating two different unique IDs for the same record in Spark 2.3.1?

0 answers: