I'm currently trying to join two DataFrames while keeping the row order of one of them.

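From "Which operations preserve RDD order?", it seems (correct me if this is inaccurate, since I'm new to Spark) that joins do not preserve order: because the data lives in different partitions, rows are joined and "arrive" at the final DataFrame in no specified order.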

How can I join two DataFrames while preserving the order of one table?

For example,

+------+------+
| col1 | col2 |
+------+------+
| 0    | a    |
| 1    | b    |
+------+------+

joined with

+------+------+
| col2 | col3 |
+------+------+
| b    | x    |
| a    | y    |
+------+------+

on col2 should give

+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 0    | a    | y    |
| 1    | b    | x    |
+------+------+------+

I've heard some things about using coalesce or repartition, but I'm not sure. Any suggestions, methods, or insights are appreciated.

Edit: would this be analogous to having a single reducer in MapReduce? If so, what would that look like in Spark?

Can DataFrame joins in Spark preserve order?

1 answer: