Adding a Scala collection (takeSample) as a DataFrame column

Time: 2018-10-02 02:55:25

Tags: scala apache-spark dataframe

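I want to take a sample of values from each column of the order items DataFrame (ordi) and add those sample values as a column of another DataFrame (Metrics). Please let me know how to achieve this. My output DataFrame (Metrics) should look like this:

A value of 0 in the NOTNulls column of the DataFrame indicates that the field contains no null values.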

+------------------------+--------+------------------------------------------------+
|Fields                  |NOTNulls|SampleValues                                    |
+------------------------+--------+------------------------------------------------+
|order_item_id           |0       |(1234, 53665, 766, 757, ... 10 sample values)   |
|order_item_order_id     |0       |(78, 794, 52, 24552, 4455, ... 10 sample values)|
|order_item_product_id   |0       |(98, 52, 5151, 515266, ... 10 sample values)    |
|order_item_quantity     |0       |(52.52, 98.566, ... 10 sample values)           |
|order_item_subtotal     |0       |(95959., 45151, ... 10 sample values)           |
|order_item_product_price|0       |(98, 52, 5151, 515266, ... 10 sample values)    |
+------------------------+--------+------------------------------------------------+

OrderItems DF

DataFrame of the non-null values selected from each column of the order items DataFrame
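The original snippet for this step is not shown, so here is a minimal sketch of one way to fill the NOTNulls column, assuming it holds the number of null values per field (0 meaning no nulls). The object name NullCounts, the helper nullCounts, and the small stand-in data are hypothetical; only the Spark calls (count, when, col, select, head) are real API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, when}

object NullCounts {
  // Counts the null values in every column of df with a single aggregation.
  // Returns one (columnName, nullCount) pair per column, in column order.
  // Assumes column names without dots (col(c) would need backticks otherwise).
  def nullCounts(df: DataFrame): Seq[(String, Long)] = {
    // count() skips nulls, so count(when(isNull, 1)) counts only the null rows.
    val aggs = df.columns.map(c => count(when(col(c).isNull, 1)).alias(c))
    val row = df.select(aggs: _*).head()
    df.columns.toSeq.zipWithIndex.map { case (c, i) => (c, row.getLong(i)) }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("null-counts").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the order items DataFrame (ordi).
    val ordi = Seq(
      (1, Option(10.0)),
      (2, Option.empty[Double]),
      (3, Option(30.0))
    ).toDF("order_item_id", "order_item_subtotal")

    // One row per field, matching the first two columns of the Metrics table.
    nullCounts(ordi).toDF("Fields", "NOTNulls").show(false)
    spark.stop()
  }
}
```

Doing all counts in one select keeps this to a single Spark job regardless of how many columns the DataFrame has.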

Method to convert the collection to a SQL Row and add it as a column
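The method itself is not shown in the question, so the following is only a sketch of one possible approach, not the asker's code. It assumes the samples can be stored as strings (the columns have different types) and that up to 10 values per field are wanted. The names SampleValuesColumn and sampleValues are hypothetical. Because takeSample is an RDD action that returns a local array, the results are collected on the driver into a Seq of tuples and turned into a DataFrame with toDF, rather than appended with withColumn.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SampleValuesColumn {
  // Builds a DataFrame with one row per column of df:
  //   Fields       - the column name
  //   SampleValues - up to n non-null values sampled from that column, as strings
  def sampleValues(df: DataFrame, n: Int = 10): DataFrame = {
    val spark = df.sparkSession
    import spark.implicits._

    val rows: Seq[(String, Seq[String])] = df.columns.toSeq.map { c =>
      // Drop nulls first, then sample; this runs one Spark job per column.
      val sample = df.select(df(c)).na.drop().rdd
        .takeSample(withReplacement = false, num = n)
        .map(r => r.get(0).toString)
        .toSeq
      (c, sample)
    }
    // A local Seq of tuples becomes a DataFrame; Seq[String] maps to array<string>.
    rows.toDF("Fields", "SampleValues")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("sample-values").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the order items DataFrame (ordi).
    val ordi = Seq((1, 299.98), (2, 199.99), (3, 250.0))
      .toDF("order_item_id", "order_item_subtotal")

    sampleValues(ordi, n = 2).show(false)
    spark.stop()
  }
}
```

If a DataFrame with Fields and NOTNulls already exists, the result can be attached to it with a join on the Fields column, which would give the three-column Metrics layout described in the question.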

0 answers:

No answers