Array sliding with a Spark DataFrame

Time: 2018-11-08 13:41:49

Tags: apache-spark dataframe sliding

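Suppose the data is T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0. Using a Scala Spark DataFrame, how can it be split into the following format, with one key/value pair per row?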

T 32
P 1
A 420
H 60
R 0.30841494477846165
S 0

Any suggestions would be greatly appreciated.

Thanks in advance.

1 Answer:

Answer 0: (score: 0)
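
For the string in the question, here is a minimal sketch of one possible approach, not necessarily the one the answerer had in mind. It assumes the value sits in a single string column, here hypothetically named raw, and that keys and values strictly alternate between underscores. The object name SplitPairs, the helper toPairs, and the output column names key and value are all made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SplitPairs {
  // Split on "_" and walk the tokens in non-overlapping windows of two,
  // so "T_32_P_1" becomes List(("T", "32"), ("P", "1")).
  // A trailing token without a partner is dropped.
  def toPairs(s: String): Seq[(String, String)] =
    s.split("_").sliding(2, 2).collect { case Array(k, v) => (k, v) }.toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local run for demonstration
      .appName("split-pairs")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: one string column named "raw".
    val df = Seq("T_32_P_1_A_420_H_60_R_0.30841494477846165_S_0").toDF("raw")

    // flatMap emits one output row per (key, value) pair.
    val result = df.as[String]
      .flatMap((s: String) => toPairs(s))
      .toDF("key", "value")

    result.show(false)
    spark.stop()
  }
}
```

The pairing relies on sliding(2, 2) from the Scala collections library, which steps through the token array two elements at a time. If the real DataFrame has more than one column, the typed as[String] step would need adjusting.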

Another example:

+-------+-------------+-----------------------------------------------------------------------------+
|Pcode  |Pname        |Pdetails                                                                     |
+-------+-------------+-----------------------------------------------------------------------------+
|Water12|HimalayaWater|Price,1.20;Qty,250ml;Brand,Himalaya;Class,Liquid                             |
|Snack23|Mad Pringles |Price,0.65;Qty,165 g;Brand,MadLtd;Class,Snacks;Batch,12312334;Exp,12/Feb/2012|
+-------+-------------+-----------------------------------------------------------------------------+

I want to split Pdetails into two columns, Type and Value, and the expected output is:

+-------+-------------+-----+-----------+
|Pcode  |Pname        |Type |Value      |
+-------+-------------+-----+-----------+
|Water12|HimalayaWater|Price|1.20       |
|Water12|HimalayaWater|Qty  |250ml      |
|Water12|HimalayaWater|Brand|Himalaya   |
|Water12|HimalayaWater|Class|Liquid     |
|Snack23|Mad Pringles |Price|0.65       |
|Snack23|Mad Pringles |Qty  |165 g      |
|Snack23|Mad Pringles |Brand|MadLtd     |
|Snack23|Mad Pringles |Class|Snacks     |
|Snack23|Mad Pringles |Batch|12312334   |
|Snack23|Mad Pringles |Exp  |12/Feb/2012|
+-------+-------------+-----+-----------+
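
The transformation above can be sketched with Spark's built-in split and explode functions. This is an illustrative sketch rather than the answerer's original code. It assumes every entry in Pdetails has the form Type,Value, that entries are separated by semicolons, and that neither part contains a comma or semicolon. The object name ExplodeDetails, the helper splitDetails, and the intermediate columns pair and kv are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, split}

object ExplodeDetails {
  // Turn each "Type,Value;Type,Value;..." string into one row per pair.
  def splitDetails(df: DataFrame): DataFrame =
    df
      // One row per "Type,Value" entry.
      .withColumn("pair", explode(split(col("Pdetails"), ";")))
      // Split each entry into a two-element array.
      .withColumn("kv", split(col("pair"), ","))
      .select(
        col("Pcode"),
        col("Pname"),
        col("kv").getItem(0).as("Type"),
        col("kv").getItem(1).as("Value")
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local run for demonstration
      .appName("explode-details")
      .getOrCreate()
    import spark.implicits._

    // The two sample rows from the input table.
    val df = Seq(
      ("Water12", "HimalayaWater", "Price,1.20;Qty,250ml;Brand,Himalaya;Class,Liquid"),
      ("Snack23", "Mad Pringles", "Price,0.65;Qty,165 g;Brand,MadLtd;Class,Snacks;Batch,12312334;Exp,12/Feb/2012")
    ).toDF("Pcode", "Pname", "Pdetails")

    splitDetails(df).show(false)
    spark.stop()
  }
}
```

On the sample rows this should yield ten rows, four for Water12 and six for Snack23, matching the expected table. If a value could itself contain a comma, the second split would cut it short, so a different delimiter strategy would be needed in that case.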