PySpark - Unpivot a DataFrame Using Regular Expressions

Posted: 2020-05-29 17:47:17

Tags: dataframe pyspark unpivot

I have the following DataFrame, obtained after several calculations:

+------+--------+------------+------------+------------+-----------+-----------------+------------+-----------+-----------------+
| label| machine|value1_count|value2_count|value1_score|value1_band|value1_band_score|value2_score|value2_band|value2_band_score|
+------+--------+------------+------------+------------+-----------+-----------------+------------+-----------+-----------------+
|label1|machine1|           2|           0|       91.67|        low|               70|       100.0|     normal|              100|
|label1|machine2|           1|           1|       95.83|        low|               70|       95.83|        low|               70|
|label2|machine3|           3|           2|        87.5|        low|               70|       91.67|        low|               70|
|label2|machine4|           1|           1|       95.83|        low|               70|       95.83|        low|               70|
+------+--------+------------+------------+------------+-----------+-----------------+------------+-----------+-----------------+

The DataFrame can also have `_count`, `_score`, `_band`, and `_band_score` columns for value3, value4, and so on. I now want to unpivot this data so that I get it in the following format:

+------+--------+----------+-----+-----+------+----------+
| label| machine|value_type|count|score|  band|band_score|
+------+--------+----------+-----+-----+------+----------+
|label1|machine1|    value1|    2|91.67|   low|        70|
|label1|machine1|    value2|    0|  100|normal|       100|
|label1|machine2|    value1|    1|95.83|   low|        70|
|label1|machine2|    value2|    1|95.83|   low|        70|
|label2|machine3|    value1|    3| 87.5|   low|        70|
|label2|machine3|    value2|    2|91.67|   low|        70|
|label2|machine4|    value1|    1|95.83|   low|        70|
|label2|machine4|    value2|    1|95.83|   low|        70|
+------+--------+----------+-----+-----+------+----------+

I think using a regular expression on the column names would be useful here, but I have not been able to implement it. Can someone help me?
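One possible approach (a sketch, not from the original post): use Python's `re` module to group the suffixed columns by their `valueN` prefix, then build a Spark SQL `stack()` expression that emits one row per value type. The column list below is taken from the question; the regex pattern and helper names are illustrative assumptions.

```python
import re

# Column names from the question's DataFrame (df.columns in practice).
columns = [
    "label", "machine",
    "value1_count", "value2_count",
    "value1_score", "value1_band", "value1_band_score",
    "value2_score", "value2_band", "value2_band_score",
]

# Split e.g. "value1_band_score" into prefix "value1" and suffix "band_score".
# The alternation lists "band_score" before "band" is not needed here because
# the trailing "$" forces the whole suffix to match, but ordering it this way
# keeps the intent explicit.
pattern = re.compile(r"^(value\d+)_(count|score|band_score|band)$")

groups = {}  # value_type prefix -> {suffix: full column name}
for col in columns:
    m = pattern.match(col)
    if m:
        prefix, suffix = m.groups()
        groups.setdefault(prefix, {})[suffix] = col

suffixes = ["count", "score", "band", "band_score"]

# Build one stack(n, label1, col..., label2, col..., ...) expression.
parts = []
for prefix in sorted(groups):
    cols = groups[prefix]
    parts.append("'{}', {}".format(prefix, ", ".join(cols[s] for s in suffixes)))

stack_expr = "stack({}, {}) as (value_type, {})".format(
    len(groups), ", ".join(parts), ", ".join(suffixes)
)

# With a SparkSession available, the unpivot would then be (untested sketch):
# result = df.selectExpr("label", "machine", stack_expr)
```

Because the expression is generated from the actual column names, this should keep working when `value3`, `value4`, etc. appear, as long as they follow the same suffix convention.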

Thanks in advance.

0 Answers:

No answers yet.