PySpark DataFrame: concatenate column headers based on non-zero values

Date: 2016-06-17 19:52:24

Tags: dataframe pyspark

+-----+------+------+-------------------+------------------+----------+
|Manfr|prodid|region|         absprice33|        absprice27|absprice29|
+-----+------+------+-------------------+------------------+----------+
|  abc|    47|    US|-0.6015412046017017|1.2074692228904986|         0|
|  bcd|    47|    US|-0.6015412046017017|                 0|  1.204986|
+-----+------+------+-------------------+------------------+----------+

Above is the input. I want the output to look like this:

Manfr|prodid|region|new_col
  abc|    47|    US|absprice33, absprice27
  bcd|    47|    US|absprice33, absprice29

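Wherever a value is not equal to 0, I want to concatenate the corresponding column names. Note that the rows are unique: there are no duplicates for a given Manfr | prodid | region combination.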

Could you help me do this with a PySpark DataFrame?
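For reference, the rule being asked for can be sketched as follows. This is only one possible approach, not a confirmed answer. The function names `nonzero_column_names` and `with_nonzero_column_names` are made up for illustration; the first states the rule in plain Python for a single row, and the second expresses the same rule with built-in column functions, relying on `when` without `otherwise` producing NULL and on `concat_ws` skipping NULL inputs.

```python
def nonzero_column_names(row, value_cols, sep=", "):
    """Plain-Python statement of the rule for one row (a dict):
    join the names of the columns whose value is not 0."""
    return sep.join(c for c in value_cols if row[c] != 0)


def with_nonzero_column_names(df, value_cols, new_col="new_col", sep=", "):
    """Apply the same rule to a PySpark DataFrame without a UDF.

    Hypothetical helper: value_cols is the list of price columns to inspect;
    every other column of df is kept unchanged.
    """
    # Imported here so the plain-Python helper above works without Spark.
    from pyspark.sql import functions as F

    # when(...) with no otherwise() gives NULL when the condition is false,
    # and concat_ws ignores NULL inputs, so only non-zero columns contribute.
    names = [F.when(F.col(c) != 0, F.lit(c)) for c in value_cols]
    joined = F.concat_ws(sep, *names).alias(new_col)

    # Keep the key columns and replace the value columns with the new one.
    keep = [F.col(c) for c in df.columns if c not in value_cols]
    return df.select(*(keep + [joined]))
```

Assuming the input above is loaded as `df`, calling `with_nonzero_column_names(df, ["absprice33", "absprice27", "absprice29"])` should yield one `new_col` string per row, with names in the order given in `value_cols`. A row whose value columns are all 0 would get an empty string.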

0 answers:

No answers yet