PySpark: creating a line chart from a DataFrame on Databricks when there is no "category" column

Time: 2019-03-05 07:45:54

Tags: pyspark databricks

I am running the following code on Databricks:

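from pyspark.sql.functions import col, monotonically_increasing_id

# Add a row id, keep a single container, and select the two temperature columns
dataToShow = jDataJoined.\
withColumn('id', monotonically_increasing_id()).\
filter(
  (jDataJoined.containerNumber == 'SUDU8108536')).\
select(col('id'), col('returnTemperature'), col('supplyTemperature'))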

This gives me tabular data with the columns id, returnTemperature, and supplyTemperature.

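Now I would like to display a line chart with returnTemperature and supplyTemperature as categories.

As far as I understand, the display method in Databricks expects the category as the second argument, so basically I should have something like this: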

id - temperatureCategory - value
1 - returnTemperature - 25.0
1 - supplyTemperature - 27.0
2 - returnTemperature - 24.0
2 - supplyTemperature - 28.0

How can I transform the DataFrame in this way?

1 answer:

Answer 0 (score: 1)

I don't know whether your format is what the display method expects, but you can do this transformation with the SQL functions create_map and explode:

# Create an example DataFrame
from pyspark.sql import functions as F
l1 = [(1, 25.0, 27.0), (2, 24.0, 28.0)]
df = spark.createDataFrame(l1, ['id', 'returnTemperature', 'supplyTemperature'])

# Create a map column whose keys are the column names and whose values are the temperatures
df = df.withColumn('mapCol', F.create_map(
    F.lit('returnTemperature'), df.returnTemperature,
    F.lit('supplyTemperature'), df.supplyTemperature
))
# explode creates a new row for each key/value pair of the map
df = df.select('id', F.explode(df.mapCol).alias('temperatureCategory', 'value'))
df.show()

Output:

+---+-------------------+-----+
| id|temperatureCategory|value|
+---+-------------------+-----+
|  1|  returnTemperature| 25.0|
|  1|  supplyTemperature| 27.0|
|  2|  returnTemperature| 24.0|
|  2|  supplyTemperature| 28.0|
+---+-------------------+-----+
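
To make the reshaping concrete without a Spark session, here is a minimal plain-Python sketch of what the create_map plus explode step does to each row: every wide row becomes one long row per temperature column. The function name wide_to_long and the dictionary-based rows are illustrative assumptions, not part of PySpark or Databricks.

```python
def wide_to_long(rows, id_col, value_cols):
    """Turn each wide row into one (id, category, value) tuple per value column."""
    long_rows = []
    for row in rows:
        for name in value_cols:
            # The column name becomes the category; its cell becomes the value.
            long_rows.append((row[id_col], name, row[name]))
    return long_rows


# Same sample values as the example DataFrame in the answer.
rows = [
    {"id": 1, "returnTemperature": 25.0, "supplyTemperature": 27.0},
    {"id": 2, "returnTemperature": 24.0, "supplyTemperature": 28.0},
]

long_rows = wide_to_long(rows, "id", ["returnTemperature", "supplyTemperature"])
for r in long_rows:
    print(r)
```

In this long form the category column holds the series name and the value column holds the reading, which is the id / temperatureCategory / value shape the question asks for.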