Extracting JSON data from a Spark DataFrame

Date: 2018-10-15 13:29:22

Tags: python json dataframe pyspark databricks

+------------------------------------------------------------------+
| message                                                          |
+------------------------------------------------------------------+
|{"name":"east-desktop","viewers":447,"emptyCount":0,"version":0.3}|
|{"name":"west-desktop","viewers":111,"emptyCount":0,"version":0.6}|
|{"name":"west-desktop","viewers":115,"emptyCount":0,"version":0.1}|
+------------------------------------------------------------------+

message:string

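I have a DataFrame with JSON data stored in a single column, and I want to extract that data into separate columns or write it out as a JSON file.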

I am working in a Databricks notebook using PySpark.

Desired output as a DataFrame:

+---------------------------------------------+
| name        | viewers| emptyCount | version |
+---------------------------------------------+
|east-desktop | 447    | 0          | 0.3     |
|west-desktop | 111    | 0          | 0.6     |
|west-desktop | 115    | 0          | 0.1     |
+---------------------------------------------+

Or as JSON:

{
  "name": "east-desktop",
  "viewers": 447,
  "emptyCount": 0,
  "version": 0.3
}

1 answer

Answer 0 (score: 1)

Yes, this is nearly a duplicate question, but you can get the DataFrame output with the following example:

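from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, FloatType

# One-row sample DataFrame; with a StringType() schema the single column is named "value".
# str() of a dict produces single-quoted Python syntax, which from_json still parses
# because Spark's JSON parser accepts single quotes by default (allowSingleQuotes=true).
df_new = spark.createDataFrame([
    str({"name":"east-desktop","viewers":447,"emptyCount":0,"version":0.3})
], StringType())

# Schema describing the fields inside each JSON string
schema = StructType(
    [
        StructField('name', StringType(), True),
        StructField('viewers', IntegerType(), True),
        StructField('emptyCount', IntegerType(), True),
        StructField('version', FloatType(), True)
    ]
)

# Parse the string into a struct column "data", then expand its fields into top-level columns
df_new.withColumn("data", from_json("value", schema)).select("value", col('data.*')).show(truncate=False)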

Output:

+-------------------------------------------------------------------------+------------+-------+----------+-------+
|value                                                                    |name        |viewers|emptyCount|version|
+-------------------------------------------------------------------------+------------+-------+----------+-------+
|{'emptyCount': 0, 'version': 0.3, 'name': 'east-desktop', 'viewers': 447}|east-desktop|447    |0         |0.3    |
+-------------------------------------------------------------------------+------------+-------+----------+-------+