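How to parse JSON DataFrame columns into separate columns

Time: 2019-06-17 10:25:36

Tags: python json pyspark

I have a DataFrame in which two columns contain JSON data, and I want to parse that JSON into columns of the DataFrame.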

+------------+---------+--------------------+--------------------+
|   firstname| lastname|    travellerdetails|            bookjson|
+------------+---------+--------------------+--------------------+
|           K|    Gupta|[{FlierNumber:","...|[{origin:DEL","Et...|
|           K|    Gupta|[{FlierNumber:","...|[{origin:DEL","Et...|
|Jana Ranjani|Raghu Raj|[{BaggageTypeRetu...|[{origin:AMD","De...|
+------------+---------+--------------------+--------------------+

The two columns holding JSON data are travellerdetails and bookjson; those are the columns I want to parse.

The first row of travellerdetails is:

""[{""""FlierNumber"""":""""""""","BaggageTypeReturn"""":""""""""","FirstName"""":""""K""""","Title"""":""""1""""","MiddleName"""":""""D""""","LastName"""":""""Gupta""""","MealTypeOnward"""":""""""""","DateOfBirth"""":""""""""","BaggageTypeOnward"""":""""""""","SeatTypeOnward"""":""""""""","MealTypeReturn"""":""""""""","FrequentAirline"""":null","Type"""":""""A""""","SeatTypeReturn"""":""""""""}","{""""FlierNumber"""":""""""""","BaggageTypeReturn"""":""""""""","FirstName"""":""""Sweety""""","Title"""":""""2""""","MiddleName"""":""""""""","LastName"""":""""Gupta""""","MealTypeOnward"""":""""""""","DateOfBirth"""":""""""""","BaggageTypeOnward"""":""""""""","SeatTypeOnward"""":""""""""","MealTypeReturn"""":""""""""","FrequentAirline"""":null","Type"""":""""A""""","SeatTypeReturn"""":""""""""}]""

The first row of bookjson is:

""[{""""origin"""":""""DEL""""","EticketFlag"""":""""false""""","flightcode"""":""""251""""","farebasis"""":""""L0IP""""","spicestatus"""":""""Canceled""""","deptime"""":""""07:20""""","codeshare"""":""""""""","ibibopartner"""":""""indigonew""""","productclass"""":""""R""""","duration"""":""""2h 5m""""","ruleno"""":""""4910""""","qtype"""":""""fbs""""","tickettype"""":""""e""""","flightno"""":""""251""""","servicetype"""":""""""""","fareclass"""":""""L""""","faresequence"""":""""1""""","destination"""":""""GAU""""","carrierid"""":""""6E""""","stops"""":""""0""""","state"""":""""New""""","fare"""":{""""adultphf"""":50","adultttf"""":75","adultdf"""":115","totalsurcharge"""":0","indigonewgrossamount"""":10202","adulttotalfare"""":5101","totalcommission"""":0","adultbasefare"""":4150","totalpassengerhandlingfee"""":0","adultudf"""":562","adultpassengerservicefee"""":149","totalpassengerservicefee"""":0","totalothers"""":0","childtotalfare"""":0","totalbasefare"""":8300","totalfare"""":101...

How can I parse these columns?

1 Answer:

Answer 0 (score: 0)

What you are looking for is F.from_json().

You would use it like this:

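from pyspark.sql import functions as F

# from_json needs a schema as its second argument; these DDL strings list only a few of the fields.
traveller_schema = "array<struct<FirstName:string,LastName:string,Title:string,Type:string>>"
book_schema = "array<struct<origin:string,destination:string,flightcode:string>>"
df = df.withColumn("travellerdetails", F.from_json(F.col("travellerdetails"), traveller_schema))
df = df.withColumn("bookjson", F.from_json(F.col("bookjson"), book_schema))

Note, however, that the JSON shown in your question is not valid, so parsing it will produce null. Also note that the schema is passed as the second argument to from_json, and in PySpark it is required. Because the schema is given up front, Spark does not have to infer it from the data, and you can specify the data type you want for each field.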
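Since from_json returns null for a cell it cannot parse, it can help to check a few sample cells locally before running the Spark job. The following is a minimal sketch using only Python's standard json module. The helper name is_valid_json and the two sample strings are made up for illustration; they are not the exact values from the question. Python's parser and Spark's parser may differ in edge cases, so treat this as a rough pre-check only.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise (hypothetical helper)."""
    if text is None:
        return False
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


# Made-up sample cells, not the exact values from the question.
well_formed = '[{"FirstName": "K", "LastName": "Gupta"}]'
doubled_quotes = '[{""FirstName"": ""K"", ""LastName"": ""Gupta""}]'

print(is_valid_json(well_formed))     # accepted by json.loads
print(is_valid_json(doubled_quotes))  # rejected: doubled quotes break the syntax
```

If cells like the second sample show up, the quoting probably needs to be repaired where the data is loaded (for example in the CSV reader's quote and escape settings) before from_json can return anything other than null.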