Exploding a column with nested lists in PySpark using DataFrames

Time: 2018-11-13 12:26:32

Tags: python-2.7 pyspark apache-spark-sql

I have a DataFrame with the following columns:

[Row(
col_1=True, 
col_2=[Row(val1=70, val2=None, val3=u'35f81fd0')], 
col_3=[
    Row(scr=100, id=u'ae288', 
    i_rs=[
            Row(a=
                [Row(id=42, value=u'10'), Row(id=49, value=u'100')], 
                c_d=None, t_i=None, name=u'', pd=413, st=0, stamp=None, t_s=50
                ), 

            Row(a=
                [Row(id=42, value=u'100'), Row(id=49, value=u'10')], 
                c_d=None, t_i=None, name=u'Jfe', pd=411, st=0, stamp=None, t_s=20
                ), 

            Row(a=
                [
                     Row(id=453, value=u'1523430000'), 
                     Row(id=709, value=u'1523516400'), 
                     Row(id=964, value=u'45'), 
                     Row(id=220, value=u'45'), 
                     Row(id=476, value=u'0'), 
                     Row(id=736, value=u'm_U:2;m_O:33;'), 
                     Row(id=340, value=u'0')
                 ], 
                c_d=None, t_i=None, name=u'', pd=1, st=0, stamp=None, t_s=10)
            ], 
    name=None, re=1, t_s=70, te=1)]
)]

I want to convert the columns above into multiple columns, as shown below:

[Image: expected output, not included]

How can I achieve the above output in PySpark?

0 Answers:

No answers