Python pandas: splitting a column of a large table

Time: 2017-06-21 14:06:03

Tags: python pandas

I have a large table (4M rows and 20 columns). One particular column holds a list in each row:

                                       8
0      [key1=it, key3=domain, key6=0001]
1                            [key2=home]
2               [key4=pippo, key5=pluto]

Given a list keys = [], I would like an efficient way to replace column '8' with other columns, like this:

       key1  key2    key3   key4  key5  key6
0        it  None  domain   None  None  0001
1      None  home    None   None  None  None
2      None  None    None  pippo pluto  None

Thanks!

2 answers:

Answer 0: (score: 2)

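# Split each 'key=value' string into a [key, value] pair.
s = lambda x: x.split('=')
# Column 8 as a plain Python list of lists.
rows = df.loc[:, 8].values.tolist()
# One dict per row; keys missing from a row become NaN.
pd.DataFrame([dict(map(s, r)) for r in rows])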

  key1  key2    key3   key4   key5  key6
0   it   NaN  domain    NaN    NaN  0001
1  NaN  home     NaN    NaN    NaN   NaN
2  NaN   NaN     NaN  pippo  pluto   NaN

Setup

import pandas as pd

df = pd.Series([
        ['key1=it', 'key3=domain', 'key6=0001'],
        ['key2=home'],
        ['key4=pippo', 'key5=pluto']
    ]).to_frame(8)
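
The question also asks for the new columns to follow a given list of keys and to replace column 8 in the original table. A minimal sketch of that step, assuming the `keys` list contains the six names shown in the question and that the table has at least one other column (the `other` column below is hypothetical, added only for illustration). It also uses `split('=', 1)` so that a value which itself contains `=` is not broken apart; that is an assumption about the data, not something the question states.

```python
import pandas as pd

# Sample frame from the setup, plus a hypothetical extra column 'other'.
df = pd.Series([
    ['key1=it', 'key3=domain', 'key6=0001'],
    ['key2=home'],
    ['key4=pippo', 'key5=pluto'],
]).to_frame(8)
df['other'] = [10, 20, 30]

# Assumed contents of the question's `keys` list.
keys = ['key1', 'key2', 'key3', 'key4', 'key5', 'key6']

# Build one dict per row; maxsplit=1 keeps values containing '=' intact.
records = [dict(item.split('=', 1) for item in row) for row in df[8].tolist()]

# Reuse the original index so the join below aligns row by row,
# and force the column order (and presence) given by `keys`.
expanded = pd.DataFrame(records, index=df.index).reindex(columns=keys)

# Drop the list column and attach the expanded key columns.
result = df.drop(columns=8).join(expanded)
print(result)
```

With `reindex(columns=keys)`, any key in `keys` that never occurs in the data still gets a column (all missing), and any key not listed in `keys` is dropped.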

Answer 1: (score: -1)

I solved the problem of bad rows this way, but it uses a for loop:

        # Split each 'key=value' string into a [key, value] pair.
        self.s = lambda x: x.split('=')

        self.rows = self.df.loc[:, 8].values.tolist()
        dictList8 = []
        for r in self.rows:
            try:
                dictList8.append(dict(map(self.s, r)))
            except Exception:
                # Row could not be parsed: record a marker instead of failing.
                dictList8.append({'skipped': 'True'})
        self.dfMod8 = pd.DataFrame(dictList8)
        del self.df[8]

Any ideas on how to make it faster?
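
One possible variation, offered as a sketch rather than a measured speed-up: move the try/except into a small helper (the name `parse_row` is hypothetical) and build the list with a comprehension, as in Answer 0. The loop still runs in Python, so any gain is likely modest and worth timing on the real data. The sample rows below are made up to show one good case and two kinds of bad row.

```python
import pandas as pd

def parse_row(row):
    """Turn ['k=v', ...] into a dict; mark rows that cannot be parsed."""
    try:
        return dict(item.split('=', 1) for item in row)
    except (TypeError, ValueError, AttributeError):
        # TypeError: row is not iterable (e.g. None or NaN).
        # ValueError: an item has no '=' so it is not a key/value pair.
        # AttributeError: an item is not a string.
        return {'skipped': 'True'}

# Illustrative rows, including two malformed ones.
rows = [
    ['key1=it', 'key3=domain'],
    ['key2=home'],
    ['no_separator'],   # bad: no '='
    None,               # bad: not a list
]

parsed = pd.DataFrame([parse_row(r) for r in rows])
print(parsed)
```

Catching only the exceptions that malformed rows are expected to raise keeps unrelated errors visible instead of silently turning them into skipped rows.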