Pandas: split a column in a dataframe and get headers

Time: 2017-03-01 04:01:37

Tags: python pandas dataframe split

I have a pandas dataframe with a column 'A':
import pandas as pd

dfc = pd.DataFrame({"A": ['AB=0.246154;ABP=39.3908;AC=3', 'AB=0.3;ABP=9.95901;AC=2;AF=0.333333', 'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86', 'AB=0.461538;ABP=3.51141;AC=2']})

I want to split column 'A' and get a new dataframe like this:

                                     A        AB      ABP  AC        AF    AN    AO
0         AB=0.246154;ABP=39.3908;AC=3  0.246154  39.3908   3      None  None  None
1  AB=0.3;ABP=9.95901;AC=2;AF=0.333333       0.3  9.95901   2  0.333333  None  None
2      AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86         0        0   6         1     6    86
3         AB=0.461538;ABP=3.51141;AC=2  0.461538  3.51141   2      None  None  None

I tried splitting the column with
dfc.A.str.split(';', expand = True)

but it gives me a new dataframe like this:

             0            1     2            3     4      5
0  AB=0.246154  ABP=39.3908  AC=3         None  None   None
1       AB=0.3  ABP=9.95901  AC=2  AF=0.333333  None   None
2         AB=0        ABP=0  AC=6         AF=1  AN=6  AO=86
3  AB=0.461538  ABP=3.51141  AC=2         None  None   None

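How can I add headers to these columns using the text before the "=" in each cell, and then add this new dataframe to the original one? Is there a pythonic way to do both in one line?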

Thanks

4 Answers:

Answer 0 (score: 4)

Try the following: after splitting each string properly, construct a Series/dictionary for every element of column A; its index/keys become the headers in the result. Use pd.concat to join the original column A to the new dataframe if needed.
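A minimal sketch of the approach just described, assuming the layout from the question ("key=value" pairs separated by ";"). The helper name `parse_pairs` is hypothetical. When the function passed to `Series.apply` returns a Series, the results are expanded into columns keyed by each Series' index, and `pd.concat(..., axis=1)` places the original column A alongside them.

```python
import pandas as pd

dfc = pd.DataFrame({"A": ['AB=0.246154;ABP=39.3908;AC=3',
                          'AB=0.3;ABP=9.95901;AC=2;AF=0.333333',
                          'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86',
                          'AB=0.461538;ABP=3.51141;AC=2']})


def parse_pairs(s):
    # Hypothetical helper: "AB=0.3;AC=2" -> Series({"AB": "0.3", "AC": "2"}).
    # The index of each returned Series becomes a column header.
    return pd.Series(dict(kv.split("=") for kv in s.split(";")))


# apply() aligns the per-row Series on their keys; missing keys become NaN.
parsed = dfc["A"].apply(parse_pairs)

# Put the original column A back on the left, side by side (axis=1).
result = pd.concat([dfc, parsed], axis=1)
print(result)
```

The parsed values are still strings at this point; converting them to numbers is a separate step.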

Answer 1 (score: 2)

Use extractall:

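# One row per key=value match: group 0 is the key, group 1 the value
e = dfc.A.str.extractall('([^;]+)=([^;]+)')
# Index the values by (original row, key), then pivot the keys into columns
pd.Series(e.values[:, 1], [e.index.get_level_values(0), e.values[:, 0]]).unstack()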

         AB      ABP AC        AF    AN    AO
0  0.246154  39.3908  3      None  None  None
1       0.3  9.95901  2  0.333333  None  None
2         0        0  6         1     6    86
3  0.461538  3.51141  2      None  None  None
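The same `extractall` idea can also be written with named groups, which become the column names of the match table, and the result can then be joined back onto the original frame to cover the second half of the question. A minimal sketch, assuming keys contain neither "=" nor ";"; the group names `key` and `val` are illustrative choices, not part of the answer above.

```python
import pandas as pd

dfc = pd.DataFrame({"A": ['AB=0.246154;ABP=39.3908;AC=3',
                          'AB=0.3;ABP=9.95901;AC=2;AF=0.333333',
                          'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86',
                          'AB=0.461538;ABP=3.51141;AC=2']})

# One row per key=value match; the index is (original row, "match" number).
pairs = dfc["A"].str.extractall(r"(?P<key>[^;=]+)=(?P<val>[^;]*)")

wide = (
    pairs.reset_index(level="match", drop=True)  # keep only the original row label
    .set_index("key", append=True)["val"]       # index: (row, key); values: val
    .unstack()                                  # one column per key, NaN where absent
)
wide.columns.name = None  # drop the leftover "key" label from the header

# Attach the parsed columns next to the original column A.
result = dfc.join(wide)
print(result)
```

`unstack` sorts the new columns, which here happens to match the order in which the keys appear.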

Answer 2 (score: 0)

This should work:

import pandas as pd

d = {"A": ['AB=0.246154;ABP=39.3908;AC=3', 'AB=0.3;ABP=9.95901;AC=2;AF=0.333333', 'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86', 'AB=0.461538;ABP=3.51141;AC=2']}
# Split each string into "key=value" cells, then each cell into a (key, value) pair
rows = [s.split(";") for s in d["A"]]
data = [dict(cell.split('=') for cell in row) for row in rows]

df = pd.DataFrame(data)  # dict keys become the headers; missing keys become NaN
print(df)

# Or, starting from the dataframe itself:
dfc = pd.DataFrame(d)

f = lambda s: dict(cell.split('=') for cell in s.split(';'))
df = pd.DataFrame(dfc.A.apply(f).tolist())
print(df)

Output:

         AB      ABP AC        AF   AN   AO
0  0.246154  39.3908  3       NaN  NaN  NaN
1       0.3  9.95901  2  0.333333  NaN  NaN
2         0        0  6         1    6   86
3  0.461538  3.51141  2       NaN  NaN  NaN
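Both variants above use `cell.split('=')`, so `dict()` raises a ValueError if a value itself contains "=" (the split yields three pieces) or if the string ends with ";" (the empty last piece yields one). A minimal sketch of a more tolerant parser that skips empty pieces and splits only on the first "="; the helper name `parse_record` and the second sample row are hypothetical, not from the question. It still assumes every non-empty piece contains an "=".

```python
import pandas as pd


def parse_record(s):
    # Hypothetical helper: skip empty pieces (e.g. from a trailing ';') and
    # split each piece on the FIRST '=' only, so "X=a=b" -> ["X", "a=b"].
    return dict(kv.split("=", 1) for kv in s.split(";") if kv)


samples = [
    "AB=0.3;ABP=9.95901;AC=2;AF=0.333333",  # row from the question
    "AB=0;X=a=b;",                           # hypothetical: '=' in a value, trailing ';'
]
dfc = pd.DataFrame({"A": samples})

# Keep the original row labels so the result can be joined back later.
df = pd.DataFrame([parse_record(s) for s in dfc["A"]], index=dfc.index)
print(df)
```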

Answer 3 (score: 0)

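import pandas as pd

def spliter(data):
    # "AB=0.3;AC=2" -> Series({"AB": "0.3", "AC": "2"}); the index becomes the headers
    pairs = [x.split("=") for x in data.split(";")]
    return pd.Series({key: val for key, val in pairs})


dfc.A.apply(spliter)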


         AB      ABP AC        AF   AN   AO
0  0.246154  39.3908  3       NaN  NaN  NaN
1       0.3  9.95901  2  0.333333  NaN  NaN
2         0        0  6         1    6   86
3  0.461538  3.51141  2       NaN  NaN  NaN
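All of the answers leave the parsed values as strings (for example "0.246154"), so the resulting columns have object dtype. If numeric columns are wanted, one option is to apply `pd.to_numeric` column by column. A minimal sketch, reusing `spliter` from the answer above; `errors="coerce"` (which turns unparseable text into NaN) is a choice made here, not something the answers do.

```python
import pandas as pd

dfc = pd.DataFrame({"A": ['AB=0.246154;ABP=39.3908;AC=3',
                          'AB=0.3;ABP=9.95901;AC=2;AF=0.333333',
                          'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86',
                          'AB=0.461538;ABP=3.51141;AC=2']})


def spliter(data):
    # Same parsing as the answer above: "AB=0.3;AC=2" -> Series of strings.
    pairs = [x.split("=") for x in data.split(";")]
    return pd.Series({key: val for key, val in pairs})


parsed = dfc.A.apply(spliter)
print(parsed.dtypes)   # every column is object (strings and NaN)

# Convert each column; extra keyword arguments to apply() are passed to to_numeric.
numeric = parsed.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)  # AC has no gaps and becomes int64; the others become float64
```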