I have a pandas DataFrame with a column 'A':
dfc = pd.DataFrame( {"A": ['AB=0.246154;ABP=39.3908;AC=3', 'AB=0.3;ABP=9.95901;AC=2;AF=0.333333', 'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86', 'AB=0.461538;ABP=3.51141;AC=2']})
I want to split column 'A' and get a new DataFrame like this:
A AB ABP AC AF AN AO
0 AB=0.246154;ABP=39.3908;AC=3 0.246154 39.3908 3 None None None
1 AB=0.3;ABP=9.95901;AC=2;AF=0.333333 0.3 9.95901 2 0.333333 None None
2 AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86 0 0 6 1 6 86
3 AB=0.461538;ABP=3.51141;AC=2 0.461538 3.51141 2 None None None
I tried splitting the column with
dfc.A.str.split(';', expand = True)
but it gives this DataFrame instead:
0 1 2 3 4 5
0 AB=0.246154 ABP=39.3908 AC=3 None None None
1 AB=0.3 ABP=9.95901 AC=2 AF=0.333333 None None
2 AB=0 ABP=0 AC=6 AF=1 AN=6 AO=86
3 AB=0.461538 ABP=3.51141 AC=2 None None None
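How can I use the text before the "=" in each cell as the column header, and add this new DataFrame to the original one? Is there a Pythonic way to do both operations in one line?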
Thanks
Answer 0 (score: 4)
After splitting each string properly, construct a Series / dictionary for every element in column A; the index / keys then become the headers of the result.
If needed, use pd.concat to join the original column A to the new DataFrame.
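The approach described in this answer could be sketched roughly as follows. This is only an illustration, not the answerer's exact code: the helper name parse_pairs is hypothetical, and the sketch assumes every cell is a well-formed "key=value" list separated by ";".

```python
import pandas as pd

dfc = pd.DataFrame({"A": [
    "AB=0.246154;ABP=39.3908;AC=3",
    "AB=0.3;ABP=9.95901;AC=2;AF=0.333333",
    "AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86",
    "AB=0.461538;ABP=3.51141;AC=2",
]})

def parse_pairs(text):
    """Hypothetical helper: turn 'k1=v1;k2=v2' into a Series indexed by key."""
    # Split on ';' first, then split each cell once on '='.
    return pd.Series(dict(item.split("=", 1) for item in text.split(";")))

# apply() with a function that returns a Series yields a DataFrame whose
# columns are the union of all keys; keys missing from a row become NaN.
parsed = dfc["A"].apply(parse_pairs)

# Put the original column next to the parsed columns.
result = pd.concat([dfc, parsed], axis=1)
print(result)
```

The parsed values stay as strings here; converting them to numbers would be a separate step.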
Answer 1 (score: 2)
Use extractall:
e = dfc.A.str.extractall('([^;]+)=([^;]+)')
pd.Series(e.values[:, 1], [e.index.get_level_values(0), e.values[:, 0]]).unstack()
AB ABP AC AF AN AO
0 0.246154 39.3908 3 None None None
1 0.3 9.95901 2 0.333333 None None
2 0 0 6 1 6 86
3 0.461538 3.51141 2 None None None
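To make the intermediate step easier to follow, here is a variation on the same idea. It is a sketch, not the answerer's code: it assumes named regex groups (the names key and value are my own choice) so that the frame returned by extractall has labelled columns, and it reshapes with set_index and unstack instead of building a Series by hand.

```python
import pandas as pd

dfc = pd.DataFrame({"A": [
    "AB=0.246154;ABP=39.3908;AC=3",
    "AB=0.3;ABP=9.95901;AC=2;AF=0.333333",
    "AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86",
    "AB=0.461538;ABP=3.51141;AC=2",
]})

# One output row per "key=value" match; the named groups become the
# column labels "key" and "value".
e = dfc["A"].str.extractall(r"(?P<key>[^;=]+)=(?P<value>[^;]+)")

# e is indexed by (original row label, match number). Drop the match
# number, move "key" into the index, then pivot the keys into columns.
wide = (
    e.droplevel("match")
     .set_index("key", append=True)["value"]
     .unstack()
)
print(wide)
```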
Answer 2 (score: 0)
This should work:
import pandas as pd

d = {"A": ['AB=0.246154;ABP=39.3908;AC=3', 'AB=0.3;ABP=9.95901;AC=2;AF=0.333333', 'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86', 'AB=0.461538;ABP=3.51141;AC=2']}
# Split each string into its "key=value" cells
rows = [s.split(";") for s in d["A"]]
# Turn every row into a dict mapping key -> value
data = [dict(cell.split('=') for cell in row) for row in rows]
df = pd.DataFrame(data)
print(df)
Or:
d = {"A": ['AB=0.246154;ABP=39.3908;AC=3', 'AB=0.3;ABP=9.95901;AC=2;AF=0.333333', 'AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86', 'AB=0.461538;ABP=3.51141;AC=2']}
dfc = pd.DataFrame(d)
# Parse one "k=v;k=v" string into a dict
f = lambda s: dict(cell.split('=') for cell in s.split(';'))
df = pd.DataFrame(dfc.A.apply(f).tolist())
print(df)
Output:
AB ABP AC AF AN AO
0 0.246154 39.3908 3 NaN NaN NaN
1 0.3 9.95901 2 0.333333 NaN NaN
2 0 0 6 1 6 86
3 0.461538 3.51141 2 NaN NaN NaN
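The question also asks for the parsed columns to be attached to the original DataFrame in a single statement. One possible way to do that with the list-of-dicts idea above is sketched here; the use of DataFrame.join and the explicit index=dfc.index argument are my additions, not part of the answer.

```python
import pandas as pd

dfc = pd.DataFrame({"A": [
    "AB=0.246154;ABP=39.3908;AC=3",
    "AB=0.3;ABP=9.95901;AC=2;AF=0.333333",
    "AB=0;ABP=0;AC=6;AF=1;AN=6;AO=86",
    "AB=0.461538;ABP=3.51141;AC=2",
]})

# Parse every string into a dict, build a frame that reuses dfc's index,
# and join it onto the original so column A is kept alongside the new columns.
out = dfc.join(pd.DataFrame(
    [dict(cell.split("=", 1) for cell in s.split(";")) for s in dfc["A"]],
    index=dfc.index,
))
print(out)
```

Passing index=dfc.index keeps the rows aligned even when dfc does not use the default 0..n-1 index.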
Answer 3 (score: 0)
def spliter(data):
    pairs = [x.split("=") for x in data.split(";")]
    return pd.Series({key: val for key, val in pairs})

dfc.A.apply(spliter)
AB ABP AC AF AN AO
0 0.246154 39.3908 3 NaN NaN NaN
1 0.3 9.95901 2 0.333333 NaN NaN
2 0 0 6 1 6 86
3 0.461538 3.51141 2 NaN NaN NaN