I have a DataFrame generated by the following code:
import pandas as pd
data = {'ID': [1, 2, 3], 'String': ['xKx;yKy;zzz', '-', 'z01;x04']}
frame = pd.DataFrame(data)
I want to transform the DataFrame frame into one that looks like this:
data_trans={'ID':[1,1,1,2,3,3],'String': ['xKx','yKy','zzz','-','z01','x04']}
frame_trans=pd.DataFrame(data_trans)
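In other words, I want each "String" value in frame to be split at ";", with the pieces stacked under one another in a new DataFrame and the associated ID repeated accordingly. The splitting itself is not hard in principle, but I am having trouble with the stacking.
I would appreciate any hints on how to approach this in Python. Many thanks!!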
Answer 0: (score: 0)
I'm not sure this is the best way, but here is one approach that works:
import pandas as pd
data = {'ID': [1, 2, 3], 'String': ['xKx;yKy;zzz', '-', 'z01;x04']}
frame = pd.DataFrame(data)
print(frame)
data_trans = {'ID': [1, 1, 1, 2, 3, 3], 'String': ['xKx', 'yKy', 'zzz', '-', 'z01', 'x04']}
frame_trans = pd.DataFrame(data_trans)
print(frame_trans)
frame2 = frame.set_index('ID')
# This next line does almost all the work. It can be very memory-intensive.
frame3 = frame2['String'].str.split(';').apply(pd.Series).stack().reset_index()[['ID', 0]]
frame3.columns = ['ID', 'String']
print(frame3)
# Verbose version
# Setting the index makes it easy to have the index column be repeated for each value later
frame2 = frame.set_index('ID')
print("frame2")
print(frame2)
# Make one column for each of the values in the multi-value column
frame3a = frame2['String'].str.split(';').apply(pd.Series)
print("frame3a")
print(frame3a)
# Convert from a wide-data format to a long-data format
frame3b = frame3a.stack()
print("frame3b")
print(frame3b)
# Get only the columns we care about
frame3c = frame3b.reset_index()[['ID', 0]]
print("frame3c")
print(frame3c)
# The columns have the wrong names. Let's fix that
frame3d = frame3c.copy()
frame3d.columns = ['ID', 'String']
print("frame3d")
print(frame3d)
Output:
frame2
         String
ID
1   xKx;yKy;zzz
2             -
3       z01;x04
frame3a
      0    1    2
ID
1   xKx  yKy  zzz
2     -  NaN  NaN
3   z01  x04  NaN
frame3b
ID
1   0    xKx
    1    yKy
    2    zzz
2   0      -
3   0    z01
    1    x04
dtype: object
frame3c
   ID    0
0   1  xKx
1   1  yKy
2   1  zzz
3   2    -
4   3  z01
5   3  x04
frame3d
   ID String
0   1    xKx
1   1    yKy
2   1    zzz
3   2      -
4   3    z01
5   3    x04
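As a possible alternative, and assuming pandas 0.25 or later is available (that is when DataFrame.explode was added), the same reshaping can probably be done without building the wide intermediate table. This is only a sketch and not part of the approach above; the name frame_exploded is made up for illustration.

```python
import pandas as pd

# Same input as in the question.
data = {'ID': [1, 2, 3], 'String': ['xKx;yKy;zzz', '-', 'z01;x04']}
frame = pd.DataFrame(data)

# Split each string into a list, then give every list element its own row.
# explode() repeats the other columns (here: ID) for each element.
# frame_exploded is a hypothetical name, not taken from the answer above.
frame_exploded = (
    frame.assign(String=frame['String'].str.split(';'))
         .explode('String')
         .reset_index(drop=True)  # replace the repeated original index with 0..n-1
)

print(frame_exploded)
```

Because no column-per-value table is created, this should use less memory than the apply(pd.Series) step when some rows contain many more values than others.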