Python - DataFrame - Splitting and stacking a string column

Time: 2016-04-08 15:20:21

Tags: python

I have a DataFrame generated by the following code:

import pandas as pd
data={'ID':[1,2,3],'String': ['xKx;yKy;zzz','-','z01;x04']}
frame=pd.DataFrame(data)

I would like to transform the frame DataFrame into one that looks like this:

data_trans={'ID':[1,1,1,2,3,3],'String': ['xKx','yKy','zzz','-','z01','x04']}
frame_trans=pd.DataFrame(data_trans)

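In other words, I want to split the "String" entries in the data on ";" and then stack the pieces beneath one another in a new DataFrame, with the corresponding ID repeated for each piece. The splitting itself is not hard in principle, but I am having trouble with the stacking.

I would be grateful for any hints on how to handle this in Python. Many thanks!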

1 Answer:

Answer 0: (score: 0)

I'm not sure this is the best approach, but here is one that works:

import pandas as pd
data={'ID':[1,2,3],'String': ['xKx;yKy;zzz','-','z01;x04']}
frame=pd.DataFrame(data)
print(frame)

data_trans={'ID':[1,1,1,2,3,3],'String': ['xKx','yKy','zzz','-','z01','x04']}
frame_trans=pd.DataFrame(data_trans)
print(frame_trans)


frame2 = frame.set_index('ID')
# This next line does almost all the work. This can be very memory intensive.
frame3 = frame2['String'].str.split(';').apply(pd.Series).stack().reset_index()[['ID', 0]]
frame3.columns = ['ID', 'String']
print(frame3)



# Verbose version
# Setting the index makes it easy to have the index column be repeated for each value later
frame2 = frame.set_index('ID')
print("frame2")    
print(frame2)
# Make one column for each of the values in the multi-value column
frame3a = frame2['String'].str.split(';').apply(pd.Series)
print("frame3a")
print(frame3a)
# Convert from a wide-data format to a long-data format
frame3b = frame3a.stack()
print("frame3b")
print(frame3b)
# Get only the columns we care about
frame3c = frame3b.reset_index()[['ID', 0]]
print("frame3c")
print(frame3c)
# The columns have the wrong names. Let's fix that
frame3d = frame3c.copy()
frame3d.columns = ['ID', 'String']
print("frame3d")
print(frame3d)

Output:

frame2
             String
    ID             
    1   xKx;yKy;zzz
    2             -
    3       z01;x04
frame3a
          0    1    2
    ID               
    1   xKx  yKy  zzz
    2     -  NaN  NaN
    3   z01  x04  NaN
frame3b
    ID   
    1   0    xKx
        1    yKy
        2    zzz
    2   0      -
    3   0    z01
        1    x04
    dtype: object
frame3c
       ID    0
    0   1  xKx
    1   1  yKy
    2   1  zzz
    3   2    -
    4   3  z01
    5   3  x04
frame3d
       ID String
    0   1    xKx
    1   1    yKy
    2   1    zzz
    3   2      -
    4   3    z01
    5   3    x04
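
On newer pandas versions there may be a shorter route to the same result. The following is a minimal sketch, assuming pandas 0.25 or later (where DataFrame.explode is available); it is an alternative to the stack-based approach above rather than a replacement for it, and frame_exploded is just an illustrative name. It splits each string into a list and then gives every list element its own row, without building the wide intermediate table seen in frame3a.

```python
import pandas as pd

data = {'ID': [1, 2, 3], 'String': ['xKx;yKy;zzz', '-', 'z01;x04']}
frame = pd.DataFrame(data)

# Assumption: pandas >= 0.25, which provides DataFrame.explode.
# Step 1: replace each string with the list of its ';'-separated pieces.
# Step 2: explode() emits one row per list element and repeats the ID.
# Step 3: reset_index(drop=True) renumbers the rows 0..n-1.
frame_exploded = (
    frame.assign(String=frame['String'].str.split(';'))
         .explode('String')
         .reset_index(drop=True)
)
print(frame_exploded)
```

If this runs as intended, frame_exploded should hold the same rows as frame3d above, with '-' kept as a single row because it contains no ';' to split on.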