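Split (explode) a string into multiple columns and rows - Python

Date: 2019-07-16 20:02:03

Tags: python arrays string dataframe

Good afternoon. I am trying to split the text in a column into a specific format. Here is my table: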

UserId  Application
1       Grey Blue::Black Orange;White:Green
2       Yellow Purple::Orange Grey;Blue Pink::Red

I would like it to read as follows:

UserId     Application          Role
    1       Grey Blue           Black Orange
    1       White               Green
    2       Yellow Purple       Orange Grey 
    2       Blue Pink           Red

My code so far is:

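import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each index label once per list element in that row
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten each list column into one long column
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(columns=explode), how='left')

# Note: the raw column is called 'Roles' here, although the table above labels it 'Application'
df['Application'] = df.Roles.str.split(';|::|:').map(lambda x: x[0::2])

unnesting(df.drop(columns='Roles'), ['Application'])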

This code produces the following output:

UserId     Application
    1       Grey Blue
    1       White
    2       Yellow Purple
    2       Blue Pink

I don't know how to add the second column (Role), which should hold the text that follows the :: in each split.

1 Answer:

Answer 0: (score: 1)

Given this dataframe:

   UserId                                Application
0       1       Grey Blue::Black Orange;White::Green
1       2  Yellow Purple::Orange Grey;Blue Pink::Red

you can at least get the last two columns directly with

df.Application.str.split(';', expand=True).stack().str.split('::', expand=True).reset_index().drop(columns=['level_0', 'level_1'])

which results in

               0             1
0      Grey Blue  Black Orange
1          White         Green
2  Yellow Purple   Orange Grey
3      Blue Pink           Red

However, setting UserId as the index before doing the split would also give you the correct UserId column.
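The idea of setting UserId as the index first can be sketched as follows. This is only an illustration under stated assumptions, not the answer's original code: the sample data uses :: between every application and role (as in the answer's dataframe), and the output column names Application and Role and the variable name result are chosen here to match the desired output in the question.

```python
import pandas as pd

# Sample data as given in the answer (assumes "::" separates application and role)
df = pd.DataFrame({
    "UserId": [1, 2],
    "Application": [
        "Grey Blue::Black Orange;White::Green",
        "Yellow Purple::Orange Grey;Blue Pink::Red",
    ],
})

result = (
    df.set_index("UserId")["Application"]
      .str.split(";", expand=True)      # one column per "application::role" pair
      .stack()                          # one row per pair, index = (UserId, position)
      .str.split("::", expand=True)     # column 0 = application, column 1 = role
      .rename(columns={0: "Application", 1: "Role"})
      .reset_index(level=1, drop=True)  # drop the position level, keep UserId
      .reset_index()                    # turn UserId back into a regular column
)
print(result)
```

Because UserId travels along as the index through stack(), each split row keeps the id of the row it came from. If some pairs use a single colon, as in the first row of the question's table, the second split would need a pattern that also matches it, for example '::|:' as in the question's own code.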