Good afternoon. I am trying to split the text in a column into a specific format. Here is my table:
UserId  Application
1       Grey Blue::Black Orange;White:Green
2       Yellow Purple::Orange Grey;Blue Pink::Red
I would like it to read as follows:
UserId  Application    Role
1       Grey Blue      Black Orange
1       White          Green
2       Yellow Purple  Orange Grey
2       Blue Pink      Red
So far, my code is:
import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each index label once per element of the list-valued column
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(explode, axis=1), how='left')

df['Application'] = df.Roles.str.split(';|::|:').map(lambda x: x[0::2])
unnesting(df.drop('Roles', axis=1), ['Application'])
The output of this code is:
UserId  Application
1       Grey Blue
1       White
2       Yellow Purple
2       Blue Pink
I don't know how to add a second column (Role) from the second split, i.e. the part after the ::.
Answer 0 (score: 1)
Given this data frame:
   UserId                                Application
0       1       Grey Blue::Black Orange;White::Green
1       2  Yellow Purple::Orange Grey;Blue Pink::Red
you can at least obtain the last two columns directly with
df.Application.str.split(';', expand=True).stack().str.split('::', expand=True).reset_index().drop(columns=['level_0', 'level_1'])
Result:
               0             1
0      Grey Blue  Black Orange
1          White         Green
2  Yellow Purple   Orange Grey
3      Blue Pink           Red
However, setting UserId as the index beforehand also yields the correct UserId column.
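A minimal sketch of the index-first idea, assuming the data frame shown in this answer and pandas imported as pd. The output column names Application and Role are taken from the question's desired result, and the variable name result is only illustrative:

```python
import pandas as pd

# Sample data as shown in the answer (every pair uses a double colon).
df = pd.DataFrame({
    "UserId": [1, 2],
    "Application": [
        "Grey Blue::Black Orange;White::Green",
        "Yellow Purple::Orange Grey;Blue Pink::Red",
    ],
})

result = (
    df.set_index("UserId")["Application"]  # keep UserId attached to every piece
      .str.split(";", expand=True)         # one column per Application::Role pair
      .stack()                             # one row per pair, indexed by (UserId, position)
      .str.split("::", expand=True)        # column 0 = application, column 1 = role
      .rename(columns={0: "Application", 1: "Role"})
      .reset_index(level=1, drop=True)     # discard the pair position
      .reset_index()                       # turn UserId back into a regular column
)
print(result)
```

Because UserId travels along as the index through every split, no join back to the original frame is needed. If some rows use a single colon, as in the question's first row (White:Green), the pattern "::" could probably be replaced by the regular expression "::?"; that variant is an assumption and has not been checked against the asker's real data.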