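I am trying to create a new DataFrame by splitting a column that holds multiple values, so that each row contains only one value.
I tried a few groupby operations, but I could not get the values separated or organized by user.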
   item  title                               feature
0     1  Toy Story (1995)                    Adventure|Animation|Children|Comedy|Fantasy
1     2  Jumanji (1995)                      Adventure|Children|Fantasy
2     3  Grumpier Old Men (1995)             Comedy|Romance
3     4  Waiting to Exhale (1995)            Comedy|Drama|Romance
4     5  Father of the Bride Part II (1995)  Comedy
Desired output:
   item  feature
0     1  Adventure
1     1  Animation
2     1  Children
3     1  Comedy
4     1  Fantasy
Answer 0 (score: 1)
You need str.split, followed by stack:
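# Split on '|' into one column per value, then stack the columns into rows
r = df.set_index('item').feature.str.split('|', expand=True).stack()
# Keep only the outer index level (item); the inner level is the column position
r.index = r.index.get_level_values(0)
r.reset_index(name='feature')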
    item  feature
0      1  Adventure
1      1  Animation
2      1  Children
3      1  Comedy
4      1  Fantasy
5      2  Adventure
6      2  Children
7      2  Fantasy
8      3  Comedy
9      3  Romance
10     4  Comedy
11     4  Drama
12     4  Romance
13     5  Comedy
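To make the split-and-stack steps easier to try out, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the five sample rows in the question, and the intermediate names (wide, stacked, result) are introduced only for illustration. The explicit dropna() is an added precaution that is not part of the answer above, because the way stack treats missing values differs between pandas versions.

```python
import pandas as pd

# Sample data rebuilt by hand from the five rows shown in the question.
df = pd.DataFrame({
    "item": [1, 2, 3, 4, 5],
    "title": [
        "Toy Story (1995)",
        "Jumanji (1995)",
        "Grumpier Old Men (1995)",
        "Waiting to Exhale (1995)",
        "Father of the Bride Part II (1995)",
    ],
    "feature": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
        "Comedy|Drama|Romance",
        "Comedy",
    ],
})

# Step 1: one column per value; rows with fewer values are padded
# with missing values on the right.
wide = df.set_index("item")["feature"].str.split("|", expand=True)

# Step 2: turn the columns into rows. dropna() removes any padding
# that is still present after stacking (assumption: added for safety).
stacked = wide.stack().dropna()

# Step 3: keep only the outer index level (item) and rebuild a
# two-column DataFrame.
stacked.index = stacked.index.get_level_values(0)
result = stacked.reset_index(name="feature")

print(result)
```

Setting item as the index before splitting is what carries the item id along to every value produced from that row.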
Another option is to use np.repeat:
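import numpy as np
import pandas as pd

u = df.set_index('item').feature.str.split('|')
pd.DataFrame({
    # repeat each item once per value in its list
    'item': np.repeat(u.index, u.str.len()),
    # flatten the lists in the same order
    'feature': [y for x in u for y in x]
})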
    item  feature
0      1  Adventure
1      1  Animation
2      1  Children
3      1  Comedy
4      1  Fantasy
5      2  Adventure
6      2  Children
7      2  Fantasy
8      3  Comedy
9      3  Romance
10     4  Comedy
11     4  Drama
12     4  Romance
13     5  Comedy
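The np.repeat variant works because the repeat counts and the flattened list are built from the same Series in the same order, so the two columns line up. The sketch below shows this on a trimmed three-row sample taken from the question's data (items 2, 3 and 5); the names lists, counts, items and features are hypothetical and used only for illustration.

```python
import numpy as np
import pandas as pd

# Trimmed sample: three rows from the question's data (assumption:
# the title column is left out because this approach does not use it).
df = pd.DataFrame({
    "item": [2, 3, 5],
    "feature": ["Adventure|Children|Fantasy", "Comedy|Romance", "Comedy"],
})

# One Python list of values per item.
lists = df.set_index("item")["feature"].str.split("|")

# Number of values in each list: 3, 2 and 1 for this sample.
counts = lists.str.len()

# Repeat each item id once per value in its list.
items = np.repeat(lists.index.to_numpy(), counts.to_numpy())

# Flatten the lists, walking the Series in the same order.
features = [value for values in lists for value in values]

result = pd.DataFrame({"item": items, "feature": features})
print(result)
```

This version never builds the padded wide table, so it involves no missing values at any point.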