How to split multiple values in a column and group by value in pandas?

Time: 2019-04-08 01:55:10

Tags: python pandas dataframe transformation

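I am trying to create a new DataFrame by splitting a column that holds multiple values, so that each row contains only one value.

I have tried a few groupby operations, but I can't seem to separate the values or organize them by user.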

Current DataFrame:

   item  title                               feature
0  1     Toy Story (1995)                    Adventure|Animation|Children|Comedy|Fantasy
1  2     Jumanji (1995)                      Adventure|Children|Fantasy
2  3     Grumpier Old Men (1995)             Comedy|Romance
3  4     Waiting to Exhale (1995)            Comedy|Drama|Romance
4  5     Father of the Bride Part II (1995)  Comedy

Desired output:

   item  feature
0  1     Adventure
1  1     Animation
2  1     Children
3  1     Comedy
4  1     Fantasy

1 Answer:

Answer 0: (score: 1)

You need str.split, followed by stack.

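# Split on the literal '|' into one column per genre, then stack the columns into rows
r = df.set_index('item').feature.str.split('|', expand=True).stack()
# Keep only the 'item' level of the resulting (item, position) MultiIndex
r.index = r.index.get_level_values(0)

r.reset_index(name='feature')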

    item    feature
0      1  Adventure
1      1  Animation
2      1   Children
3      1     Comedy
4      1    Fantasy
5      2  Adventure
6      2   Children
7      2    Fantasy
8      3     Comedy
9      3    Romance
10     4     Comedy
11     4      Drama
12     4    Romance
13     5     Comedy
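
To see what each step of the split-then-stack approach produces, here is a minimal, self-contained sketch. The sample frame is rebuilt by hand from the first three rows of the question, and the names wide, stacked and result are only illustrative. The explicit dropna() is a defensive assumption: stack() has traditionally dropped the padding for shorter rows on its own, but I would not rely on that across pandas versions.

```python
import pandas as pd

# Sample data rebuilt by hand from the first three rows of the question
df = pd.DataFrame({
    'item': [1, 2, 3],
    'title': ['Toy Story (1995)', 'Jumanji (1995)', 'Grumpier Old Men (1995)'],
    'feature': ['Adventure|Animation|Children|Comedy|Fantasy',
                'Adventure|Children|Fantasy',
                'Comedy|Romance'],
})

# Step 1: one column per genre; rows with fewer genres are padded with missing values
wide = df.set_index('item')['feature'].str.split('|', expand=True)

# Step 2: move the columns into the index, giving one row per (item, position);
# dropna() removes any padding that survives the stack
stacked = wide.stack().dropna()

# Step 3: keep only the 'item' level and turn it back into a regular column
stacked.index = stacked.index.get_level_values(0)
result = stacked.reset_index(name='feature')

print(result)
```

Printing wide before stacking is a quick way to check that the separator was interpreted as expected.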

Another option is to use np.repeat.

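import numpy as np
import pandas as pd

# One list of genres per item
u = df.set_index('item').feature.str.split('|')
pd.DataFrame({
    'item': np.repeat(u.index, u.str.len()),  # repeat each item once per genre
    'feature': [y for x in u for y in x]      # flatten the lists in order
})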

    item    feature
0      1  Adventure
1      1  Animation
2      1   Children
3      1     Comedy
4      1    Fantasy
5      2  Adventure
6      2   Children
7      2    Fantasy
8      3     Comedy
9      3    Romance
10     4     Comedy
11     4      Drama
12     4    Romance
13     5     Comedy
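
A third possibility, not part of the answer above, is DataFrame.explode. This is only a sketch and assumes pandas 0.25 or later, where explode is available. The sample frame is rebuilt by hand from the first two rows of the question, and out is just an illustrative name.

```python
import pandas as pd

# Sample data rebuilt by hand from the first two rows of the question
df = pd.DataFrame({
    'item': [1, 2],
    'title': ['Toy Story (1995)', 'Jumanji (1995)'],
    'feature': ['Adventure|Animation|Children|Comedy|Fantasy',
                'Adventure|Children|Fantasy'],
})

out = (
    df[['item', 'feature']]
    # Turn each '|'-separated string into a list of genres
    .assign(feature=lambda d: d['feature'].str.split('|'))
    # Emit one row per list element, repeating 'item' alongside it
    .explode('feature')
    # explode keeps the original row labels, so rebuild a clean 0..n-1 index
    .reset_index(drop=True)
)

print(out)
```

Compared with the two approaches above, this keeps the whole operation in one method chain, at the cost of requiring a newer pandas release.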