Applying Pandas melt() to a DataFrame with multiple variable columns

Time: 2017-10-04 13:59:14

Tags: python pandas dataframe

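I have a DataFrame in which each row is a unique person and each column is a type of action taken. I need to restructure the data so that each row shows a single event. Below are my current format and my desired format, along with the approach I have been trying to implement.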

import pandas as pd

# Wide format: one row per person, one column per action type
current = pd.DataFrame({
    'name': {0: 'ross', 1: 'allen', 2: 'jon'},
    'action a': {0: '2017-10-04', 1: '2017-10-04', 2: '2017-10-04'},
    'action b': {0: '2017-10-05', 1: '2017-10-05', 2: '2017-10-05'},
    'action c': {0: '2017-10-06', 1: '2017-10-06', 2: '2017-10-06'},
})


# Long format: one row per (person, action) event
desired = pd.DataFrame({
    'name': ['ross', 'ross', 'ross', 'allen', 'allen', 'allen', 'jon', 'jon', 'jon'],
    'action': ['action a', 'action b', 'action c'] * 3,
    'date': ['2017-10-04', '2017-10-05', '2017-10-06'] * 3,
})

2 Answers:

Answer 0 (score: 1)

Use df.melt (available in pandas v0.20+):

df
     action a    action b    action c   name
0  2017-10-04  2017-10-05  2017-10-06   ross
1  2017-10-04  2017-10-05  2017-10-06  allen
2  2017-10-04  2017-10-05  2017-10-06    jon

df = df.melt('name').sort_values('name')
df.columns = ['name', 'action', 'date']
df
    name    action        date
1  allen  action a  2017-10-04
4  allen  action b  2017-10-05
7  allen  action c  2017-10-06
2    jon  action a  2017-10-04
5    jon  action b  2017-10-05
8    jon  action c  2017-10-06
0   ross  action a  2017-10-04
3   ross  action b  2017-10-05
6   ross  action c  2017-10-06
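
As a possible variation on the approach above, `melt` also accepts `var_name` and `value_name`, so the columns can be named in the same call rather than by reassigning `df.columns`. The following is a minimal sketch, assuming the `current` frame from the question; the stable `mergesort` and the `reset_index` call are my additions, not part of the original answer, and are there only to keep each person's actions in their original order with a clean index.

```python
import pandas as pd

# The question's wide frame: one row per person, one column per action.
current = pd.DataFrame({
    'name': ['ross', 'allen', 'jon'],
    'action a': ['2017-10-04'] * 3,
    'action b': ['2017-10-05'] * 3,
    'action c': ['2017-10-06'] * 3,
})

long_df = (
    # Keep 'name' as the identifier; every other column becomes a row.
    current.melt(id_vars='name', var_name='action', value_name='date')
           # mergesort is stable, so actions stay in a, b, c order per name.
           .sort_values('name', kind='mergesort')
           .reset_index(drop=True)
)
print(long_df)
```

Note that this sorts names alphabetically, as the answer does; if the question's original row order (ross, allen, jon) matters, a different ordering step would be needed.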

Answer 1 (score: 1)

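# Here df has a 'roles' column of comma-separated strings
r = df.roles
# Number of roles per row = number of commas + 1
c = df.roles.str.count(',') + 1
i = df.index
# Repeat each row once per role, then assign the flattened list of roles
df.loc[i.repeat(c)].assign(roles=','.join(r).split(','))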

  company  employer_id                roles
0       a            1             engineer
0       a            1       data_scientist
0       a            1            architect
1       b            2             engineer
1       b            2  front_end_developer
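
The snippet in this answer does not show its input frame. The sketch below is a self-contained version under an assumption: the input `df` is hypothetical, reconstructed from the output printed above (two companies with comma-separated `roles` strings), and may not match what the answerer actually used.

```python
import pandas as pd

# Hypothetical input, inferred from the output shown in the answer.
df = pd.DataFrame({
    'company': ['a', 'b'],
    'employer_id': [1, 2],
    'roles': ['engineer,data_scientist,architect',
              'engineer,front_end_developer'],
})

r = df.roles
c = r.str.count(',') + 1   # roles per row: 3 and 2
i = df.index

# Repeat each row once per role, then overwrite 'roles' with the
# flattened list obtained by joining and re-splitting all role strings.
result = df.loc[i.repeat(c)].assign(roles=','.join(r).split(','))
print(result)
```

This relies on the flattened list having the same order and length as the repeated rows, which holds as long as no role string is empty or missing.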