DataFrame:
Day Date GS MS ES
0 Monday 20/02/2017 11 22 33
1 Tuesday 21/02/2017 22 11 33
2 Wednesday 22/02/2017 33 22 11
I want to reshape this DataFrame into the following layout:
20/02/2017 21/02/2017 22/02/2017
Monday Tuesday Wednesday
0 11 GS MS ES
1 22 ES GS MS
2 33 MS ES GS
So I'm trying to reshape the table by employee_id. I tried DataFrame.transpose(), but I can't get it to do what I need.
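A minimal sketch of why transpose() alone falls short, assuming the sample data is rebuilt by hand: transposing only swaps rows and columns, so the numbers 11/22/33 stay inside the cells instead of becoming the row index that the target layout needs.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11],
})

# Transposing swaps the axes: employee codes become row labels and
# Day/Date become column labels, but the numbers remain cell values.
t = df.set_index(['Day', 'Date']).T
print(t)
```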
Answer 0 (score: 1)
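import pandas as pd
# melt to long form, then 'value' becomes the row index and Date/Day the column levels
df = pd.melt(df, id_vars=['Day', 'Date']).set_index(['Date', 'Day', 'value'])['variable'].unstack([0, 1])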
print (df)
Date 20/02/2017 21/02/2017 22/02/2017
Day Monday Tuesday Wednesday
value
11 GS MS ES
22 MS GS MS
33 ES ES GS
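To see what the reshaping operates on, here is a sketch of the intermediate long table produced by pd.melt, using the sample data rebuilt by hand; the names long and wide are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11],
})

# melt keeps Day/Date as identifiers and stacks GS/MS/ES into two columns:
# 'variable' (the employee code) and 'value' (the number)
long = pd.melt(df, id_vars=['Day', 'Date'])
print(long)

# Moving 'value' into the row index and Date/Day into the columns puts
# each employee code in the cell for its (number, day) pair
wide = long.set_index(['Date', 'Day', 'value'])['variable'].unstack([0, 1])
print(wide)
```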
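# starts again from the original df
df = (df.set_index(['Day', 'Date'])
        .stack()                                  # one row per (Day, Date, employee)
        .reset_index(level=2, name='a')           # codes -> column 'level_2', numbers -> 'a'
        .set_index('a', append=True)['level_2']   # numbers join the index
        .unstack([1, 0]))                         # Date/Day become the column levels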
print (df)
Date 20/02/2017 21/02/2017 22/02/2017
Day Monday Tuesday Wednesday
a
11 GS MS ES
22 MS GS MS
33 ES ES GS
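The second chain relies on stack() and on reset_index naming the unnamed employee level level_2. A small sketch on hand-rebuilt sample data makes both intermediate steps visible.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11],
})

# Series indexed by (Day, Date, employee code) -> number
s = df.set_index(['Day', 'Date']).stack()
print(s)

# The unnamed third level is moved out as column 'level_2'; values go to 'a'
flat = s.reset_index(level=2, name='a')
print(flat)
```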
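But if you get
ValueError: Index contains duplicate entries, cannot reshape
then some (Date, Day, value) combination occurs more than once, as in this data: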
print (df)
Day Date GS MS ES
0 Monday 20/02/2017 11 22 33
1 Tuesday 21/02/2017 22 11 11 < 33 changed to 11
2 Wednesday 22/02/2017 33 22 11
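A sketch for locating the conflicting rows before reshaping, assuming the modified data above (Tuesday ES changed to 11). DataFrame.duplicated with keep=False flags every row that shares a key, and the plain unstack then reproduces the ValueError.

```python
import pandas as pd

# Modified data: on Tuesday ES is 11, so MS and ES share the number 11
df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 11, 11],
})

long = pd.melt(df, id_vars=['Day', 'Date'])

# keep=False marks every row of a duplicated (Date, Day, value) key
dupes = long[long.duplicated(['Date', 'Day', 'value'], keep=False)]
print(dupes)

# Reshaping without aggregation fails on those keys
try:
    long.set_index(['Date', 'Day', 'value'])['variable'].unstack([0, 1])
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```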
Solution with melt and groupby, aggregating with join:
df = (pd.melt(df, id_vars=['Day', 'Date'])
        .groupby(['Date', 'Day', 'value'])['variable']
        .apply(','.join)     # join all employee codes that share a key
        .unstack([0, 1]))
print (df)
Date 20/02/2017 21/02/2017 22/02/2017
Day Monday Tuesday Wednesday
value
11 GS MS,ES ES
22 MS GS MS
33 ES None GS
Solution with melt and pivot_table:
# aggregating by the first value can be dangerous: data is lost
df1 = pd.melt(df, id_vars=['Day', 'Date']).pivot_table(index='value',
columns=['Day', 'Date'], values='variable', aggfunc='first')
print (df1)
Day Monday Tuesday Wednesday
Date 20/02/2017 21/02/2017 22/02/2017
value
11 GS MS ES
22 MS GS MS
33 ES None GS
# better: aggregate with join or sum, so no data is lost
df1 = pd.melt(df, id_vars=['Day', 'Date']).pivot_table(index='value',
columns=['Day', 'Date'], values='variable', aggfunc=','.join)
print (df1)
Day Monday Tuesday Wednesday
Date 20/02/2017 21/02/2017 22/02/2017
value
11 GS MS,ES ES
22 MS GS MS
33 ES None GS
df1 = pd.melt(df, id_vars=['Day', 'Date']).pivot_table(index='value',
columns=['Day', 'Date'], values='variable', aggfunc='sum')
print (df1)
Day Monday Tuesday Wednesday
Date 20/02/2017 21/02/2017 22/02/2017
value
11 GS MSES ES
22 MS GS MS
33 ES None GS
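If you would rather keep duplicate codes as separate items than concatenate them, one variant not taken from the answer is to aggregate with list. A sketch on the modified data (Tuesday ES = 11); missing combinations stay NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 11, 11],
})

# Collect every employee code sharing a (Date, Day, value) key in a list
out = (pd.melt(df, id_vars=['Day', 'Date'])
         .groupby(['Date', 'Day', 'value'])['variable']
         .agg(list)
         .unstack([0, 1]))
print(out)
```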
Answer 1 (score: 1)
If you melt and then pivot, you can get:
Code:
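import pandas as pd

def by_employee(frame):
    # long form: one row per (Day, Date, employee code, number)
    melted = pd.melt(
        frame, value_vars=['GS', 'MS', 'ES'], id_vars=['Day', 'Date'])
    # numbers become rows, Day/Date become columns; keep the first code per cell
    pivot = pd.pivot_table(melted, values='variable', index='value',
                           columns=['Day', 'Date'],
                           aggfunc=lambda x: x.values[0])
    return pivot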
Test code:
data = [x.strip().split() for x in """
Day Date GS MS ES
Monday 20/02/2017 11 22 33
Tuesday 21/02/2017 22 11 33
Wednesday 22/02/2017 33 22 11
""".split('\n')[1:-1]]
df = pd.DataFrame(data[1:], columns=data[0])
print(df)
print(by_employee(df))
Results:
Day Date GS MS ES
0 Monday 20/02/2017 11 22 33
1 Tuesday 21/02/2017 22 11 33
2 Wednesday 22/02/2017 33 22 11
Day Monday Tuesday Wednesday
Date 20/02/2017 21/02/2017 22/02/2017
value
11 GS MS ES
22 MS GS MS
33 ES ES GS
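Because the lambda keeps only x.values[0], this function behaves like aggfunc='first' and silently drops codes when a key is duplicated. A sketch that repeats the function so the block runs on its own, using the modified data from Answer 0 (Tuesday ES = 11) as an assumed input:

```python
import pandas as pd

def by_employee(frame):
    melted = pd.melt(
        frame, value_vars=['GS', 'MS', 'ES'], id_vars=['Day', 'Date'])
    return pd.pivot_table(melted, values='variable', index='value',
                          columns=['Day', 'Date'],
                          aggfunc=lambda x: x.values[0])

df_dup = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 11, 11],
})

# Tuesday/11 matches both MS and ES, but only the first (MS) survives
res = by_employee(df_dup)
print(res)
```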
Answer 2 (score: 0)
You can iterate over the DataFrame and build a new DataFrame from the current data, except that two of the columns are joined into one:
import pandas as pd

data = []
for row in df.itertuples():
    data.append([
        # row[0]: index, row[1]: Day, row[2]: Date, row[3:]: GS, MS, ES
        '{} {}'.format(row[2], row[1]),   # "Date Day", e.g. "20/02/2017 Monday"
        *row[3:],
    ])
new_df = pd.DataFrame(data)
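The same joined label can also be built without a Python loop by concatenating the string columns directly. A minimal sketch, assuming Day and Date are string columns; the column name 'Date Day' is a hypothetical choice (the loop above produces integer column labels).

```python
import pandas as pd

df = pd.DataFrame({
    'Day': ['Monday', 'Tuesday', 'Wednesday'],
    'Date': ['20/02/2017', '21/02/2017', '22/02/2017'],
    'GS': [11, 22, 33],
    'MS': [22, 11, 22],
    'ES': [33, 33, 11],
})

# Same "Date Day" label as the loop, built with vectorized string concatenation
label = (df['Date'] + ' ' + df['Day']).rename('Date Day')
new_df = pd.concat([label, df[['GS', 'MS', 'ES']]], axis=1)
print(new_df)
```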