I have a DataFrame whose data currently reads as follows:
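import pandas as pd

# Note: a dict literal keeps only the last value for a repeated key,
# so only the 'exit' list of each site survives in this frame.
df_new = pd.DataFrame({'Week':['nan',14, 14, 14, 14, 14],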
'Date':['NaT','2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05'],
'site 1':['entry',0, 0, 0, 0, 0],
'site 1':['exit',0, 0, 0, 0, 0],
'site 2':['entry',1, 0,50, 7, 0],
'site 2':['exit',10, 0, 7, 19, 0],
'site 3':['entry',0, 100, 14, 9, 0],
'site 3':['exit',0, 0, 7, 0, 0],
'site 4':['entry',0, 0, 0, 0, 0],
'site 4':['exit',0, 0, 0, 0, 0],
'site 5':['entry',0, 0, 0, 0, 0],
'site 5':['exit',15, 0, 25, 0, 80],
})
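However, what I want is a column indicating entry/exit for each site (the columns come from merged Excel headers).
Below is an example of what is needed (ignore the actual values, which were typed in by hand):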
df_target = pd.DataFrame({'Week':[14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14],
'Date':['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05','2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05','2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05'],
'site':['site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 2', 'site 2','site 2','site 2','site 2','site 2'],
'entry/exit':['exit','exit', 'exit', 'entry', 'entry', 'entry', 'entry', 'entry', 'entry', 'exit', 'exit', 'exit', 'exit', 'entry', 'entry'],
'Value':[12 ,1, 0, 50, 7, 0, 12 ,1, 0, 50, 7, 0, 12 ,1, 0]
})
I tried
df_target = df_new.melt(id_vars=['Week','Date'], var_name="Site", value_name="Value")
but I guess I also need to somehow group by the second row, or treat it as a second header?
Answer 0 (score: 2)
First, build MultiIndex columns for the input DataFrame from its column names and its first row:
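# if possible, read both header rows directly from the file instead
#df = pd.read_csv(file, header=[0,1], index_col=[0,1])

# pair each column name with the value in the first row (entry/exit)
df_new.columns = [df_new.columns, df_new.iloc[0]]
# drop the first row, which is now part of the header
df = df_new.iloc[1:]
print(df.columns)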
MultiIndex([(  'Week',  'nan'),
            (  'Date',  'NaT'),
            ('site 1', 'exit'),
            ('site 2', 'exit'),
            ('site 3', 'exit'),
            ('site 4', 'exit'),
            ('site 5', 'exit')],
           )
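Only `exit` appears in the second level above because the dict literal in the question repeats each `site N` key, and Python keeps the last value for a repeated key. The following is a minimal sketch, assuming the real sheet has each site label repeated over an entry and an exit column. The frame is built from a list of rows (a shortened, illustrative subset of the question's values) so that both columns survive:

```python
import pandas as pd

# Top header row as it might come from merged Excel cells: each site twice.
top = ['Week', 'Date', 'site 1', 'site 1', 'site 2', 'site 2']
rows = [
    ['nan', 'NaT', 'entry', 'exit', 'entry', 'exit'],  # second header row
    [14, '2020-04-01', 0, 0, 1, 10],
    [14, '2020-04-02', 0, 0, 0, 0],
    [14, '2020-04-03', 0, 0, 50, 7],
]
# A list of rows (unlike a dict) allows repeated column labels.
df_new = pd.DataFrame(rows, columns=top)

# Pair the top labels with the first row to get two-level columns,
# then drop that first row from the data.
df_new.columns = pd.MultiIndex.from_arrays([df_new.columns, df_new.iloc[0]])
df = df_new.iloc[1:]
print(df.columns.tolist())
```

With this input, every site keeps both an `entry` and an `exit` column in the second level.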
Then convert the first 2 MultiIndex columns to the index, so that DataFrame.unstack can be used for the melting, followed by Series.rename_axis and then Series.reset_index:
df = (df.set_index(df.columns[:2].tolist())   # Week/Date columns become the index
        .unstack([0,1])                       # unstacking every index level gives a Series
        .rename_axis(['site','entry/exit','Week','Date'])
        .reset_index(name='Value'))
print(df)
      site  entry/exit  Week        Date  Value
0   site 1        exit    14  2020-04-01      0
1   site 1        exit    14  2020-04-02      0
2   site 1        exit    14  2020-04-03      0
3   site 1        exit    14  2020-04-04      0
4   site 1        exit    14  2020-04-05      0
5   site 2        exit    14  2020-04-01     10
6   site 2        exit    14  2020-04-02      0
7   site 2        exit    14  2020-04-03      7
8   site 2        exit    14  2020-04-04     19
9   site 2        exit    14  2020-04-05      0
10  site 3        exit    14  2020-04-01      0
11  site 3        exit    14  2020-04-02      0
12  site 3        exit    14  2020-04-03      7
13  site 3        exit    14  2020-04-04      0
14  site 3        exit    14  2020-04-05      0
15  site 4        exit    14  2020-04-01      0
16  site 4        exit    14  2020-04-02      0
17  site 4        exit    14  2020-04-03      0
18  site 4        exit    14  2020-04-04      0
19  site 4        exit    14  2020-04-05      0
20  site 5        exit    14  2020-04-01     15
21  site 5        exit    14  2020-04-02      0
22  site 5        exit    14  2020-04-03     25
23  site 5        exit    14  2020-04-04      0
24  site 5        exit    14  2020-04-05     80
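Since the question started from `melt`, here is an alternative sketch using `DataFrame.melt` directly on two-level columns. It is not the method used in the answer above. It assumes pandas 1.1 or later (for `ignore_index`) and a small made-up frame in which Week/Date are already the index and the column levels are named `site` and `entry/exit`:

```python
import pandas as pd

# Illustrative frame: two-level columns (site, entry/exit), Week/Date index.
columns = pd.MultiIndex.from_product(
    [['site 1', 'site 2'], ['entry', 'exit']], names=['site', 'entry/exit']
)
index = pd.MultiIndex.from_arrays(
    [[14, 14], ['2020-04-01', '2020-04-02']], names=['Week', 'Date']
)
wide = pd.DataFrame([[0, 0, 1, 10], [0, 0, 0, 0]], index=index, columns=columns)

# melt takes the variable column names from the column level names;
# ignore_index=False keeps Week/Date so reset_index can restore them.
long_df = wide.melt(value_name='Value', ignore_index=False).reset_index()
print(long_df)
```

The result has one row per (Week, Date, site, entry/exit) combination, with the columns in the order Week, Date, site, entry/exit, Value.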