Pandas melt into one column

Date: 2020-06-30 11:09:49

Tags: python pandas numpy pandas-groupby melt

I have a DataFrame whose data currently reads as follows:


import pandas as pd

# Duplicate dict keys: each later 'site N' ('exit') list replaces the earlier
# ('entry') one, so only the 'exit' columns end up in df_new.
df_new = pd.DataFrame({'Week':['nan',14, 14, 14, 14, 14],
                          'Date':['NaT','2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05'],
                          'site 1':['entry',0, 0, 0, 0, 0],
                          'site 1':['exit',0, 0, 0, 0, 0],
                          'site 2':['entry',1, 0,50, 7, 0],
                          'site 2':['exit',10, 0, 7, 19, 0],
                          'site 3':['entry',0, 100, 14, 9, 0],
                          'site 3':['exit',0, 0, 7, 0, 0],
                          'site 4':['entry',0, 0, 0, 0, 0],
                          'site 4':['exit',0, 0, 0, 0, 0],
                          'site 5':['entry',0, 0, 0, 0, 0],
                          'site 5':['exit',15, 0, 25, 0, 80],
                          })
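A minimal sketch of an input that keeps both sub-headers, assuming the sheet is rebuilt as a list of rows with repeated column labels. pandas accepts duplicate labels this way, unlike repeated keys in a dict literal, where the last value wins. Only sites 1 and 2 are included; `df_raw` is a hypothetical name, and the values are copied from `df_new` above.

```python
import pandas as pd

# Repeated column labels are allowed when the data is passed as rows.
columns = ['Week', 'Date', 'site 1', 'site 1', 'site 2', 'site 2']
rows = [
    ['nan', 'NaT', 'entry', 'exit', 'entry', 'exit'],  # sub-header row from the merged Excel header
    [14, '2020-04-01', 0, 0, 1, 10],
    [14, '2020-04-02', 0, 0, 0, 0],
    [14, '2020-04-03', 0, 0, 50, 7],
    [14, '2020-04-04', 0, 0, 7, 19],
    [14, '2020-04-05', 0, 0, 0, 0],
]
df_raw = pd.DataFrame(rows, columns=columns)
print(df_raw.shape)  # (6, 6)
```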

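But what I want is a column for the site and a column indicating entry/exit for each site (the column headers come from merged cells in Excel).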

Below is an example of the desired output (ignore the actual values I typed in):

df_target = pd.DataFrame({'Week':[14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14],
                          'Date':['2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05','2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05','2020-04-01', '2020-04-02', '2020-04-03', '2020-04-04', '2020-04-05'],
                          'site':['site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 1', 'site 2', 'site 2','site 2','site 2','site 2','site 2'],
                          'entry/exit':['exit','exit', 'exit', 'entry', 'entry', 'entry', 'entry', 'entry', 'entry', 'exit', 'exit', 'exit', 'exit', 'entry', 'entry'],
                          'Value':[12 ,1, 0, 50, 7, 0, 12 ,1, 0, 50, 7, 0, 12 ,1, 0]               
                          })


I tried:

df_target = df_new.melt(id_vars=['Week','Date'], var_name="Site", value_name="Value")

but I guess I also need to somehow group by the second row, or treat it as a second header?
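Treating that row as a second header is enough for `melt` to do the job. A minimal sketch, assuming pandas 1.1 or later (for `melt(..., ignore_index=False)`) and a hypothetical two-site, two-date sample built with repeated column labels so the `entry` columns are not lost: `Week` and `Date` go into the index, the entry/exit row becomes a second column level named to match `df_target`, and `melt` turns both levels into columns.

```python
import pandas as pd

# Hypothetical sample: repeated site labels, sub-headers in row 0.
columns = ['Week', 'Date', 'site 1', 'site 1', 'site 2', 'site 2']
rows = [
    ['nan', 'NaT', 'entry', 'exit', 'entry', 'exit'],
    [14, '2020-04-01', 0, 0, 1, 10],
    [14, '2020-04-02', 0, 0, 0, 0],
]
df_raw = pd.DataFrame(rows, columns=columns)

# Data rows only, with Week/Date in the index so they survive the melt.
body = df_raw.iloc[1:].set_index(['Week', 'Date'])

# Promote the sub-header row to a second, named column level.
body.columns = pd.MultiIndex.from_arrays(
    [body.columns.tolist(), df_raw.iloc[0, 2:].tolist()],
    names=['site', 'entry/exit'],
)

# With uniquely named column levels, melt uses those names for the variable columns.
long_df = body.melt(value_name='Value', ignore_index=False).reset_index()
long_df['Value'] = pd.to_numeric(long_df['Value'])  # values were object dtype
print(long_df)
```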

1 answer:

Answer 0 (score: 2)

First, create MultiIndex columns from the input DataFrame, using its first row as the second level:

# if possible, read both header rows directly:
#df = pd.read_csv(file, header=[0,1], index_col=[0,1])

df_new.columns = [df_new.columns, df_new.iloc[0]]
df = df_new.iloc[1:]
print (df.columns)
MultiIndex([(  'Week',  'nan'),
            (  'Date',  'NaT'),
            ('site 1', 'exit'),
            ('site 2', 'exit'),
            ('site 3', 'exit'),
            ('site 4', 'exit'),
            ('site 5', 'exit')],
           )
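If the data can be re-read from the source file, the commented-out `read_csv(file, header=[0,1], ...)` line avoids the manual step, because `header=[0, 1]` combines two header rows into MultiIndex columns (`pd.read_excel` also accepts a list for `header`). A minimal sketch using an in-memory CSV as a stand-in for the file; the exported layout (site names repeated in the first header row, blank cells under `Week` and `Date` in the second) is an assumption. It sets the index afterwards so the placeholder `Unnamed...` names pandas gives the blank cells can be replaced.

```python
import io
import pandas as pd

# In-memory stand-in for the exported sheet (layout is an assumption).
csv_text = (
    "Week,Date,site 1,site 1,site 2,site 2\n"
    ",,entry,exit,entry,exit\n"
    "14,2020-04-01,0,0,1,10\n"
    "14,2020-04-02,0,0,0,0\n"
)

# header=[0, 1] combines the two header rows into MultiIndex columns.
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# Move Week/Date into the index and replace the generated placeholder names.
df = df.set_index(df.columns[:2].tolist()).rename_axis(['Week', 'Date'])
print(df.columns.tolist())
```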

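Then move the first 2 MultiIndex columns into the index, so that DataFrame.unstack can melt everything else into a single Series; name the index levels with Series.rename_axis, and finally turn them back into columns with Series.reset_index: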

df = (df.set_index(df.columns[:2].tolist())
        .unstack([0,1])
        .rename_axis(['site','entry/exit','Week','Date'])
        .reset_index(name='Value'))
print (df)
      site entry/exit  Week        Date Value
0   site 1       exit    14  2020-04-01     0
1   site 1       exit    14  2020-04-02     0
2   site 1       exit    14  2020-04-03     0
3   site 1       exit    14  2020-04-04     0
4   site 1       exit    14  2020-04-05     0
5   site 2       exit    14  2020-04-01    10
6   site 2       exit    14  2020-04-02     0
7   site 2       exit    14  2020-04-03     7
8   site 2       exit    14  2020-04-04    19
9   site 2       exit    14  2020-04-05     0
10  site 3       exit    14  2020-04-01     0
11  site 3       exit    14  2020-04-02     0
12  site 3       exit    14  2020-04-03     7
13  site 3       exit    14  2020-04-04     0
14  site 3       exit    14  2020-04-05     0
15  site 4       exit    14  2020-04-01     0
16  site 4       exit    14  2020-04-02     0
17  site 4       exit    14  2020-04-03     0
18  site 4       exit    14  2020-04-04     0
19  site 4       exit    14  2020-04-05     0
20  site 5       exit    14  2020-04-01    15
21  site 5       exit    14  2020-04-02     0
22  site 5       exit    14  2020-04-03    25
23  site 5       exit    14  2020-04-04     0
24  site 5       exit    14  2020-04-05    80
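Because the dict in the question keeps only the `exit` columns, the output above has no `entry` rows. A minimal sketch running the same chain on a hypothetical two-site, two-date sample that keeps both sub-headers (values copied from the question). Two steps are additions, not part of the answer: `pd.to_numeric`, since `Value` inherits `object` dtype from the columns that held the header text, and a final `sort_values` to pin down the row order.

```python
import pandas as pd

# Hypothetical sample with repeated site labels; row 0 holds the sub-headers.
columns = ['Week', 'Date', 'site 1', 'site 1', 'site 2', 'site 2']
rows = [
    ['nan', 'NaT', 'entry', 'exit', 'entry', 'exit'],
    [14, '2020-04-01', 0, 0, 1, 10],
    [14, '2020-04-02', 0, 0, 0, 0],
]
df_raw = pd.DataFrame(rows, columns=columns)

# Answer step 1: the first row becomes the second column level.
df_raw.columns = [df_raw.columns, df_raw.iloc[0]]
df = df_raw.iloc[1:]

# Answer step 2: Week/Date into the index, then unstack every index level.
out = (df.set_index(df.columns[:2].tolist())
         .unstack([0, 1])
         .rename_axis(['site', 'entry/exit', 'Week', 'Date'])
         .reset_index(name='Value'))

# Additions: numeric values and a deterministic row order.
out['Value'] = pd.to_numeric(out['Value'])
out = out.sort_values(['site', 'entry/exit', 'Date'], ignore_index=True)
print(out)
```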