How to convert a Pandas DataFrame into MultiIndexed form for a clustermap?

Time: 2017-04-09 19:17:22

Tags: python pandas seaborn

I have a DataFrame that is a list of observations, grouped by the 'name' column. I'm having a hard time getting it into MultiIndex format.

I have something like:

[sample data not preserved]

and so on.

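What I want is to be able to use seaborn clustermaps to show the correlation of the 'ratio' of each 'name', both for each day (taken as a whole) and for specific hours across days.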

For example, I need something like this (not sure it's right, but this is what I'm attempting):

    name | ratio | DayOfWeek | HourOfDay
    foo  | 0.7   | Mon       | 0
    foo  | 0.2   | Mon       | 1
    foo  | 0.11  | Mon       | 2
    foo  | 0.45  | Mon       | 3
    ..
    foo  | 0.2   | Mon       | 23
    foo  | 0.1   | Tue       | 0
    foo  | 0.6   | Tue       | 1
    foo  | 0.2   | Tue       | 2
    ..
    foo  | 0.1   | Sun       | 23
    bar  | 0.2   | Mon       | 0
    bar  | 0.11  | Mon       | 1
    ..
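The raw sample data did not survive, so it is an assumption here that each observation carries a timestamp. If it does, the DayOfWeek and HourOfDay columns of the layout above can be derived with the .dt accessor. The time column and the values below are hypothetical, and dt.day_name() needs pandas 0.23 or later; this is a sketch, not the asker's actual pipeline.

```python
import pandas as pd

# Hypothetical raw observations: the `time` column and these values are
# assumptions for illustration, not data from the question.
raw = pd.DataFrame({
    "name": ["foo", "foo", "bar"],
    "ratio": [0.7, 0.2, 0.2],
    "time": pd.to_datetime(["2017-04-03 00:15", "2017-04-03 01:40",
                            "2017-04-03 00:05"]),  # 2017-04-03 is a Monday
})

# Derive the two keys of the target layout from the timestamp.
raw["DayOfWeek"] = raw["time"].dt.day_name().str[:3]  # "Monday" -> "Mon"
raw["HourOfDay"] = raw["time"].dt.hour                # integer hour, 0..23

# Keep only the columns of the long format shown in the question.
long_df = raw[["name", "ratio", "DayOfWeek", "HourOfDay"]]
print(long_df)
```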

Once I have that, I'd like to be able to slice it with xs() into pieces I can pass to seaborn's heatmap / clustermap.
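A minimal sketch of the xs() slicing described above, assuming the wide layout (rows keyed by DayOfWeek and HourOfDay, one column per name) that the answer below builds. xs() selects one value of an index level and drops that level, so each slice is a plain 2-D frame. The values are a few rows copied from the question's table.

```python
import pandas as pd

# A few rows from the question's long-format table.
df = pd.DataFrame({
    "name": ["foo", "foo", "foo", "bar", "bar"],
    "ratio": [0.7, 0.2, 0.1, 0.2, 0.11],
    "DayOfWeek": ["Mon", "Mon", "Tue", "Mon", "Mon"],
    "HourOfDay": [0, 1, 0, 0, 1],
})

# Wide frame: MultiIndex rows (DayOfWeek, HourOfDay), one column per name.
wide = df.set_index(["DayOfWeek", "HourOfDay", "name"])["ratio"].unstack()

# One day as a whole: rows are hours, columns are names.
monday = wide.xs("Mon", level="DayOfWeek")
print(monday)

# One hour across all days: rows are days, columns are names.
hour0 = wide.xs(0, level="HourOfDay")
print(hour0)
```

seaborn.heatmap accepts a DataFrame like monday directly and leaves NaN cells blank. clustermap clusters with scipy's hierarchical linkage, which generally does not accept NaN, so missing cells likely need to be filled or dropped before clustering.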

1 Answer:

Answer 0 (score: 1)

You can use set_index with unstack:

    df = df.set_index(['DayOfWeek','HourOfDay','name'])['ratio'].unstack()
    print(df)
    name                  bar   foo
    DayOfWeek HourOfDay
    Mon       0          0.20  0.70
              1          0.11  0.20
              2           NaN  0.11
              3           NaN  0.45
              23          NaN  0.20
    Sun       23          NaN  0.10
    Tue       0           NaN  0.10
              1           NaN  0.60
              2           NaN  0.20
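The one-liner above assumes df already exists. A self-contained version, rebuilding part of the question's data (no duplicate keys), shows what each step does. Rows come out sorted by label, which is why Sun appears before Tue in the output above.

```python
import pandas as pd

# Part of the question's long-format data; every (name, day, hour) key is unique.
df = pd.DataFrame({
    "name": ["foo", "foo", "foo", "foo", "foo", "bar", "bar"],
    "ratio": [0.7, 0.2, 0.11, 0.1, 0.6, 0.2, 0.11],
    "DayOfWeek": ["Mon", "Mon", "Mon", "Tue", "Tue", "Mon", "Mon"],
    "HourOfDay": [0, 1, 2, 0, 1, 0, 1],
})

# Move the three key columns into a MultiIndex, keep only `ratio`,
# then pivot the innermost level (`name`) out into columns.
wide = df.set_index(["DayOfWeek", "HourOfDay", "name"])["ratio"].unstack()
print(wide)
```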

However, if there are duplicates, you need pivot_table with an aggregate function such as mean, sum, ...:

    print(df)
       name  ratio DayOfWeek  HourOfDay
    0   foo   0.70       Mon          0 <- duplicate for same name, DayOfWeek and HourOfDay - 0.7
    1   foo   0.90       Mon          0 <- duplicate for same name, DayOfWeek and HourOfDay - 0.9
    2   foo   0.20       Mon          1
    3   foo   0.11       Mon          2
    4   foo   0.45       Mon          3
    5   foo   0.20       Mon         23
    6   foo   0.10       Tue          0
    7   foo   0.60       Tue          1
    8   foo   0.20       Tue          2
    9   foo   0.10       Sun         23
    10  bar   0.20       Mon          0
    11  bar   0.11       Mon          1
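Why the set_index/unstack approach breaks on a frame like this: unstack cannot put two values into one cell, so it raises ValueError. A small sketch (values taken from rows 0, 1 and 10 above) that first finds the repeated keys with duplicated(keep=False) and then shows the failure:

```python
import pandas as pd

# Two rows share the key (foo, Mon, 0), as in rows 0 and 1 above.
df = pd.DataFrame({
    "name": ["foo", "foo", "bar"],
    "ratio": [0.7, 0.9, 0.2],
    "DayOfWeek": ["Mon", "Mon", "Mon"],
    "HourOfDay": [0, 0, 0],
})

keys = ["DayOfWeek", "HourOfDay", "name"]

# keep=False marks every row of a repeated key, not just the later ones.
dupes = df[df.duplicated(subset=keys, keep=False)]
print(dupes)

# unstack() has no rule for combining 0.7 and 0.9, so it raises ValueError.
try:
    df.set_index(keys)["ratio"].unstack()
    failed = False
except ValueError as exc:
    failed = True
    print("unstack failed:", exc)
```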


    df = df.pivot_table(index=['DayOfWeek','HourOfDay'],
                        columns='name',
                        values='ratio',
                        aggfunc='mean')
    print(df)

    name                  bar   foo
    DayOfWeek HourOfDay
    Mon       0          0.20  0.80 < (0.7 + 0.9)/2 = 0.8
              1          0.11  0.20
              2           NaN  0.11
              3           NaN  0.45
              23          NaN  0.20
    Sun       23          NaN  0.10
    Tue       0           NaN  0.10
              1           NaN  0.60
              2           NaN  0.20
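When averaging hides duplicates, it can help to see how many raw observations went into each cell. A sketch, with sample rows taken from the duplicate frame above, running the same pivot_table call with aggfunc="count" alongside aggfunc="mean":

```python
import pandas as pd

# Sample with one duplicated key: foo / Mon / 0 appears twice (0.7 and 0.9).
df = pd.DataFrame({
    "name": ["foo", "foo", "foo", "foo", "bar", "bar"],
    "ratio": [0.7, 0.9, 0.2, 0.1, 0.2, 0.11],
    "DayOfWeek": ["Mon", "Mon", "Mon", "Tue", "Mon", "Mon"],
    "HourOfDay": [0, 0, 1, 0, 0, 1],
})

opts = dict(index=["DayOfWeek", "HourOfDay"], columns="name", values="ratio")

# Averaged values, as in the answer ...
means = df.pivot_table(aggfunc="mean", **opts)
# ... and the number of observations averaged into each cell (NaN = none).
counts = df.pivot_table(aggfunc="count", **opts)

print(means)
print(counts)
```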

Alternative solution with groupby:

    df = df.groupby(['DayOfWeek','HourOfDay','name'])['ratio'].mean().unstack()
    print(df)
    name                  bar   foo
    DayOfWeek HourOfDay
    Mon       0          0.20  0.80 < (0.7 + 0.9)/2 = 0.8
              1          0.11  0.20
              2           NaN  0.11
              3           NaN  0.45
              23          NaN  0.20
    Sun       23          NaN  0.10
    Tue       0           NaN  0.10
              1           NaN  0.60
              2           NaN  0.20
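A quick check that the groupby route and the pivot_table route produce the same table, followed by one possible next step toward the clustermap. Treating the goal as a name-by-name correlation matrix is an assumption about what the question wants. DataFrame.corr() uses pairwise-complete observations, so NaN cells are skipped for each pair of columns.

```python
import pandas as pd

# Same small duplicated sample as before (foo / Mon / 0 appears twice).
df = pd.DataFrame({
    "name": ["foo", "foo", "foo", "foo", "bar", "bar"],
    "ratio": [0.7, 0.9, 0.2, 0.1, 0.2, 0.11],
    "DayOfWeek": ["Mon", "Mon", "Mon", "Tue", "Mon", "Mon"],
    "HourOfDay": [0, 0, 1, 0, 0, 1],
})

keys = ["DayOfWeek", "HourOfDay", "name"]

via_groupby = df.groupby(keys)["ratio"].mean().unstack()
via_pivot = df.pivot_table(index=keys[:2], columns="name",
                           values="ratio", aggfunc="mean")

# Raises AssertionError if the two tables differ in values, index or columns.
pd.testing.assert_frame_equal(via_groupby, via_pivot)

# Assumed next step: correlation between names across (day, hour) rows.
# A square matrix like this is one kind of input seaborn.clustermap accepts.
corr = via_groupby.corr()
print(corr)
```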