I have a DataFrame that is a list of observations, grouped by the 'name' column. I'm having a hard time getting it into MultiIndex format.
I have something like this:
etc.
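What I want is to use seaborn clustermaps to show the correlation between the 'ratio' of each 'name' per day (taken as a whole) and at specific hours across several days.
e.g. I need something like this (not sure whether it's right, but this is my attempt):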
name | ratio | DayOfWeek | HourOfDay
foo | 0.7 | Mon | 0
foo | 0.2 | Mon | 1
foo | 0.11 | Mon | 2
foo | 0.45 | Mon | 3
..
foo | 0.2 | Mon | 23
foo | 0.1 | Tue | 0
foo | 0.6 | Tue | 1
foo | 0.2 | Tue | 2
..
foo | 0.1 | Sun | 23
bar | 0.2 | Mon | 0
bar | 0.11 | Mon | 1
..
Once I have that, I'd like to be able to slice it with xs() into pieces I can pass to seaborn's heatmap / clustermap.
Answer:
df = df.set_index(['DayOfWeek','HourOfDay','name'])['ratio'].unstack()
print(df)
name                  bar   foo
DayOfWeek HourOfDay
Mon       0          0.20  0.70
          1          0.11  0.20
          2           NaN  0.11
          3           NaN  0.45
          23          NaN  0.20
Sun       23          NaN  0.10
Tue       0           NaN  0.10
          1           NaN  0.60
          2           NaN  0.20
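The question also asks about slicing the result with xs(). A minimal sketch, assuming the wide frame built above (rows indexed by DayOfWeek and HourOfDay, one column per name) and using only the sample rows shown in the question; `long_df`, `wide`, `monday` and `foo_grid` are illustrative names. `xs()` on the DayOfWeek level returns one day as an HourOfDay × name frame, and unstacking HourOfDay for a single name gives a day × hour grid, the rectangular shape seaborn's heatmap takes. The seaborn call is left as a comment so the sketch runs with pandas alone.

```python
import pandas as pd

# Long-format sample: only the rows shown in the question
long_df = pd.DataFrame({
    'name': ['foo'] * 9 + ['bar'] * 2,
    'ratio': [0.7, 0.2, 0.11, 0.45, 0.2, 0.1, 0.6, 0.2, 0.1, 0.2, 0.11],
    'DayOfWeek': ['Mon'] * 5 + ['Tue'] * 3 + ['Sun'] + ['Mon'] * 2,
    'HourOfDay': [0, 1, 2, 3, 23, 0, 1, 2, 23, 0, 1],
})

# Rows: (DayOfWeek, HourOfDay); columns: one per name
wide = long_df.set_index(['DayOfWeek', 'HourOfDay', 'name'])['ratio'].unstack()

# One day's slice: index is HourOfDay, columns are the names
monday = wide.xs('Mon', level='DayOfWeek')
print(monday)

# Day x hour grid for a single name, e.g. for sns.heatmap(foo_grid)
foo_grid = wide['foo'].unstack('HourOfDay')
print(foo_grid)
```

Missing day/hour combinations stay NaN; clustermap in particular will likely need them filled (for example with `fillna`) before clustering.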
However, if there are duplicates, you need pivot_table with an aggregation function such as mean, sum, ...:
print(df)
    name  ratio  DayOfWeek  HourOfDay
0    foo   0.70        Mon          0  <- duplicate for same name, DayOfWeek and HourOfDay: 0.7
1    foo   0.90        Mon          0  <- duplicate for same name, DayOfWeek and HourOfDay: 0.9
2    foo   0.20        Mon          1
3    foo   0.11        Mon          2
4    foo   0.45        Mon          3
5    foo   0.20        Mon         23
6    foo   0.10        Tue          0
7    foo   0.60        Tue          1
8    foo   0.20        Tue          2
9    foo   0.10        Sun         23
10   bar   0.20        Mon          0
11   bar   0.11        Mon          1
df = df.pivot_table(index=['DayOfWeek', 'HourOfDay'],
                    columns='name',
                    values='ratio',
                    aggfunc='mean')
print(df)
name                  bar   foo
DayOfWeek HourOfDay
Mon       0          0.20  0.80  <- (0.7 + 0.9) / 2 = 0.8
          1          0.11  0.20
          2           NaN  0.11
          3           NaN  0.45
          23          NaN  0.20
Sun       23          NaN  0.10
Tue       0           NaN  0.10
          1           NaN  0.60
          2           NaN  0.20
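To see why duplicates need an aggregation step, here is a minimal sketch using a hypothetical four-row subset of the sample above (`dup_df` is an illustrative name). With two rows sharing the same (DayOfWeek, HourOfDay, name) key, `set_index(...).unstack()` raises a ValueError because it cannot put two values in one cell, while `pivot_table` with `aggfunc='mean'` averages them.

```python
import pandas as pd

# Hypothetical subset of the sample data; the first two rows share
# the same (name, DayOfWeek, HourOfDay) key
dup_df = pd.DataFrame({
    'name': ['foo', 'foo', 'foo', 'bar'],
    'ratio': [0.7, 0.9, 0.2, 0.2],
    'DayOfWeek': ['Mon', 'Mon', 'Mon', 'Mon'],
    'HourOfDay': [0, 0, 1, 0],
})

try:
    dup_df.set_index(['DayOfWeek', 'HourOfDay', 'name'])['ratio'].unstack()
    unstack_failed = False
except ValueError as exc:
    # unstack refuses to reshape an index with duplicate entries
    unstack_failed = True
    print('unstack failed:', exc)

# pivot_table collapses the duplicates with the aggregation function
pivoted = dup_df.pivot_table(index=['DayOfWeek', 'HourOfDay'],
                             columns='name', values='ratio', aggfunc='mean')
print(pivoted)
```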
Alternative with groupby (run on the original long-format df):
df = df.groupby(['DayOfWeek','HourOfDay','name'])['ratio'].mean().unstack()
print(df)
name                  bar   foo
DayOfWeek HourOfDay
Mon       0          0.20  0.80  <- (0.7 + 0.9) / 2 = 0.8
          1          0.11  0.20
          2           NaN  0.11
          3           NaN  0.45
          23          NaN  0.20
Sun       23          NaN  0.10
Tue       0           NaN  0.10
          1           NaN  0.60
          2           NaN  0.20
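A quick check, on the same kind of hypothetical duplicate-containing subset (`dup_df`, `via_pivot` and `via_groupby` are illustrative names), that the groupby version and `pivot_table(..., aggfunc='mean')` should give the same wide frame. `pd.testing.assert_frame_equal` raises an AssertionError if labels, dtypes or values differ, so the sketch stops there if they ever disagree.

```python
import pandas as pd

# Hypothetical subset of the sample data, including the duplicate (foo, Mon, 0)
dup_df = pd.DataFrame({
    'name': ['foo', 'foo', 'foo', 'bar'],
    'ratio': [0.7, 0.9, 0.2, 0.2],
    'DayOfWeek': ['Mon', 'Mon', 'Mon', 'Mon'],
    'HourOfDay': [0, 0, 1, 0],
})

keys = ['DayOfWeek', 'HourOfDay']
via_pivot = dup_df.pivot_table(index=keys, columns='name',
                               values='ratio', aggfunc='mean')
via_groupby = dup_df.groupby(keys + ['name'])['ratio'].mean().unstack()

# Raises AssertionError if the two results differ
pd.testing.assert_frame_equal(via_pivot, via_groupby)
print(via_groupby)
```

Which one to use is mostly a matter of taste: pivot_table states the row/column layout in one call, while groupby makes it easy to swap `.mean()` for another reduction such as `.sum()`.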