I have a DataFrame similar to the one below.
Index Time Weekday
0 21:10:00 Tuesday
1 21:15:00 Tuesday
2 21:20:00 Tuesday
3 21:20:00 Tuesday
4 21:25:00 Wednesday
5 21:25:00 Wednesday
6 21:30:00 Friday
7 21:35:00 Thursday
8 21:35:00 Wednesday
9 21:40:00 Wednesday
10 21:40:00 Wednesday
11 21:40:00 Monday
I want to turn the weekdays into columns and count how many times each time occurs on each day. My target is:
Time Monday Tuesday Wednesday Thursday Friday
21:10:00 0 1 0 0 0
21:15:00 0 1 0 0 0
21:20:00 0 2 0 0 0
21:25:00 0 0 2 0 0
21:30:00 0 0 0 0 1
21:35:00 0 0 1 1 0
21:40:00 1 0 2 0 0
The reason is that I want to create a heatmap in seaborn, and from what I have read the data has to be pivoted/shaped in a certain way: https://stackoverflow.com/a/37790707/9384889
I know how to count how often each Time value occurs, ignoring the weekday:
df['Time'].value_counts()
I have been reading http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html but I cannot see how to combine the two ideas.
Answer 0 (score: 2)
Use groupby with size and unstack, or crosstab as an alternative, to reshape the data.
To put the days in the right order, use an ordered Categorical or reindex the columns:
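import pandas as pd
cats = ['Monday','Tuesday','Wednesday','Thursday','Friday']
# Option 1: make Weekday an ordered Categorical so the columns follow cats
df['Weekday'] = pd.Categorical(df['Weekday'], categories=cats, ordered=True)
df = df.groupby(['Time', 'Weekday']).size().unstack(fill_value=0)
# Option 2 (use instead of option 1): reshape first, then reindex the columns
df = df.groupby(['Time', 'Weekday']).size().unstack(fill_value=0).reindex(columns=cats)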
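As a self-contained sketch of the groupby route, the snippet below rebuilds the sample data from the question and applies the reindex variant. The variable name `counts` and the extra `fill_value=0` passed to `reindex` are my own additions (the latter only matters if a weekday in `cats` never appears in the data), so treat it as an illustration, not the definitive form.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Time": ["21:10:00", "21:15:00", "21:20:00", "21:20:00",
             "21:25:00", "21:25:00", "21:30:00", "21:35:00",
             "21:35:00", "21:40:00", "21:40:00", "21:40:00"],
    "Weekday": ["Tuesday", "Tuesday", "Tuesday", "Tuesday",
                "Wednesday", "Wednesday", "Friday", "Thursday",
                "Wednesday", "Wednesday", "Wednesday", "Monday"],
})

cats = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

counts = (
    df.groupby(["Time", "Weekday"])      # one group per (Time, Weekday) pair
      .size()                            # number of rows in each group
      .unstack(fill_value=0)             # Weekday level becomes the columns
      .reindex(columns=cats, fill_value=0)  # put the days in calendar order
)
print(counts)
```

Keeping the result in a separate variable, instead of overwriting `df`, leaves the original rows available if you want to try the other variants.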
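Alternatives with crosstab:
# Option 1: pass an ordered Categorical as the column values
df = pd.crosstab(df['Time'], pd.Categorical(df['Weekday'], categories=cats, ordered=True))
# Option 2 (use instead of option 1): crosstab first, then reindex the columns
df = pd.crosstab(df['Time'], df['Weekday']).reindex(columns=cats)
print(df)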
col_0 Monday Tuesday Wednesday Thursday Friday
Time
21:10:00 0 1 0 0 0
21:15:00 0 1 0 0 0
21:20:00 0 2 0 0 0
21:25:00 0 0 2 0 0
21:30:00 0 0 0 0 1
21:35:00 0 0 1 1 0
21:40:00 1 0 2 0 0
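To check that the two crosstab variants agree, here is a minimal sketch on the question's sample data. The names `by_categorical` and `by_reindex` are hypothetical, chosen only for this illustration. Passing an unnamed Categorical is also what produces the `col_0` label in the printed output above, since crosstab gives default names to unnamed inputs.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Time": ["21:10:00", "21:15:00", "21:20:00", "21:20:00",
             "21:25:00", "21:25:00", "21:30:00", "21:35:00",
             "21:35:00", "21:40:00", "21:40:00", "21:40:00"],
    "Weekday": ["Tuesday", "Tuesday", "Tuesday", "Tuesday",
                "Wednesday", "Wednesday", "Friday", "Thursday",
                "Wednesday", "Wednesday", "Wednesday", "Monday"],
})

cats = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Variant 1: an ordered Categorical fixes the column order up front.
weekday = pd.Categorical(df["Weekday"], categories=cats, ordered=True)
by_categorical = pd.crosstab(df["Time"], weekday)

# Variant 2: plain crosstab (columns sorted alphabetically), then reorder.
by_reindex = pd.crosstab(df["Time"], df["Weekday"]).reindex(columns=cats, fill_value=0)

# Both tables hold the same counts in the same column order.
same = (by_categorical.to_numpy() == by_reindex.to_numpy()).all()
print(by_reindex)
print("identical counts:", same)
```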
Finally, use seaborn.heatmap:
import seaborn as sns
sns.heatmap(df, annot=True, fmt="g", cmap='viridis')
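Putting the pieces together, the following is a rough end-to-end sketch from the raw rows to the plot. It assumes seaborn and matplotlib are installed; the figure size, the axis labels, and the name `counts` are my own choices and not part of the answer above.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Sample data copied from the question.
df = pd.DataFrame({
    "Time": ["21:10:00", "21:15:00", "21:20:00", "21:20:00",
             "21:25:00", "21:25:00", "21:30:00", "21:35:00",
             "21:35:00", "21:40:00", "21:40:00", "21:40:00"],
    "Weekday": ["Tuesday", "Tuesday", "Tuesday", "Tuesday",
                "Wednesday", "Wednesday", "Friday", "Thursday",
                "Wednesday", "Wednesday", "Wednesday", "Monday"],
})

cats = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]

# Count occurrences per (Time, Weekday) and order the weekday columns.
counts = pd.crosstab(df["Time"], df["Weekday"]).reindex(columns=cats, fill_value=0)

# Draw the heatmap on an explicit Axes; annot=True writes each count in its cell.
fig, ax = plt.subplots(figsize=(6, 4))  # figure size is an arbitrary choice
sns.heatmap(counts, annot=True, fmt="g", cmap="viridis", ax=ax)
ax.set_xlabel("Weekday")
ax.set_ylabel("Time")
fig.tight_layout()
```

Call `plt.show()` in an interactive session, or `fig.savefig(...)` with a path of your choosing, to view the result.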