Heatmap does not show all rows

Time: 2020-04-26 20:08:38

Tags: python pandas seaborn

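I have a dataset with 399 rows (Words) and 5 columns (Dates). I would like to visualize some of this information as a heatmap. I created a pivot table as follows: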

pd.pivot_table(df, index='Words', columns='Date', values='frequency', aggfunc=np.sum)

Output:

Date    2018-02-18  2018-02-19  2018-02-20  2018-02-21  2018-02-22
Words                   
A   NaN NaN NaN 2.0 2.0
B   NaN NaN NaN NaN 1.0
C   NaN NaN NaN NaN 1.0
D   NaN 1.0 NaN NaN NaN
E   NaN NaN 1.0 NaN NaN
... ... ... ... ... ...
RRR NaN 10.0    NaN NaN 90.0
SSS NaN 3.0 3.0 3.0 NaN
TTT NaN 4.0 NaN NaN NaN
UUU NaN NaN NaN 1.0 NaN
VVV NaN NaN NaN NaN 1.0
ZZZ NaN NaN 1.0 NaN 1.0

399 rows × 5 columns

I then tried to create the heatmap with the following lines of code:

piv = pd.pivot_table(df, values="frequency",index=["Words"], columns=["Date"], fill_value=0)
ax = sns.heatmap(piv, square=False)

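However, the output shows only 20 of the 399 rows. Is it possible to visualize all rows in the heatmap? If not, how can I visualize only the most popular rows (i.e. those with the highest frequency across the dates)?

Any help would be greatly appreciated. Thank you.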

1 answer:

Answer 0 (score: 0)

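Your output does show all rows; only the y tick labels have been thinned out, because otherwise they would overlap too much to be readable.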
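
If every row really should get a label, one option is to pass `yticklabels=True` to `sns.heatmap` and make the figure tall enough for the labels to fit. The following is only a minimal sketch: the `piv` frame is a hypothetical stand-in filled with random numbers (it is not the asker's data), and the 0.15 inch per row height and the font size of 6 are arbitrary assumptions that would need tuning.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for the asker's pivot table: 399 words x 5 dates.
rng = np.random.default_rng(0)
dates = [f"2018-02-{d:02d}" for d in range(18, 23)]
words = [f"word{i:03d}" for i in range(399)]
piv = pd.DataFrame(
    rng.integers(0, 10, size=(len(words), len(dates))),
    index=pd.Index(words, name="Words"),
    columns=pd.Index(dates, name="Date"),
)

# Scale the figure height with the number of rows so the labels have room
# (0.15 inch per row is an arbitrary assumption).
fig, ax = plt.subplots(figsize=(6, 0.15 * len(piv)))

# yticklabels=True asks seaborn to label every row instead of thinning them out.
sns.heatmap(piv, yticklabels=True, ax=ax)

# Small, horizontal labels are easier to read on a tall figure.
ax.tick_params(axis="y", labelsize=6, labelrotation=0)
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or stores the result. With 399 rows the figure becomes very tall, so showing only the most frequent rows may still be the more readable choice.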

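If you do not have a frequency column yet, you can create one with every value set to 1 via df['frequency'] = 1. The aggregation function then sums these ones, which amounts to counting the occurrences.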
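
As a small illustration of that counting trick, here is a sketch on a made-up four-row frame (the data is hypothetical and only serves to make the result easy to check by hand):

```python
import pandas as pd

# Hypothetical raw data: one row per occurrence of a word, no frequency column.
df = pd.DataFrame({
    "Date":  ["2018-02-18", "2018-02-18", "2018-02-19", "2018-02-19"],
    "Words": ["A", "A", "A", "B"],
})

# Give every occurrence a weight of 1 ...
df["frequency"] = 1

# ... so that summing per (word, date) cell counts the occurrences.
# fill_value=0 replaces the NaN of combinations that never occur.
piv = pd.pivot_table(df, values="frequency", index="Words", columns="Date",
                     aggfunc="sum", fill_value=0)
print(piv)
```

Here word A occurs twice on 2018-02-18 and once on 2018-02-19, while B occurs only once on 2018-02-19, so the cell for B on 2018-02-18 is filled with 0.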

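To keep only the most frequent words, you can rank the rows of piv by their total and take the 10 largest with idx = piv.sum(axis=1).sort_values(ascending=False).head(10).index. Then piv.loc[idx] selects just those rows, in that order.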
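
The row selection can be checked on a tiny example. The sketch below uses a hypothetical four-word table and keeps the top 2 instead of the top 10; it also uses `Series.nlargest`, which is a shorter alternative to the `sort_values(...).head(...)` chain and is swapped in here only for illustration.

```python
import pandas as pd

# Hypothetical pivot table: 4 words x 2 dates.
piv = pd.DataFrame(
    {"2018-02-18": [0, 5, 1, 0], "2018-02-19": [2, 4, 0, 7]},
    index=pd.Index(["A", "B", "C", "D"], name="Words"),
)

# Total frequency per word over all dates: A=2, B=9, C=1, D=7.
totals = piv.sum(axis=1)

# Index labels of the 2 words with the highest totals, largest first.
idx = totals.nlargest(2).index

# Selecting with .loc keeps the rows in the order given by idx.
top = piv.loc[idx]
print(top)
```

The resulting `top` frame contains the rows B and D, in that order, and can be passed to `sns.heatmap` in place of the full table.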

The code below shows these steps on randomly generated data. It also rotates the tick labels so that they are easier to read.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

N = 1000
abc = list('ABCDEFGHIJKLMNOPQRS')
df = pd.DataFrame({'Date':[f'2018-02-{i:02d}' for i in np.random.randint(18, 23, N)],
                   'Words': [abc[i]+abc[j] for i,j  in np.random.randint(0, len(abc), (N, 2)) ] ,
                   'frequency': np.random.randint(1, 10, N)
                   })
# df['frequency'] = 1  # in case there is no frequency column yet
# aggfunc="sum" behaves like np.sum here and avoids the FutureWarning that newer pandas versions raise for np.sum
piv = pd.pivot_table(df, values="frequency", index=["Words"], columns=["Date"], fill_value=0, aggfunc="sum")
idx = piv.sum(axis=1).sort_values(ascending=False).head(10).index  # the 10 words with the highest total frequency
ax = sns.heatmap(piv.loc[idx], square=False)
ax.set_xticklabels(ax.get_xticklabels(), rotation=0)  # make the x labels horizontal again
ax.set_yticklabels(ax.get_yticklabels(), rotation=0)  # make the y labels horizontal
plt.show()

[Image: sample plot]

PS: To show all ticks and all labels (which may get too crowded), sorted alphabetically:

from matplotlib.ticker import FixedLocator

idx = piv.sort_values('Words', ascending=True).index
ax = sns.heatmap(piv.loc[idx], square=False)
ax.yaxis.set_major_locator(FixedLocator(np.arange(0.5, len(idx) + 0.5, 1)))
ax.set_yticklabels(idx, rotation=0, fontsize=6)

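Alternatively, to let the labels alternate between the left and right side (so that twice as many labels fit), a secondary axis may help: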

ax.yaxis.set_major_locator(FixedLocator(np.arange(0.5, len(idx) + 0.5, 2)))
ax.set_yticklabels(idx[::2], rotation=0, fontsize=6)

secax = ax.secondary_yaxis('right')
secax.yaxis.set_major_locator(FixedLocator(np.arange(1.5, len(idx) + 0.5, 2)))
secax.set_yticklabels(idx[1::2], rotation=0, fontsize=6)