Making a heatmap from a pandas DataFrame

Posted: 2012-09-05 17:18:21

Tags: python pandas dataframe heatmap

I have a DataFrame generated with Python's pandas package. How can I generate a heatmap from this DataFrame?

import numpy as np 
from pandas import *

Index= ['aaa','bbb','ccc','ddd','eee']
Cols = ['A', 'B', 'C','D']
df = DataFrame(abs(np.random.randn(5, 4)), index= Index, columns=Cols)

>>> df
          A         B         C         D
aaa  2.431645  1.248688  0.267648  0.613826
bbb  0.809296  1.671020  1.564420  0.347662
ccc  1.501939  1.126518  0.702019  1.596048
ddd  0.137160  0.147368  1.504663  0.202822
eee  0.134540  3.708104  0.309097  1.641090
>>> 

7 answers:

Answer 0 (score: 132)

For people looking at this question today, I would recommend the Seaborn heatmap(), as documented here.

The example above would be done as follows:

import numpy as np 
from pandas import DataFrame
import seaborn as sns
%matplotlib inline

Index= ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
Cols = ['A', 'B', 'C', 'D']
df = DataFrame(abs(np.random.randn(5, 4)), index=Index, columns=Cols)

sns.heatmap(df, annot=True)

For those unfamiliar with it, %matplotlib is an IPython magic function; it only works inside IPython/Jupyter.

Answer 1 (score: 63)

You want matplotlib.pcolor:

import numpy as np 
from pandas import DataFrame
import matplotlib.pyplot as plt

Index= ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
Cols = ['A', 'B', 'C', 'D']
df = DataFrame(abs(np.random.randn(5, 4)), index=Index, columns=Cols)

plt.pcolor(df)
plt.yticks(np.arange(0.5, len(df.index), 1), df.index)
plt.xticks(np.arange(0.5, len(df.columns), 1), df.columns)
plt.show()
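The 0.5 offsets in the tick positions above exist because pcolor draws cell (i, j) over the square [j, j+1] x [i, i+1], so a tick at k + 0.5 sits in the middle of cell k. A minimal sketch of the same plot with Matplotlib's object-oriented API, adding a colorbar and flipping the y axis so the first row appears at the top as it does when the DataFrame is printed; the colorbar and axis flip are additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.abs(np.random.randn(5, 4)),
                  index=['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
                  columns=['A', 'B', 'C', 'D'])

fig, ax = plt.subplots()
# pcolor draws cell (i, j) over [j, j+1] x [i, i+1]
mesh = ax.pcolor(df)

# Ticks at k + 0.5 put each label in the middle of its cell
ax.set_xticks(np.arange(len(df.columns)) + 0.5)
ax.set_xticklabels(df.columns)
ax.set_yticks(np.arange(len(df.index)) + 0.5)
ax.set_yticklabels(df.index)

# Show the first row at the top, matching the printed DataFrame
ax.invert_yaxis()
fig.colorbar(mesh, ax=ax)
```

Call plt.show() afterwards when running as a script.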

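Answer 2 (score: 30)

If you don't need a plot per se, and you are only interested in adding color to represent the values in a table format, you can use the style.background_gradient() method of the pandas DataFrame. This method colors the HTML table that is displayed when viewing pandas DataFrames in, for example, JupyterLab notebooks, and the result is similar to using "conditional formatting" in spreadsheet software: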

import numpy as np 
import pandas as pd


index= ['aaa', 'bbb', 'ccc', 'ddd', 'eee']
cols = ['A', 'B', 'C', 'D']
df = pd.DataFrame(abs(np.random.randn(5, 4)), index=index, columns=cols)
df.style.background_gradient(cmap='summer')


I previously gave a much more detailed answer on the same topic, and the styling section of the pandas documentation covers the many options in depth.

Answer 3 (score: 13)

The useful sns.heatmap API is documented here. Check out the parameters; there are a good number of them.
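The example that accompanied this answer did not survive. A minimal sketch of a few commonly used sns.heatmap parameters; the specific colormap, number format, and line width are illustrative choices, not the answer's original values.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(np.abs(np.random.randn(5, 4)),
                  index=['aaa', 'bbb', 'ccc', 'ddd', 'eee'],
                  columns=['A', 'B', 'C', 'D'])

fig, ax = plt.subplots(figsize=(6, 4))
sns.heatmap(
    df,
    ax=ax,
    cmap='RdYlGn_r',              # the '_r' suffix reverses a Matplotlib colormap
    annot=True,                   # write each value inside its cell
    fmt='.2f',                    # number format for the annotations
    linewidths=0.5,               # thin lines separating the cells
    cbar_kws={'label': 'value'},  # keyword arguments passed to the colorbar
)
```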


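Answer 4 (score: 1)

If you want an interactive heatmap from a pandas DataFrame and you are running a Jupyter notebook, you can try the interactive widget Clustergrammer-Widget. See the interactive notebook on NBViewer here, and the documentation here.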


For larger datasets you can try the in-development Clustergrammer2 WebGL widget (example notebook here).

Answer 5 (score: 0)

Please note that the authors of seaborn only want seaborn.heatmap to work with categorical DataFrames. It is not general.

If your index and columns are numeric and/or datetime values, this code will serve you well.

The Matplotlib heat-mapping function pcolormesh requires bins instead of indices, so there is some fancy code to build bins from your DataFrame index (even if your index isn't evenly spaced!).

The rest is simply np.meshgrid and plt.pcolormesh.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def conv_index_to_bins(index):
    """Calculate bins to contain the index values.
    The start and end bin boundaries are linearly extrapolated from
    the two first and last values. The middle bin boundaries are
    midpoints.

    Example 1: [0, 1] -> [-0.5, 0.5, 1.5]
    Example 2: [0, 1, 4] -> [-0.5, 0.5, 2.5, 5.5]
    Example 3: [4, 1, 0] -> [5.5, 2.5, 0.5, -0.5]"""
    assert index.is_monotonic_increasing or index.is_monotonic_decreasing

    # the beginning and end values are guessed from first and last two
    start = index[0] - (index[1] - index[0]) / 2
    end = index[-1] + (index[-1] - index[-2]) / 2

    # the middle values are the midpoints
    middle = pd.DataFrame({'m1': index[:-1], 'p1': index[1:]})
    middle = middle['m1'] + (middle['p1'] - middle['m1']) / 2

    if isinstance(index, pd.DatetimeIndex):
        idx = pd.DatetimeIndex(middle).union([start, end])
    elif pd.api.types.is_numeric_dtype(index):
        # pd.Float64Index / pd.Int64Index were removed in pandas 2.0
        idx = pd.Index(middle, dtype=float).union([start, end])
    else:
        print('Warning: guessing what to do with index type %s' % type(index))
        idx = pd.Index(middle, dtype=float).union([start, end])

    return idx.sort_values(ascending=index.is_monotonic_increasing)

def calc_df_mesh(df):
    """Calculate the two-dimensional bins to hold the index and column values."""
    return np.meshgrid(conv_index_to_bins(df.index),
                       conv_index_to_bins(df.columns))

def heatmap(df):
    """Plot a heatmap of the dataframe values using the index and columns."""
    X, Y = calc_df_mesh(df)
    c = plt.pcolormesh(X, Y, df.values.T)
    plt.colorbar(c)

Call it with heatmap(df), then display it with plt.show().
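The core idea above (interior bin edges at the midpoints between neighboring values, outer edges extrapolated by half a step) can be sketched in a few lines of NumPy for a purely numeric index. The helper name centers_to_edges is hypothetical, and unlike the answer's code this sketch does not handle datetime indexes.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def centers_to_edges(centers):
    """Hypothetical helper: turn N sorted numeric cell centers into N + 1 edges."""
    c = np.asarray(centers, dtype=float)
    mid = (c[:-1] + c[1:]) / 2          # interior edges are midpoints
    first = c[0] - (c[1] - c[0]) / 2    # extrapolate half a step at each end
    last = c[-1] + (c[-1] - c[-2]) / 2
    return np.concatenate([[first], mid, [last]])

# Unevenly spaced numeric index and columns
df = pd.DataFrame(np.random.rand(3, 4), index=[0, 1, 4], columns=[10, 20, 40, 80])

x_edges = centers_to_edges(df.columns)  # 5 edges for 4 columns
y_edges = centers_to_edges(df.index)    # 4 edges for 3 rows

fig, ax = plt.subplots()
# With 1-D edge arrays, pcolormesh expects C of shape
# (len(y_edges) - 1, len(x_edges) - 1), which is exactly df.shape
mesh = ax.pcolormesh(x_edges, y_edges, df.to_numpy())
fig.colorbar(mesh, ax=ax)
```

Here the rows go on the y axis, so no transpose is needed; centers_to_edges([0, 1, 4]) gives [-0.5, 0.5, 2.5, 5.5], matching Example 2 in the docstring above.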


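Answer 6 (score: 0)

Surprisingly, no one has mentioned more capable, interactive, and easier-to-use alternatives.

A) You can use plotly. With just two lines of code, you get:

  1. interactivity,

  2. a smooth color scale,

  3. colors based on the whole DataFrame instead of individual columns,

  4. column names and row indices on the axes,

  5. zooming in,

  6. panning,

  7. a built-in one-click option to save it as a PNG,

  8. auto-scaling,

  9. comparison on hover,

  10. bubbles showing the values, so the heatmap still looks clean and you can see any value you want: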

import plotly.express as px
fig = px.imshow(df.corr())
fig.show()


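B) You can also use Bokeh:

All the same functionality, with a bit more hassle. But it is still worth it if you don't want to opt into plotly and still want all of these features: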

from bokeh.plotting import figure, show, output_notebook
from bokeh.models import (BasicTicker, ColorBar, ColumnDataSource,
                          LinearColorMapper, PrintfTickFormatter)
from bokeh.transform import transform
output_notebook()

colors = ['#d7191c', '#fdae61', '#ffffbf', '#a6d96a', '#1a9641']
TOOLS = "hover,save,pan,box_zoom,reset,wheel_zoom"

# long format: one row per (row label, column label, value)
data = df.corr().stack().rename("value").reset_index()
mapper = LinearColorMapper(palette=colors, low=data.value.min(), high=data.value.max())

# df.corr() is a columns-by-columns matrix, so both axes use df.columns
p = figure(x_range=list(df.columns), y_range=list(df.columns), tools=TOOLS, toolbar_location='below',
           tooltips=[('Row, Column', '@level_0 x @level_1'), ('value', '@value')], height=500, width=500)

p.rect(x="level_1", y="level_0", width=1, height=1,
       source=ColumnDataSource(data),
       fill_color=transform('value', mapper),
       line_color=None)
color_bar = ColorBar(color_mapper=mapper, major_label_text_font_size="7px",
                     ticker=BasicTicker(desired_num_ticks=len(colors)),
                     formatter=PrintfTickFormatter(format="%f"),
                     label_standoff=6, border_line_color=None, location=(0, 0))
p.add_layout(color_bar, 'right')

show(p)
