Python Bokeh: how to make a correlation plot?

Date: 2016-08-28 13:15:50

Tags: python heatmap bokeh

How can I make a correlation heatmap in Bokeh?

import pandas as pd
import bokeh.charts

df = pd.util.testing.makeTimeDataFrame(1000)
c = df.corr()

p = bokeh.charts.HeatMap(c) # not right

# try to make it a long form
# (and it's ugly in pandas to use 'index' in melt)

c['x'] = c.index
c = pd.melt(c, 'x', ['A','B','C','D'])

# this shows the right 4x4 matrix, but values are still wrong
p = bokeh.charts.HeatMap(c, x = 'x', y = 'variable', values = 'value') 

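By the way, can I put a color bar on the side instead of a legend inside the plot? And how do I choose the color range/mapping, e.g. dark blue (-1) to white (0) to dark red (+1)?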

3 answers:

Answer 0 (score: 4)

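If you want this level of control, I have to suggest using the (only slightly) lower-level bokeh.plotting interface. You can see an example of a categorical heatmap generated with this interface in the gallery: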

http://bokeh.pydata.org/en/latest/docs/gallery/categorical.html

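Regarding the legend: for a colormap like this you actually want a discrete ColorBar rather than a Legend. This is a new feature that will be in the 0.12.2 release coming later this week (today's date: 2016-08-28). These new colorbar annotations can be located outside the main plot area. For now, to see the relevant documentation, you have to refer to the "dev preview" docs site: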

http://bokeh.pydata.org/en/dev/docs/user_guide/annotations.html#color-bars

There is also an example in the GitHub repo:

https://github.com/bokeh/bokeh/blob/master/examples/plotting/file/color_data_map.py

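Note that this last example also uses another new feature to do the color mapping in the browser, instead of having to precompute the colors in Python. Put together, it basically looks like this: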

from bokeh.models import ColorBar, LinearColorMapper
from bokeh.palettes import Viridis3
from bokeh.plotting import figure

# `source` is assumed to be a ColumnDataSource with 'x', 'y' and 'z' columns,
# and `title` a string

# create a color mapper with your palette - can be any list of colors
mapper = LinearColorMapper(palette=Viridis3, low=0, high=100)

p = figure(toolbar_location=None, tools='', title=title)
p.circle(
    x='x', y='y', source=source,

    # use the mapper to colormap according to the 'z' column (in the browser)
    fill_color={'field': 'z', 'transform': mapper},
)

# create a ColorBar and add it to the side of the plot
color_bar = ColorBar(color_mapper=mapper, location=(0, 0))
p.add_layout(color_bar, 'right')
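For the dark blue (-1) to white (0) to dark red (+1) mapping asked about in the question, one option is to build the palette by hand, since the mapper accepts any list of colors. The following is only a sketch in plain Python: the function name `diverging_palette` and the RGB end points are illustrative assumptions, not part of Bokeh.

```python
def diverging_palette(n=11, low=(0, 0, 139), mid=(255, 255, 255), high=(139, 0, 0)):
    """Return n hex colors running low -> mid -> high (n must be odd).

    Hypothetical helper for illustration; the default end points are
    dark blue, white and dark red.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd number >= 3")

    def blend(a, b, t):
        # linear interpolation between two RGB tuples, t in [0, 1]
        return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

    half = n // 2
    lower = [blend(low, mid, i / half) for i in range(half)]       # low up to, but excluding, mid
    upper = [blend(mid, high, i / half) for i in range(half + 1)]  # mid through high
    return ["#%02x%02x%02x" % rgb for rgb in lower + upper]


palette = diverging_palette(11)
```

The resulting list can then be passed as `palette=palette` to `LinearColorMapper` together with `low=-1` and `high=1`, so that the white middle entry lines up with a correlation of 0.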

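There are also more sophisticated options. For example, if you want finer control over the ticks on the color bar, you can add a custom ticker or tick formatter, just as on a normal Axis, to achieve results like this:

(image not included)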
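A minimal sketch of a custom ticker and tick formatter on a color bar, assuming a correlation range of -1 to 1. The tick positions, the format string and the choice of the `RdBu` palette are illustrative assumptions, not taken from the answer.

```python
from bokeh.models import ColorBar, FixedTicker, LinearColorMapper, PrintfTickFormatter
from bokeh.palettes import RdBu
from bokeh.plotting import figure

# map the correlation range [-1, 1] onto a 9-color diverging palette
mapper = LinearColorMapper(palette=RdBu[9], low=-1, high=1)

color_bar = ColorBar(
    color_mapper=mapper,
    ticker=FixedTicker(ticks=[-1, -0.5, 0, 0.5, 1]),  # explicit tick positions
    formatter=PrintfTickFormatter(format="%.1f"),     # one decimal place per label
    location=(0, 0),
)

p = figure(title="Color bar with custom ticks", toolbar_location=None, tools="")
p.add_layout(color_bar, "right")
```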

It's not clear what your actual requirements are, so I mention this only in case it is useful.

Until 0.12.2 is released, you can use these new features by installing a "dev build" or a release candidate. The main docs site has simple instructions for installing developer builds.

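Finally, Bokeh is a large project, and finding the best way to do something often requires more information and context and, in general, a discussion. That kind of collaborative help seems to be frowned upon on SO (these are "not real answers"), so I encourage you to also check out the public mailing list for help at any time.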

Answer 1 (score: 3)

So I think I can provide a baseline code to help do what you are asking using a combination of the answers above and some extra pre-processing.

Let's assume you have a dataframe df already loaded (in this case the UCI Adult Data) and the correlation coefficients calculated (p_corr).

import bisect
#
from math import pi
from numpy import arange
from itertools import chain
from collections import OrderedDict
#
from bokeh.palettes import RdBu as colors  # just make sure to import a palette that centers on white (-ish)
from bokeh.models import ColorBar, LinearColorMapper
from bokeh.plotting import figure, show

colors = list(reversed(colors[9]))  # we want an odd number to ensure 0 correlation is a distinct color
labels = df.columns
nlabels = len(labels)

def get_bounds(n):
    """Gets bounds for quads with n features"""
    bottom = list(chain.from_iterable([[ii]*n for ii in range(n)]))
    top = list(chain.from_iterable([[ii+1]*n for ii in range(n)]))
    left = list(chain.from_iterable([list(range(n)) for ii in range(n)]))
    right = list(chain.from_iterable([list(range(1,n+1)) for ii in range(n)]))
    return top, bottom, left, right

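def get_colors(corr_array, colors):
    """Aligns color values from palette with the correlation coefficient values"""
    ccorr = arange(-1, 1, 1/(len(colors)/2))  # lower edge of each color bin
    color = []
    for value in corr_array:
        ind = bisect.bisect_left(ccorr, value)
        # clamp so that a value of exactly -1 maps to the first color
        # instead of wrapping around to colors[-1]
        color.append(colors[max(ind-1, 0)])
    return color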

p = figure(plot_width=600, plot_height=600,
           x_range=(0,nlabels), y_range=(0,nlabels),
           title="Correlation Coefficient Heatmap (lighter is worse)",
           toolbar_location=None, tools='')

p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = None
p.xaxis.major_label_orientation = pi/4
p.yaxis.major_label_orientation = pi/4

top, bottom, left, right = get_bounds(nlabels)  # creates squares for plot
color_list = get_colors(p_corr.values.flatten(), colors)

p.quad(top=top, bottom=bottom, left=left,
       right=right, line_color='white',
       color=color_list)

# Set ticks with labels
ticks = [tick+0.5 for tick in list(range(nlabels))]
tick_dict = OrderedDict([[tick, labels[ii]] for ii, tick in enumerate(ticks)])
# Create the correct number of ticks for each axis 
p.xaxis.ticker = ticks
p.yaxis.ticker = ticks
# Override the labels 
p.xaxis.major_label_overrides = tick_dict
p.yaxis.major_label_overrides = tick_dict

# Setup color bar
mapper = LinearColorMapper(palette=colors, low=-1, high=1)
color_bar = ColorBar(color_mapper=mapper, location=(0, 0))
p.add_layout(color_bar, 'right')

show(p)

This will result in the following plot if the categories are integer encoded (this is a horrible data example):

[Figure: Pearson Correlation Coefficient Heatmap in Bokeh]
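The color lookup in `get_colors` above amounts to splitting the range [-1, 1] into `len(colors)` equal-width bins. A sketch of roughly the same idea as direct arithmetic, without `bisect`, follows. The name `color_index` is hypothetical, and values that fall exactly on a bin edge may land one bin higher than in the `bisect` version.

```python
def color_index(value, n_colors):
    """Index of the equal-width bin of [-1, 1] that contains value."""
    clipped = max(-1.0, min(1.0, value))          # guard against rounding noise outside [-1, 1]
    idx = int((clipped + 1.0) / 2.0 * n_colors)   # scale to [0, n_colors] and truncate
    return min(idx, n_colors - 1)                 # value == 1 belongs to the last bin


# usage with a hypothetical 9-entry palette
example_palette = ["c%d" % i for i in range(9)]
picked = [example_palette[color_index(v, 9)] for v in (-1.0, 0.0, 0.5, 1.0)]
```

With an odd number of colors, a correlation of 0 falls in the middle bin, which is why the answer insists on an odd palette size.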

Answer 2 (score: 2)

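I tried to create an interactive correlation plot using the Bokeh library. The code is a combination of different solutions available on SO and other websites. bigreddot has explained several things in detail in the answer above. The code for the correlation heatmap is as follows: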

import pandas as pd
from bokeh.io import output_file, show
from bokeh.models import BasicTicker, ColorBar, LinearColorMapper, ColumnDataSource, PrintfTickFormatter
from bokeh.plotting import figure
from bokeh.transform import transform
from bokeh.palettes import Viridis3, Viridis256
# Read your data in pandas dataframe
data = pd.read_csv("<your path>")  # replace with the path to your CSV file
#Now we will create correlation matrix using pandas
df = data.corr()

df.index.name = 'AllColumns1'
df.columns.name = 'AllColumns2'

# Prepare data.frame in the right format
df = df.stack().rename("value").reset_index()

# here the plot :
output_file("CorrelationPlot.html")

# You can use your own palette here
# colors = ['#d7191c', '#fdae61', '#ffffbf', '#a6d96a', '#1a9641']

# I am using 'Viridis256' to map colors with value, change it with 'colors' if you need some specific colors
mapper = LinearColorMapper(
    palette=Viridis256, low=df.value.min(), high=df.value.max())

# Define a figure and tools
TOOLS = "box_select,lasso_select,pan,wheel_zoom,box_zoom,reset,help"
p = figure(
    tools=TOOLS,
    plot_width=1200,
    plot_height=1000,
    title="Correlation plot",
    x_range=list(df.AllColumns1.drop_duplicates()),
    y_range=list(df.AllColumns2.drop_duplicates()),
    toolbar_location="right",
    x_axis_location="below")

# Create rectangle for heatmap
p.rect(
    x="AllColumns1",
    y="AllColumns2",
    width=1,
    height=1,
    source=ColumnDataSource(df),
    line_color=None,
    fill_color=transform('value', mapper))

# Add a color bar (takes the place of a legend)
color_bar = ColorBar(
    color_mapper=mapper,
    location=(0, 0),
    ticker=BasicTicker(desired_num_ticks=10))

p.add_layout(color_bar, 'right')

show(p)
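The reshaping step in the code above (`stack().rename("value").reset_index()`) is what turns the square correlation matrix into the long form that `p.rect` expects, with one row per cell. A small sketch on synthetic data shows the result; the random frame and its column names A, B and C are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# synthetic stand-in for the CSV data: 100 rows, 3 numeric columns
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 3)), columns=["A", "B", "C"])

corr = data.corr()                    # 3x3 correlation matrix
corr.index.name = "AllColumns1"
corr.columns.name = "AllColumns2"

# one row per (AllColumns1, AllColumns2) pair, with the coefficient in 'value'
long_form = corr.stack().rename("value").reset_index()
```

Naming the index and the columns before stacking avoids the awkward `c['x'] = c.index` step needed with `pd.melt` in the question.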

References:

[1] https://bokeh.pydata.org/en/latest/docs/user_guide.html

[2] Bokeh heatmap from Pandas confusion matrix