Heatmap for a very large matrix, including NaNs

Time: 2016-03-27 13:16:06

Tags: python heatmap large-data

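I am trying to see whether the NaNs are concentrated somewhere, or whether there is any pattern in their distribution.

The idea is to use Python to plot a heatmap of the matrix (200K rows and 1k columns) and set a special color for NaN values (the remaining values can all be shown in the same color, it doesn't matter).

Example of a possible display: A proposition for example

Thanks in advance, everyone.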

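3 Answers:

Answer 0: (score: 2)

A 1:200 aspect ratio is hard to display, and you may also run into memory problems, so you should probably break the matrix up into several Nx1k pieces.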
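One possible way to act on that advice is sketched below. It is not the code from this answer: it swaps plain chunked plotting for block averaging, processing the matrix one block of rows at a time and keeping only the fraction of NaNs per block and column. The function name `nan_fraction_by_block` and the parameter `block_rows` are made up for illustration.

```python
import numpy as np

def nan_fraction_by_block(matrix, block_rows):
    """Fraction of NaNs per column within each block of consecutive rows.

    Hypothetical helper: returns an array of shape (n_blocks, n_cols).
    """
    n_rows, n_cols = matrix.shape
    n_blocks = -(-n_rows // block_rows)  # ceiling division, last block may be shorter
    out = np.empty((n_blocks, n_cols))
    for b in range(n_blocks):
        # only one block of rows is turned into a boolean mask at a time
        chunk = matrix[b * block_rows:(b + 1) * block_rows]
        out[b] = np.isnan(chunk).mean(axis=0)
    return out

# small demo matrix with a rectangular patch of NaNs
demo = np.random.rand(20000, 100)
demo[5000:6000, 10:20] = np.nan
density = nan_fraction_by_block(demo, block_rows=200)
print(density.shape)  # (100, 100)
```

For the 200K x 1k matrix in the question, `block_rows=200` would give a 1000 x 1000 array, which can be passed to `imshow` without the extreme aspect ratio.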

That being said, here is my solution (inspired by your example image):

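import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.axes_grid1 import AxesGrid

# generate random matrix
xDim = 2000
yDim = 4000
# number of nans (must be an integer to be usable as a size)
nNans = int(xDim*yDim*.1)
rands = np.random.rand(yDim, xDim)

# create a skewed distribution for the nans
# (np.int was removed from NumPy; the builtin int does the same job)
x = np.clip(np.random.gamma(2, yDim*.125, size=nNans).astype(int), 0, yDim-1)
y = np.random.randint(0,xDim,size=nNans)
rands[x,y] = np.nan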

# find the nans:
isNan = np.isnan(rands)

fig = plt.figure()

# make axesgrid so we can put a histogram-like plot next to the data
grid = AxesGrid(fig, 111, nrows_ncols=(1, 2), axes_pad=0.05)

# plot the data using binary colormap
grid[0].imshow(isNan, cmap=cm.binary)

# plot the histogram
grid[1].plot(np.sum(isNan,axis=1), range(isNan.shape[0]))

# set ticks and limits, so the figure looks nice
grid[0].set_xticks([0,250,500,750,1000,1250,1500,1750])
grid[1].set_xticks([0,250,500,750])
grid[1].set_xlim([0,750])
grid.axes_llc.set_ylim([0, yDim])
plt.show()

This is what it looks like:

Figure produced by the code
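If the non-NaN values should stay visible instead of being collapsed into a binary mask, one option (not part of the code above) is to give the colormap a dedicated "bad" color with matplotlib's `Colormap.set_bad`; `imshow` draws NaN entries in that color. A minimal sketch, assuming a small random demo matrix, the `Greys` colormap, and an output file name chosen only for illustration:

```python
import copy

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

# demo data (assumption): random values with about 5% NaNs
rng = np.random.default_rng(0)
data = rng.random((400, 100))
data[rng.random(data.shape) < 0.05] = np.nan

# copy the colormap so the registered one is not modified, then set the NaN color
cmap = copy.copy(plt.get_cmap("Greys"))
cmap.set_bad("red")

fig, ax = plt.subplots()
# aspect="auto" stretches the image to the axes instead of using square pixels
ax.imshow(data, cmap=cmap, aspect="auto", interpolation="nearest")
fig.savefig("nan_heatmap.png", dpi=150)
```

With a matrix as tall as the one in the question, many rows share each screen pixel, so isolated NaNs can disappear in the rendering; reducing the matrix first is probably still needed.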

Answer 1: (score: 0)

# Learn about API authentication here: https://plot.ly/python/getting-started
# Find your api_key here: https://plot.ly/settings/api

import plotly.plotly as py
import plotly.graph_objs as go

data = [
    go.Heatmap(
        z=[[1, 20, 30],
        [20, 1, 60],
        [30, 60, 1]]
    )
]
plot_url = py.plot(data, filename='basic-heatm

Source: https://plot.ly/python/heatmaps/

Answer 2: (score: 0)

What you can do is use a scatter plot:

import matplotlib.pyplot as plt
import numpy as np
# create a matrix with random numbers
A = np.random.rand(2000,10)
# make some NaNs in it:
for _ in range(1000):
    i = np.random.randint(0,2000)
    j = np.random.randint(0,10)
    A[i,j] = np.nan
# get a matrix to plot with only the NaNs:
B = np.isnan(A)
# if NaN plot a point. 
for i in range(2000):
    for j in range(10):
        if B[i,j]: plt.scatter(i,j)
plt.show()

If you are on Python 2.6 or 2.7, consider using xrange instead of range to speed things up.


Note that this may be faster:

C = np.where(B)
plt.scatter(C[0],C[1])
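The vectorized version works because `np.where(B)` returns two arrays, the row indices and the column indices of the `True` entries, in the same order the nested loop visits them. A small sketch that checks this against the loop (the names `rows`, `cols` and `loop_points` are just for illustration):

```python
import numpy as np

# same setup as above: random matrix with NaNs at random positions
rng = np.random.default_rng(0)
A = rng.random((2000, 10))
A[rng.integers(0, 2000, size=1000), rng.integers(0, 10, size=1000)] = np.nan

B = np.isnan(A)
rows, cols = np.where(B)  # row indices and column indices of the NaNs

# coordinates collected the slow way, for comparison
loop_points = [(i, j)
               for i in range(B.shape[0])
               for j in range(B.shape[1])
               if B[i, j]]
print(loop_points == list(zip(rows.tolist(), cols.tolist())))
```

The two arrays can then go into a single `plt.scatter(rows, cols)` call, which draws all points at once instead of creating one scatter artist per NaN.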