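I am trying to see whether the NaNs are concentrated somewhere, or whether they follow any distribution pattern.
The idea is to use Python to plot a heatmap of the matrix (200K rows and 1k columns) and give the NaN values a special color (the remaining values can all share one color, it doesn't matter).
Thanks in advance, everyone.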
Answer 0 (score: 2)
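A 1:200 aspect ratio is very awkward to plot, and since you may also run into memory problems, you should probably break the matrix into several Nx1k pieces.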
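As a rough illustration of working in N-row pieces, the sketch below reduces each piece to the fraction of NaNs per column, which gives a much smaller array to plot. This is a variation on simply splitting the matrix, not something from the answer itself: the function name `nan_fraction_by_row_block`, the block size, and the small stand-in matrix are all assumptions made for the example.

```python
import numpy as np

def nan_fraction_by_row_block(matrix, block_rows):
    """Return the per-column fraction of NaNs for consecutive blocks of rows.

    Hypothetical helper for illustration; the output has one row per block.
    """
    is_nan = np.isnan(matrix)
    n_rows, n_cols = is_nan.shape
    n_blocks = -(-n_rows // block_rows)  # ceiling division
    out = np.empty((n_blocks, n_cols))
    for b in range(n_blocks):
        chunk = is_nan[b * block_rows:(b + 1) * block_rows]
        out[b] = chunk.mean(axis=0)  # mean of booleans = fraction of NaNs
    return out

# Small stand-in for the 200K x 1k matrix (assumed sizes, for illustration)
rng = np.random.default_rng(0)
data = rng.random((2000, 10))
data[:500, 3] = np.nan  # artificial cluster of NaNs in column 3

summary = nan_fraction_by_row_block(data, block_rows=500)
print(summary.shape)
```

The resulting `summary` array could then be passed to `plt.imshow(summary, aspect="auto")`, so that a 200K-row matrix with a block size of 200 would become a 1000x1000 image.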
That being said, here is my solution (inspired by your example image):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.axes_grid1 import AxesGrid
# generate random matrix
xDim = 2000
yDim = 4000
# number of nans
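# (must be an integer to be usable as an array size)
nNans = int(xDim*yDim*.1)
rands = np.random.rand(yDim, xDim)
# create a skewed distribution for the nans
# (np.int was removed from NumPy; the built-in int is equivalent here)
x = np.clip(np.random.gamma(2, yDim*.125, size=nNans).astype(int), 0, yDim-1)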
y = np.random.randint(0,xDim,size=nNans)
rands[x,y] = np.nan
# find the nans:
isNan = np.isnan(rands)
fig = plt.figure()
# make axesgrid so we can put a histogram-like plot next to the data
grid = AxesGrid(fig, 111, nrows_ncols=(1, 2), axes_pad=0.05)
# plot the data using binary colormap
grid[0].imshow(isNan, cmap=cm.binary)
# plot the histogram
grid[1].plot(np.sum(isNan,axis=1), range(isNan.shape[0]))
# set ticks and limits, so the figure looks nice
grid[0].set_xticks([0,250,500,750,1000,1250,1500,1750])
grid[1].set_xticks([0,250,500,750])
grid[1].set_xlim([0,750])
grid.axes_llc.set_ylim([0, yDim])
plt.show()
Here is what it looks like:
Answer 1 (score: 0)
# Learn about API authentication here: https://plot.ly/python/getting-started
# Find your api_key here: https://plot.ly/settings/api
import plotly.plotly as py
import plotly.graph_objs as go
data = [
    go.Heatmap(
        z=[[1, 20, 30],
           [20, 1, 60],
           [30, 60, 1]]
    )
]
plot_url = py.plot(data, filename='basic-heatm
Answer 2 (score: 0)
What you can do is use a scatter plot:
import matplotlib.pyplot as plt
import numpy as np
# create a matrix with random numbers
A = np.random.rand(2000,10)
# make some NaNs in it:
for _ in range(1000):
    i = np.random.randint(0,2000)
    j = np.random.randint(0,10)
    A[i,j] = np.nan
# get a matrix to plot with only the NaNs:
B = np.isnan(A)
# if NaN plot a point.
for i in range(2000):
    for j in range(10):
        if B[i,j]: plt.scatter(i,j)
plt.show()
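When using Python 2.6 or 2.7, consider using xrange instead of range to speed things up.
Note: the following, which replaces the double loop with a single scatter call, will probably be faster: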
C = np.where(B)
plt.scatter(C[0],C[1])
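To make the vectorized version easier to follow, here is a tiny sketch of what `np.where` returns for a boolean NaN mask. The 2x3 matrix is made up for illustration; the point is that the result is a tuple of index arrays, one per axis, listing every `True` position in row-major order.

```python
import numpy as np

# Tiny made-up matrix with three NaNs
A = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, np.nan]])

B = np.isnan(A)            # boolean mask, True where A is NaN
rows, cols = np.where(B)   # one index array per axis

print(rows)  # [0 1 1]
print(cols)  # [1 0 2]
```

Passing these two arrays to `plt.scatter(rows, cols)` draws all NaN positions in one call, instead of one call per point as in the double loop.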