Python scatter plot of 4D data

Time: 2014-07-08 09:01:20

Tags: python matplotlib

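I have 4D data that I would like to show as a scatter plot. The data can be viewed as x and y coordinates for each pair of values of two additional parameters.

I would like to "flatten" the plot into a 2D scatter plot in which the two extra parameters are represented by color, e.g. one color per pair of parameter values. Alternatively, I would like points that are plotted for only a few parameter pairs to look light, while points plotted for many parameter pairs look heavier/darker. Perhaps this could be achieved by "stacking" semi-transparent points on top of each other?

Is there a standard way to do this in Python, e.g. using matplotlib?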

3 answers:

Answer 0 (score: 0)

I tried the approach I suggested, "stacking" semi-transparent scatter plots on top of each other:

import numpy as np
import matplotlib.pyplot as plt

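# data1 and data2 are the 4D arrays from the question; param1 and param2
# hold the values of the two extra parameters (the last two axes).
for ii in range(len(param1)):
    for jj in range(len(param2)):
        delta_idx, rho_idx = np.where(data1[:,:,ii,jj] < data2[:,:,ii,jj])
        plt.scatter(delta_idx, rho_idx, marker = 'o', c = 'k', alpha = 0.01)
# Raw strings keep the '\r' in '\rho' from being read as a carriage return.
plt.xlabel(r'$\delta$')
plt.ylabel(r'$\rho$')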
plt.show()

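The 2D points I described in the question are actually the index positions where the value in data1 is smaller than the corresponding value in data2. This produces the following plot:

[Figure: stacked scatter plot]

A lot could be done to make the plot prettier, but I am not really satisfied with how it looks, so I tried another approach. I am posting it here anyway in case someone finds it useful.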
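For anyone who wants to try the stacking idea without the original data, here is a minimal self-contained sketch. The random arrays, their shape, the helper name stacked_scatter and the default alpha are all assumptions made for illustration; they are not from the question.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for the real arrays (assumption):
# shape is (n_x, n_y, n_param1, n_param2).
rng = np.random.default_rng(0)
data1 = rng.random((20, 30, 4, 5))
data2 = rng.random((20, 30, 4, 5))


def stacked_scatter(data1, data2, alpha=0.1):
    """Overlay one translucent scatter layer per parameter pair.

    stacked_scatter is a hypothetical helper name, not a matplotlib function.
    """
    fig, ax = plt.subplots()
    for ii in range(data1.shape[2]):
        for jj in range(data1.shape[3]):
            # Grid indices where data1 < data2 for this parameter pair.
            x_idx, y_idx = np.where(data1[:, :, ii, jj] < data2[:, :, ii, jj])
            ax.scatter(x_idx, y_idx, marker="o", c="k", alpha=alpha)
    ax.set_xlabel(r"$\delta$")
    ax.set_ylabel(r"$\rho$")
    return fig


fig = stacked_scatter(data1, data2)
```

Call plt.show() afterwards to display the figure. Note that stacked transparency is not additive: a point covered by k black layers with opacity alpha ends up with a darkness of 1 - (1 - alpha)**k, so the darkness saturates rather than growing linearly with the number of parameter pairs.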

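Answer 1 (score: 0)

As an alternative to the "stacked" scatter plot, I tried first accumulating the occurrences of data1 < data2 in a 2D "occurrence map". I then plotted this map using pcolormesh (imported from prettyplotlib to make it look nicer):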

import prettyplotlib as ppl
import numpy as np

occurrence_map = np.sum(data1 < data2, axis=(2,3), dtype=float) / np.prod(data1.shape[2:])
ppl.pcolormesh(occurrence_map, vmin=0, vmax=1)

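The normalization yields a relative measure of occurrence, i.e. for how large a fraction of the parameter pairs (the last two dimensions of data1 and data2) is data1 < data2? The plot is then configured with color values ranging from 0 to 1. This produces the following plot, which I am much happier with:

[Figure: pcolormesh plot of relative occurrences]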
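The same occurrence map can also be drawn with plain matplotlib instead of prettyplotlib. The sketch below is only an illustration under assumptions: the random arrays stand in for the real data, and occurrence_map is a hypothetical helper name. It uses np.mean on the boolean comparison, which gives the same fraction as the sum divided by the number of parameter pairs.

```python
import numpy as np
import matplotlib.pyplot as plt


def occurrence_map(data1, data2):
    """Fraction of parameter pairs (last two axes) for which data1 < data2."""
    # The mean of a boolean array is the fraction of True entries.
    return np.mean(data1 < data2, axis=(2, 3))


# Synthetic stand-ins for the real 4D arrays (assumption, not the asker's data).
rng = np.random.default_rng(1)
data1 = rng.random((20, 30, 4, 5))
data2 = rng.random((20, 30, 4, 5))

occ = occurrence_map(data1, data2)  # shape (20, 30), values in [0, 1]

fig, ax = plt.subplots()
mesh = ax.pcolormesh(occ, vmin=0, vmax=1)
fig.colorbar(mesh, ax=ax, label="fraction of pairs with data1 < data2")
```

Call plt.show() to display it. One thing to watch: pcolormesh draws the first array axis along y and the second along x, whereas the stacked scatter plot puts the first index on x. Passing occ.T instead should make the two plots share the same orientation.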

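Answer 2 (score: 0)

The comments about scatterplot matrices inspired me to try something similar. A scatterplot matrix is not exactly what I was looking for, but I took the code from @tisimst's answer, suggested by @lbn-plus-1, and modified it a bit, as follows: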

import itertools
import numpy as np
import matplotlib.pyplot as plt

def scatterplot_matrix(data, names=None, **kwargs):
    """Plots a pcolormesh matrix of subplots.  The two first dimensions of
    data are plotted as a mesh of values, one for each of the two last
    dimensions of data. Data must thus be four-dimensional and results
    in a matrix of pcolormesh plots with the number of rows equal to
    the size of the third dimension of data and number of columns
    equal to the size of the fourth dimension of data. Additional
    keyword arguments are passed on to matplotlib's "pcolormesh"
    command. Returns the matplotlib figure object containing the subplot
    grid.
    """
    assert data.ndim == 4, 'data must be 4-dimensional.'
    datashape = data.shape
    # squeeze=False keeps axes 2D even if one of the last two dimensions has size 1.
    fig, axes = plt.subplots(nrows=datashape[2], ncols=datashape[3], figsize=(8,8), squeeze=False)
    fig.subplots_adjust(hspace=0.0, wspace=0.0)

    for ax in axes.flat:
        # Hide all ticks and labels
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)

        # Set up ticks only on one side for the "edge" subplots...
        # (Axes.is_first_col() and friends were removed in Matplotlib 3.6;
        # the SubplotSpec methods replace them.)
        ss = ax.get_subplotspec()
        if ss.is_first_col():
            ax.yaxis.set_ticks_position('left')
        if ss.is_last_col():
            ax.yaxis.set_ticks_position('right')
        if ss.is_first_row():
            ax.xaxis.set_ticks_position('top')
        if ss.is_last_row():
            ax.xaxis.set_ticks_position('bottom')

    # Plot the data.
    for ii in range(datashape[2]):
        for jj in range(datashape[3]):
            axes[ii,jj].pcolormesh(data[:,:,ii,jj], **kwargs)

    # Label the diagonal subplots...
    #if not names:
    #    names = ['x'+str(i) for i in range(numvars)]
    # 
    #for i, label in enumerate(names):
    #    axes[i,i].annotate(label, (0.5, 0.5), xycoords='axes fraction',
    #            ha='center', va='center')

    # Turn on the proper x or y axes ticks.
    #for i, j in zip(range(numvars), itertools.cycle((-1, 0))):
    #    axes[j,i].xaxis.set_visible(True)
    #    axes[i,j].yaxis.set_visible(True)

    # FIX #2: if numvars is odd, the bottom right corner plot doesn't have the
    # correct axes limits, so we pull them from other axes
    #if numvars%2:
    #    xlimits = axes[0,-1].get_xlim()
    #    ylimits = axes[-1,0].get_ylim()
    #    axes[-1,-1].set_xlim(xlimits)
    #    axes[-1,-1].set_ylim(ylimits)

    return fig

if __name__=='__main__':
    np.random.seed(1977)
    data = np.random.random([10] * 4)
    # pcolormesh does not accept Line2D options such as marker or mfc,
    # so pass mesh options (here a colormap) instead.
    fig = scatterplot_matrix(data, cmap='viridis')
    fig.suptitle('Simple pcolormesh matrix')
    plt.show()

I saved the module above as datamatrix.py and used it as follows:

import numpy as np
import matplotlib.pyplot as plt
import datamatrix
import brewer2mpl

colors = brewer2mpl.get_map('RdBu', 'Diverging', 11).mpl_colormap
indicator = np.ma.masked_invalid(-np.sign(data1 - data2)) # Negated because the 'RdBu' colormap is the wrong way around
fig = datamatrix.scatterplot_matrix(indicator, cmap = colors)
plt.show()

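The brewer2mpl and colormap parts can be omitted; those are just some colors I was playing around with. The result looks like this:

[Figure: matrix of pcolormesh plots of occurrences for individual parameter values]

The "outer" dimensions of the matrix are the two parameters (the last two dimensions of data1 and data2). Each pcolormesh plot inside the matrix is then an "occurrence map", somewhat similar to the one in the pcolormesh answer above, but a binary map for each parameter pair. The white lines at the bottom of some of the plots are regions of equality. The white dot in the top right corner of each plot is a nan value in the data.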
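To make the three kinds of cells concrete, here is a small sketch of how the indicator behaves on a hand-made 2D slice. The sample values are made up for illustration; the real data are 4D, but the expression works the same way on any shape.

```python
import numpy as np

# Made-up 2 x 2 slice (assumption): one cell of each kind plus a nan.
data1 = np.array([[1.0, 2.0],
                  [np.nan, 0.5]])
data2 = np.array([[2.0, 2.0],
                  [1.0, 0.1]])

# +1 where data1 < data2, 0 where they are equal, -1 where data1 > data2.
# masked_invalid masks the nan, so pcolormesh leaves that cell undrawn.
indicator = np.ma.masked_invalid(-np.sign(data1 - data2))

print(indicator)
```

With a diverging colormap such as 'RdBu', the value 0 falls on the light midpoint of the map and masked cells show the figure background, which is why both the equality regions and the nan cells appear white in the plot.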