Question

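I'm plotting a scatter plot matrix with pandas, but the tick labels of the first plot are sometimes drawn correctly and sometimes drawn incorrectly. I can't figure out what is going wrong!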

Here is an example:

[Image: scatter plot matrix produced by the code below]

Code:

from pandas.plotting import scatter_matrix  # formerly pandas.tools.plotting
import pylab
import numpy as np
import pandas as pd

def create_scatterplot_matix(X, name):    
    """
    Outputs a scatterplot matrix for a design matrix.

    Parameters:
    -----------
    X:a design matrix where each column is a feature and each row is an observation.
    name: the name of the plot.
    """
    pylab.figure()
    df = pd.DataFrame(X)
    axs = scatter_matrix(df, alpha=0.2, diagonal='kde')

    for ax in axs[:, 0]:  # the left boundary
        ax.grid(False, axis='both')  # the string 'off' is not accepted by current matplotlib
        ax.set_yticks([0, .5])

    for ax in axs[-1, :]:  # the lower boundary
        ax.grid(False, axis='both')
        ax.set_xticks([0, .5])

    pylab.savefig(name + ".png")

Anyone?!

Edit (an example of X):

X = np.random.randn(1000000, 10)

Answer 1

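This is the expected behavior. The y-axis values shown are the y-axis values for column 0. Row 0, column 0 contains a probability density plot. Row 0, columns 1 through 3 contain the data that was used to create the plots on the diagonal.

The example in the pandas plotting documentation looks similar.

Demonstration: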

from pandas.plotting import scatter_matrix  # formerly pandas.tools.plotting
import pylab
import numpy as np
import pandas as pd

def create_scatterplot_matix(X, name):    
    pylab.figure()

    df = pd.DataFrame(X)
    axs = scatter_matrix(df, alpha=0.2, diagonal='kde')

    pylab.savefig(name + ".png")

create_scatterplot_matix([[0, 0, 0, 0],
                          [1, 1, 1, 1],
                          [1, 1, 1, 1],
                          [2, 2, 2, 2]], 'test')

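In this sample code I use a very simple dataset for demonstration purposes. I also removed the part of the code that sets the y and x ticks.

This is the resulting plot:

[Image: scatter plot matrix for the demonstration data]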

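Each diagonal cell contains a probability density plot. Each off-diagonal cell contains the data that was used to create the plots on the diagonal. The y-axis of row 0 shows the y-axis of the probability density plot in the first position (0,0). The y-axes of rows 1, 2, and 3 show the y-axes of the data at positions 0,1, 0,2, and 0,3 that was used to create the probability density plots on the diagonal.

In our example you can see the following points plotted: [0,0], [1,1], [2,2]. The point at [1,1] is darker because more points fall at that position than at the others.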
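The darker point can be illustrated with a small calculation. This is only a sketch: it assumes markers are blended with ordinary "over" alpha compositing, which the answer does not state, and `stacked_opacity` is a hypothetical helper name, not part of pandas or matplotlib.

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n overlapping markers, each drawn with `alpha`.

    Assumes plain "over" compositing: each marker lets (1 - alpha)
    of whatever is underneath show through.
    """
    return 1.0 - (1.0 - alpha) ** n


# alpha=0.2 is the value passed to scatter_matrix in the demonstration.
# [1,1] occurs twice in the sample data, the other points once.
for n in (1, 2):
    print(n, "overlapping marker(s):", round(stacked_opacity(0.2, n), 3))
```

Under that assumption, two stacked markers are noticeably more opaque than one, which matches the darker dot at [1,1].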

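What is happening with your dataset is that all of the values lie between 0 and 1, which is why 0.5 appears exactly in the center of the row/column on both axes. However, the data is heavily skewed toward 0, which is why the probability density plot spikes as it approaches 0; the maximum of the density plot in row 0 looks (by eyeball) to be roughly 8-10.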
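To see why a density axis can reach values far above the data range, here is a minimal pure-Python sketch of a Gaussian kernel density estimate. The sample data, the bandwidth, and the helper name `gaussian_kde_1d` are all made up for illustration; pandas uses its own KDE implementation with its own bandwidth choice, so the numbers will differ from a real plot.

```python
import math


def gaussian_kde_1d(samples, bandwidth):
    """Return a function that estimates the density of `samples`.

    Each sample contributes a Gaussian bump of width `bandwidth`;
    the bumps are averaged so the total area under the curve is 1.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical data: every value is in [0, 1], most of them very close to 0.
samples = [0.01 * (i % 5) for i in range(95)] + [0.5, 0.6, 0.7, 0.8, 1.0]
density = gaussian_kde_1d(samples, bandwidth=0.02)

# Evaluate the estimate on a grid over the data range and find its peak.
grid = [i / 1000 for i in range(1001)]
peak = max(density(x) for x in grid)

print("data range:", min(samples), "to", max(samples))
print("peak density:", round(peak, 1))
```

The data never leaves [0, 1], yet the peak density is well above 1, because a density only has to integrate to 1 and tightly clustered values force a tall, narrow spike. That is why ticks at 0 and 0.5 are reasonable for the scatter plots but say little on the density plot's own scale.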

What I would personally do is change your left-boundary code to something like this:

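autoscale = True  # we want the (0,0) item's y-axis to autoscale
for ax in axs[:, 0]:  # the left boundary
    ax.grid(False, axis='both')  # the string 'off' is not accepted by current matplotlib
    if autoscale:
        # first axes in the column: leave its ticks alone and let it autoscale
        ax.set_autoscale_on(True)
        autoscale = False
    else:
        ax.set_yticks([0, 0.5])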

For our sample dataset, this technique produces a plot like the following:

[Image: scatter plot matrix with the autoscaled first y-axis]

Answer 2

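This appears to be a bug in pandas. See https://github.com/pydata/pandas/issues/5662

In the meantime, you can adjust the labels manually. First, set the number of labels and the desired spacing based on the range of the kernel density plot.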

axs[0,0].set_yticks([0.24,0.33,0.42])

Then change the label text manually.

axs[0,0].set_yticklabels([0.0, 1.0, 2.0])
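The tick positions above can also be computed rather than read off the plot. The following is a rough sketch, not part of pandas or matplotlib: `tick_positions` is a hypothetical helper, and it assumes the labels should be spread linearly across a chosen stretch of the density axis. The numbers reuse the values from this answer.

```python
def tick_positions(labels, data_lim, axis_lim):
    """Map data-scale label values linearly onto the density axis.

    labels   -- the values you want printed, e.g. [0.0, 1.0, 2.0]
    data_lim -- (low, high) range of the data those labels describe
    axis_lim -- (low, high) stretch of the density axis to place them on
    """
    data_lo, data_hi = data_lim
    axis_lo, axis_hi = axis_lim
    scale = (axis_hi - axis_lo) / (data_hi - data_lo)
    return [axis_lo + (value - data_lo) * scale for value in labels]


labels = [0.0, 1.0, 2.0]
positions = tick_positions(labels, data_lim=(0.0, 2.0), axis_lim=(0.24, 0.42))
print([round(p, 2) for p in positions])
```

The resulting positions could then be passed to axs[0,0].set_yticks(...) and the labels to axs[0,0].set_yticklabels(...), as in the two calls above.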

Tick labels in a pandas scatter plot matrix are not drawn correctly