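I'm trying to create a figure in which every dimension of a multidimensional data set is plotted against every other dimension in a grid of subplots. Here's what I have so far:
The x dimension is determined by the subplot column and the y dimension by the row. When the two dimensions are equal, a 1-D histogram is plotted with density on the y-axis; otherwise a 2-D histogram is used, with density mapped to color. When creating each subplot, I share its x-axis with the first plot in that column (using the sharex parameter of Figure.add_subplot). The y-axes are shared in a similar way, except for the 1-D histograms.
This keeps the axes on the same scale, but you can see the problem in the top left. Since most axes are identical across a row or column, I only show ticks along the bottom and left edges of the figure. The problem is that the top-left subplot has a different y scale from the rest of its row.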
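For reference, here is a minimal sketch of the sharing mechanism described above, using just two stacked subplots instead of the full grid. The names `top` and `bottom` and the limits are made up for illustration, and the Agg backend is only selected so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so this runs headless
import matplotlib.pyplot as plt

fig = plt.figure()
top = fig.add_subplot(2, 1, 1)
# sharex ties the x-limits of the new subplot to those of `top`
bottom = fig.add_subplot(2, 1, 2, sharex=top)

# Changing the limits on one axes changes them on the other as well
bottom.set_xlim(-3, 3)
shared_xlim = top.get_xlim()
```

The same applies to `sharey`, which is why a density plot cannot simply share its y-axis with the 2-D plots in its row: its limits would be forced to the data range of that row.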
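What I'd like is to apply the y-axis ticks of the other subplots in the row to the top-left subplot, without changing that subplot's limits. Getting the y tick labels from the second subplot in the row and setting them on the first works, but changing the tick locations does not, because the axis limits aren't the same. I can't figure out how to set the tick locations relatively, short of explicitly converting points from one plot's scale to the other's.
EDIT: Since someone asked, here is a basic version of the code used to generate this: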
import numpy as np
from scipy.stats import gaussian_kde

def matrix_plot(figure, data, limits, labels):
    """
    Args:
        figure: matplotlib Figure
        data: numpy.ndarray, points/observations in rows
        limits: list of (min, max) values for axis limits
        labels: list of labels for each dimension
    """
    # Number of dimensions (data columns)
    ndim = data.shape[1]
    # Create KDE objects
    density = [gaussian_kde(data[:, dim]) for dim in range(ndim)]
    # Keep track of subplots
    plots = np.empty((ndim, ndim), dtype=object)
    # Loop through dimensions twice
    # dim1 goes by column
    for dim1 in range(ndim):
        # dim2 goes by row
        for dim2 in range(ndim):
            # Index of plot
            i = dim2 * ndim + dim1 + 1
            # Share x-axis with plot at top of column
            # Share y-axis with plot at beginning of row, unless that
            # plot or current plot is a 1d plot
            kwargs = dict()
            if dim2 > 0:
                kwargs['sharex'] = plots[0][dim1]
            if dim1 != dim2:
                if dim2 > 0 and dim1 > 0:
                    kwargs['sharey'] = plots[dim2][0]
                elif dim2 == 0 and dim1 > 1:
                    # First plot in the top row is 1d, so share with the second
                    kwargs['sharey'] = plots[dim2][1]
            # Create new subplot
            # Pass in shared axis arguments with **kwargs
            plot = figure.add_subplot(ndim, ndim, i, **kwargs)
            plots[dim2][dim1] = plot
            # 1d density plot
            if dim1 == dim2:
                # Space to plot over
                x = np.linspace(limits[dim1][0], limits[dim1][1], 100)
                # Plot filled region
                plot.set_xlim(limits[dim1])
                plot.fill_between(x, density[dim1].evaluate(x))
            # 2d density plot
            else:
                # Make histogram
                h, xedges, yedges = np.histogram2d(data[:, dim1],
                    data[:, dim2], range=[limits[dim1], limits[dim2]],
                    bins=250)
                # Set zero bins to NaN to make empty regions of
                # plot transparent
                h[h == 0] = np.nan
                # Plot without grid
                plot.imshow(h.T, origin='lower',
                    extent=np.concatenate((limits[dim1], limits[dim2])),
                    aspect='auto')
                plot.grid(False)
            # Turn off ticks and labels except on figure edges
            plot.tick_params(axis='both', which='both', left=False,
                right=False, bottom=False, top=False, labelleft=False,
                labelbottom=False)
            if dim1 == 0:
                plot.tick_params(axis='y', left=True, labelleft=True)
                plot.set_ylabel(labels[dim2])
            if dim2 == ndim - 1:
                plot.tick_params(axis='x', bottom=True, labelbottom=True)
                plot.set_xlabel(labels[dim1])
    # Tight layout
    figure.tight_layout(pad=.1, h_pad=0, w_pad=0)
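When I try copying the y-axis tick locations and labels from the second plot in the first row to the first plot, this is what I get:
plots[0][0].set_yticks(plots[0][1].get_yticks())
plots[0][0].set_yticklabels(plots[0][1].get_yticklabels())
Notice how the tick positions are assigned on an absolute scale, which is far above the scale of the density plot. The axis limits expand to show the ticks, so the actual density curve is squashed down at the bottom. Also, the labels don't show up.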
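The limit expansion can be reproduced in isolation. The sketch below uses made-up numbers (a curve peaking at 0.4, ticks at 0, 5 and 10) rather than the real data, and again assumes the Agg backend only so it runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so this runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Stand-in for a density curve whose values are all well below 1
ax.plot([0, 1, 2], [0.0, 0.4, 0.1])
ax.set_ylim(0, 0.5)

# Ticks given in the units of a different axis (illustrative values)
ax.set_yticks([0, 5, 10])

# The view limits are widened so that every requested tick is visible
ylim_after = ax.get_ylim()
```

After the call, the upper y-limit is at least 10, so a curve that peaks at 0.4 occupies only a sliver at the bottom of the axes.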
Answer 0 (score: 1)
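Thanks to Ajean's comment pointing me to the scatter_matrix function in the pandas package, which does more or less what I'm trying to do here. I looked at the source code on GitHub and found the part where they "fix" the axis of the top-left plot so that it corresponds to the row's shared y-axis rather than the density axis: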
if len(df.columns) > 1:
    lim1 = boundaries_list[0]
    locs = axes[0][1].yaxis.get_majorticklocs()
    locs = locs[(lim1[0] <= locs) & (locs <= lim1[1])]
    adj = (locs - lim1[0]) / (lim1[1] - lim1[0])

    lim0 = axes[0][0].get_ylim()
    adj = adj * (lim0[1] - lim0[0]) + lim0[0]
    axes[0][0].yaxis.set_ticks(adj)

    if np.all(locs == locs.astype(int)):
        # if all ticks are int
        locs = locs.astype(int)
    axes[0][0].yaxis.set_ticklabels(locs)
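Unfortunately, it looks like what I was afraid of: there is no more elegant way than manually converting the tick positions from one range to the other. Here is my version, which goes right after the double loop: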
# Check there are more plots in the row, just in case
if ndim > 1:
    # Get tick locations from 2nd plot in first row
    ticks = np.asarray(plots[0][1].yaxis.get_majorticklocs())
    # Throw out the ones that aren't within the limit
    # (Copied from pandas code, but probably not necessary)
    ticks = ticks[(ticks >= limits[0][0]) & (ticks <= limits[0][1])]
    # Scale ticks to range of [0, 1] (relative to axis limits)
    ticks_scaled = (ticks - limits[0][0]) / (limits[0][1] - limits[0][0])
    # Y limits of top-left density plot (was automatically determined
    # by matplotlib)
    dlim = plots[0][0].get_ylim()
    # Set the ticks scaled to the plot's own y-axis
    plots[0][0].set_yticks((ticks_scaled * (dlim[1] - dlim[0])) + dlim[0])
    # Set tick labels to their original positions on the 2d plot
    plots[0][0].set_yticklabels(ticks)
This gets me the result I was looking for.
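The rescaling step can also be pulled out into a small standalone helper, which makes the arithmetic easy to check without a figure. This is only a sketch: `rescale_ticks` is a hypothetical name, not a matplotlib or pandas function, and the limits in the example are made up.

```python
import numpy as np

def rescale_ticks(ticks, src_lim, dst_lim):
    """Map tick values from the range src_lim onto the range dst_lim.

    Returns (positions, kept): positions on the destination axis, and the
    original tick values that fell inside src_lim (usable as labels).
    """
    ticks = np.asarray(ticks, dtype=float)
    # Keep only ticks that lie within the source limits
    kept = ticks[(ticks >= src_lim[0]) & (ticks <= src_lim[1])]
    # Fraction of the way along the source axis, in [0, 1]
    frac = (kept - src_lim[0]) / (src_lim[1] - src_lim[0])
    # Same fraction of the way along the destination axis
    positions = frac * (dst_lim[1] - dst_lim[0]) + dst_lim[0]
    return positions, kept

# Illustrative values: data axis spans 0..10, density axis spans 0..0.4
positions, tick_labels = rescale_ticks([-5, 0, 5, 10, 15], (0, 10), (0.0, 0.4))
```

With the names from the code above, this would be used as `positions, tick_labels = rescale_ticks(plots[0][1].yaxis.get_majorticklocs(), limits[0], plots[0][0].get_ylim())`, followed by `plots[0][0].set_yticks(positions)` and `plots[0][0].set_yticklabels(tick_labels)`.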