Parallel Coordinates Plot in Matplotlib

Time: 2011-11-22 16:58:30

Tags: python matplotlib parallel-coordinates

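Two- and three-dimensional data can be viewed relatively straightforwardly using traditional plot types. Even with four-dimensional data, we can often find a way to display it. Beyond four dimensions, though, data becomes increasingly difficult to display. Fortunately, parallel coordinates plots provide a mechanism for viewing higher-dimensional results.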

Example Parallel Coordinates Plot from Wikipedia

Several plotting packages provide parallel coordinates plots, such as Matlab, R, VTK type 1 and VTK type 2, but I don't see how to create one using Matplotlib.

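  1. Is there a built-in parallel coordinates plot in Matplotlib? I certainly don't see one in the gallery.
  2. If there is no built-in type, is it possible to build a parallel coordinates plot using standard features of Matplotlib?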

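    Edit:

    Based on the answer provided by Zhenya below, I developed the following generalization that supports an arbitrary number of axes. Following the plot style of the example I posted in the original question above, each axis gets its own scale. I accomplished this by normalizing the data at each axis point and giving every axis a range of 0 to 1. I then go back and apply a label to each tick mark that gives the correct value at that intercept.

    The function works by accepting an iterable of data sets. Each data set is considered a set of points, where each point lies on a different axis. The example in __main__ grabs random numbers for each axis in two sets of 30 lines. The lines are random within ranges that cause them to cluster, which is a behavior I wanted to verify.

    This solution isn't as good as a built-in one, since you get odd mouse behavior and I'm faking the data ranges through labels, but until Matplotlib adds a built-in solution it is acceptable.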

    #!/usr/bin/python
    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    
    def parallel_coordinates(data_sets, style=None):
    
        dims = len(data_sets[0])
        x    = range(dims)
        fig, axes = plt.subplots(1, dims-1, sharey=False)
    
        if style is None:
            style = ['r-']*len(data_sets)
    
        # Calculate the limits on the data
        min_max_range = list()
        for m in zip(*data_sets):
            mn = min(m)
            mx = max(m)
            if mn == mx:
                mn -= 0.5
                mx = mn + 1.
            r  = float(mx - mn)
            min_max_range.append((mn, mx, r))
    
        # Normalize the data sets
        norm_data_sets = list()
        for ds in data_sets:
            nds = [(value - min_max_range[dimension][0]) / 
                    min_max_range[dimension][2] 
                    for dimension,value in enumerate(ds)]
            norm_data_sets.append(nds)
        data_sets = norm_data_sets
    
        # Plot the datasets on all the subplots
        for i, ax in enumerate(axes):
            for dsi, d in enumerate(data_sets):
                ax.plot(x, d, style[dsi])
            ax.set_xlim([x[i], x[i+1]])
    
        # Set the x axis ticks 
        for dimension, (axx,xx) in enumerate(zip(axes, x[:-1])):
            axx.xaxis.set_major_locator(ticker.FixedLocator([xx]))
            ticks = len(axx.get_yticklabels())
            labels = list()
            step = min_max_range[dimension][2] / (ticks - 1)
            mn   = min_max_range[dimension][0]
            for i in range(ticks):
                v = mn + i*step
                labels.append('%4.2f' % v)
            axx.set_yticklabels(labels)
    
    
        # Move the final axis' ticks to the right-hand side
        axx = plt.twinx(axes[-1])
        dimension += 1
        axx.xaxis.set_major_locator(ticker.FixedLocator([x[-2], x[-1]]))
        ticks = len(axx.get_yticklabels())
        step = min_max_range[dimension][2] / (ticks - 1)
        mn   = min_max_range[dimension][0]
        labels = ['%4.2f' % (mn + i*step) for i in range(ticks)]
        axx.set_yticklabels(labels)
    
        # Stack the subplots 
        plt.subplots_adjust(wspace=0)
    
        return plt
    
    
    if __name__ == '__main__':
        import random
        base  = [0,   0,  5,   5,  0]
        scale = [1.5, 2., 1.0, 2., 2.]
        data = [[base[x] + random.uniform(0., 1.)*scale[x]
                for x in range(5)] for y in range(30)]
        colors = ['r'] * 30
    
        base  = [3,   6,  0,   1,  3]
        scale = [1.5, 2., 2.5, 2., 2.]
        data.extend([[base[x] + random.uniform(0., 1.)*scale[x]
                     for x in range(5)] for y in range(30)])
        colors.extend(['b'] * 30)
    
        parallel_coordinates(data, style=colors).show()
    

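    Edit 2:

    Here is an example of what the above code produces when plotting Fisher's Iris data. It isn't quite as nice as the reference image from Wikipedia, but it is passable if all you have is Matplotlib and you need multi-dimensional plots.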

    Example result of parallel coordinates plot from this answer
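
The per-axis normalization described in the edit above can be looked at separately from the plotting code. The following is a minimal sketch, not the author's exact implementation; the helper names `axis_limits`, `normalize` and `tick_value` are made up for illustration, and the sample rows are invented. It shows how each value is mapped onto the 0 to 1 range of its own axis, and how a tick position on that range maps back to a label in the original units.

```python
def axis_limits(data_sets):
    """Return (min, range) per dimension; a constant column gets a range of 1."""
    limits = []
    for column in zip(*data_sets):
        mn, mx = min(column), max(column)
        if mn == mx:          # avoid division by zero for constant columns
            mn -= 0.5
            mx = mn + 1.0
        limits.append((mn, float(mx - mn)))
    return limits


def normalize(data_sets, limits):
    """Map every value onto [0, 1] using the limits of its own dimension."""
    return [[(v - limits[d][0]) / limits[d][1] for d, v in enumerate(row)]
            for row in data_sets]


def tick_value(position, limit):
    """Convert a tick position on the 0..1 axis back to original units."""
    mn, rng = limit
    return mn + position * rng


# Invented sample data: three lines crossing three axes.
rows = [[0.0, 100.0, 5.0],
        [1.0, 300.0, 5.0],
        [2.0, 200.0, 5.0]]
limits = axis_limits(rows)
unit_rows = normalize(rows, limits)
print(unit_rows)
print(tick_value(0.5, limits[1]))   # label for the midpoint of the second axis
```

Keeping the limits around is what makes the "faked" labels possible: the lines are drawn in normalized units, and only the tick text is converted back.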

7 answers:

Answer 0 (score: 43)

pandas has a parallel coordinates wrapper:

import pandas
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates  # was pandas.tools.plotting in old pandas releases

data = pandas.read_csv(r'C:\Python27\Lib\site-packages\pandas\tests\data\iris.csv', sep=',')
parallel_coordinates(data, 'Name')
plt.show()

screenshot

Source code showing how they did it: plotting.py#L494
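
The CSV path in the snippet above is specific to one machine. As a hedged sketch, the same call can be tried on a small in-memory DataFrame. The column names and values below are invented for illustration, and the import uses `pandas.plotting`, which is where the function lives in current pandas releases.

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Made-up measurements; 'Name' is the class column used for coloring.
frame = pd.DataFrame({
    "sepal_len": [5.1, 4.9, 7.0, 6.4],
    "sepal_wid": [3.5, 3.0, 3.2, 3.2],
    "petal_len": [1.4, 1.4, 4.7, 4.5],
    "Name": ["setosa", "setosa", "versicolor", "versicolor"],
})

# One line per row, one vertical axis per numeric column.
ax = parallel_coordinates(frame, "Name", color=["tab:red", "tab:blue"])
# plt.show()  # uncomment to display the figure interactively
```

The function returns the Matplotlib Axes it drew on, so titles, limits and legends can be adjusted afterwards with the usual Axes methods.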

Answer 1 (score: 14)

I'm sure there is a better way of doing this, but here's a quick-and-dirty one (a really dirty one):

#!/usr/bin/python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

#vectors to plot: 4D for this example
y1=[1,2.3,8.0,2.5]
y2=[1.5,1.7,2.2,2.9]

x=[1,2,3,8] # spines

fig,(ax,ax2,ax3) = plt.subplots(1, 3, sharey=False)

# plot the same on all the subplots
ax.plot(x,y1,'r-', x,y2,'b-')
ax2.plot(x,y1,'r-', x,y2,'b-')
ax3.plot(x,y1,'r-', x,y2,'b-')

# now zoom in each of the subplots 
ax.set_xlim([ x[0],x[1]])
ax2.set_xlim([ x[1],x[2]])
ax3.set_xlim([ x[2],x[3]])

# set the x axis ticks 
for axx,xx in zip([ax,ax2,ax3],x[:-1]):
  axx.xaxis.set_major_locator(ticker.FixedLocator([xx]))
ax3.xaxis.set_major_locator(ticker.FixedLocator([x[-2],x[-1]]))  # the last one

# EDIT: add the labels to the rightmost spine
# (tick.label2On is no longer available in current Matplotlib releases)
ax3.tick_params(axis='y', labelright=True)

# stack the subplots together
plt.subplots_adjust(wspace=0)

plt.show()

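This is essentially based on a (much nicer) one by Joe Kington. You might also want to have a look at the other answer to the same question.

In this example I don't even attempt to scale the vertical scales, since that depends on what exactly you are trying to achieve.

Edit: the resulting plot (image).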

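Answer 2 (score: 10)

When using pandas (as suggested by theta), there is no way to scale the axes independently.

The reason you can't find the different vertical axes is that there aren't any. Our parallel coordinates plot is "faking" the other two axes by just drawing a vertical line and some labels.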

https://github.com/pydata/pandas/issues/7083#issuecomment-74253671
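
One possible workaround, offered here as an assumption rather than something the linked issue proposes, is to min-max scale each numeric column before plotting, so that every column spans 0 to 1 on the shared axis. The tick labels then show scaled values instead of the original units. The DataFrame below is invented for illustration.

```python
import pandas as pd

# Invented data with columns on very different scales.
frame = pd.DataFrame({
    "price": [1230.23, 930.23, 130.23],
    "volume": [1500000.0, 140000.0, 120000.0],
    "group": ["a", "b", "a"],
})

numeric = frame.drop(columns="group")
span = numeric.max() - numeric.min()
scaled = (numeric - numeric.min()) / span   # each column now spans 0..1
scaled["group"] = frame["group"]            # restore the class column

print(scaled)
# scaled can then be passed to pandas.plotting.parallel_coordinates(scaled, "group")
```

A column whose values are all equal has a span of zero and would turn into NaN here, so such columns need to be dropped or handled separately.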

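Answer 3 (score: 7)

When answering a related question, I worked out a version that uses only one subplot (so it can easily be fit together with other plots) and optionally uses cubic Bezier curves to connect the points. The plot adjusts itself to the desired number of axes.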

import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches
import numpy as np

fig, host = plt.subplots()

# create some dummy data
ynames = ['P1', 'P2', 'P3', 'P4', 'P5']
N1, N2, N3 = 10, 5, 8
N = N1 + N2 + N3
category = np.concatenate([np.full(N1, 1), np.full(N2, 2), np.full(N3, 3)])
y1 = np.random.uniform(0, 10, N) + 7 * category
y2 = np.sin(np.random.uniform(0, np.pi, N)) ** category
y3 = np.random.binomial(300, 1 - category / 10, N)
y4 = np.random.binomial(200, (category / 6) ** (1 / 3), N)  # parentheses added: cube root
y5 = np.random.uniform(0, 800, N)

# organize the data
ys = np.dstack([y1, y2, y3, y4, y5])[0]
ymins = ys.min(axis=0)
ymaxs = ys.max(axis=0)
dys = ymaxs - ymins
ymins -= dys * 0.05  # add 5% padding below and above
ymaxs += dys * 0.05
dys = ymaxs - ymins

# transform all data to be compatible with the main axis
zs = np.zeros_like(ys)
zs[:, 0] = ys[:, 0]
zs[:, 1:] = (ys[:, 1:] - ymins[1:]) / dys[1:] * dys[0] + ymins[0]


axes = [host] + [host.twinx() for i in range(ys.shape[1] - 1)]
for i, ax in enumerate(axes):
    ax.set_ylim(ymins[i], ymaxs[i])
    ax.spines['top'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    if ax != host:
        ax.spines['left'].set_visible(False)
        ax.yaxis.set_ticks_position('right')
        ax.spines["right"].set_position(("axes", i / (ys.shape[1] - 1)))

host.set_xlim(0, ys.shape[1] - 1)
host.set_xticks(range(ys.shape[1]))
host.set_xticklabels(ynames, fontsize=14)
host.tick_params(axis='x', which='major', pad=7)
host.spines['right'].set_visible(False)
host.xaxis.tick_top()
host.set_title('Parallel Coordinates Plot', fontsize=18)

colors = plt.cm.tab10.colors
for j in range(N):
    # to just draw straight lines between the axes:
    # host.plot(range(ys.shape[1]), zs[j,:], c=colors[(category[j] - 1) % len(colors) ])

    # create bezier curves
    # for each axis, there will a control vertex at the point itself, one at 1/3rd towards the previous and one
    #   at one third towards the next axis; the first and last axis have one less control vertex
    # x-coordinate of the control vertices: at each integer (for the axes) and two inbetween
    # y-coordinate: repeat every point three times, except the first and last only twice
    # the x positions depend on the number of axes (columns), not on the number of rows
    verts = list(zip(np.linspace(0, ys.shape[1] - 1, ys.shape[1] * 3 - 2, endpoint=True),
                     np.repeat(zs[j, :], 3)[1:-1]))
    # for x,y in verts: host.plot(x, y, 'go') # to show the control points of the beziers
    codes = [Path.MOVETO] + [Path.CURVE4 for _ in range(len(verts) - 1)]
    path = Path(verts, codes)
    patch = patches.PathPatch(path, facecolor='none', lw=1, edgecolor=colors[category[j] - 1])
    host.add_patch(patch)
plt.tight_layout()
plt.show()

example plot
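
The control-vertex construction used for the Bezier curves above may be easier to follow in isolation. The sketch below is an illustration with made-up values; `control_points` is a hypothetical helper name, not part of Matplotlib. For n axes it produces 3n - 2 vertices: one at each axis and two between neighbouring axes, with each y value repeated so that the curve arrives at and leaves every axis horizontally.

```python
import numpy as np
from matplotlib.path import Path


def control_points(z):
    """Return (x, y) control vertices for one line crossing len(z) axes."""
    n = len(z)
    xs = np.linspace(0, n - 1, n * 3 - 2)   # a vertex every third of an axis gap
    ys = np.repeat(z, 3)[1:-1]              # first and last value appear only twice
    return list(zip(xs, ys))


# One line crossing three axes at heights 1, 2 and 3 (made-up values).
verts = control_points([1.0, 2.0, 3.0])

# One MOVETO, then groups of three CURVE4 vertices per cubic segment.
codes = [Path.MOVETO] + [Path.CURVE4] * (len(verts) - 1)
path = Path(verts, codes)

for x, y in verts:
    print(round(float(x), 3), float(y))
```

Because the two intermediate vertices share the y value of the nearest axis, the tangent at each axis is horizontal, which is what gives the lines their S-shaped look.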

Here is similar code for the iris data set. The second axis is reversed to avoid some crossing lines.

import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
ynames = iris.feature_names
ys = iris.data
ymins = ys.min(axis=0)
ymaxs = ys.max(axis=0)
dys = ymaxs - ymins
ymins -= dys * 0.05  # add 5% padding below and above
ymaxs += dys * 0.05

ymaxs[1], ymins[1] = ymins[1], ymaxs[1]  # reverse axis 1 to have less crossings
dys = ymaxs - ymins

# transform all data to be compatible with the main axis
zs = np.zeros_like(ys)
zs[:, 0] = ys[:, 0]
zs[:, 1:] = (ys[:, 1:] - ymins[1:]) / dys[1:] * dys[0] + ymins[0]

fig, host = plt.subplots(figsize=(10,4))

axes = [host] + [host.twinx() for i in range(ys.shape[1] - 1)]
for i, ax in enumerate(axes):
    ax.set_ylim(ymins[i], ymaxs[i])
    ax.spines['top'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    if ax != host:
        ax.spines['left'].set_visible(False)
        ax.yaxis.set_ticks_position('right')
        ax.spines["right"].set_position(("axes", i / (ys.shape[1] - 1)))

host.set_xlim(0, ys.shape[1] - 1)
host.set_xticks(range(ys.shape[1]))
host.set_xticklabels(ynames, fontsize=14)
host.tick_params(axis='x', which='major', pad=7)
host.spines['right'].set_visible(False)
host.xaxis.tick_top()
host.set_title('Parallel Coordinates Plot — Iris', fontsize=18, pad=12)

colors = plt.cm.Set2.colors
legend_handles = [None for _ in iris.target_names]
for j in range(ys.shape[0]):
    # create bezier curves
    # the x positions depend on the number of axes (columns), not on the number of rows
    verts = list(zip(np.linspace(0, ys.shape[1] - 1, ys.shape[1] * 3 - 2, endpoint=True),
                     np.repeat(zs[j, :], 3)[1:-1]))
    codes = [Path.MOVETO] + [Path.CURVE4 for _ in range(len(verts) - 1)]
    path = Path(verts, codes)
    patch = patches.PathPatch(path, facecolor='none', lw=2, alpha=0.7, edgecolor=colors[iris.target[j]])
    legend_handles[iris.target[j]] = patch
    host.add_patch(patch)
host.legend(legend_handles, iris.target_names,
            loc='lower center', bbox_to_anchor=(0.5, -0.18),
            ncol=len(iris.target_names), fancybox=True, shadow=True)
plt.tight_layout()
plt.show()

iris example
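
The `zs[:, 1:] = ...` line in both listings maps each column from its own range onto the range of the first (host) axis, and swapping a column's minimum and maximum reverses that axis. Below is a small sketch of that arithmetic, with a hypothetical helper `to_host` and made-up limits.

```python
import numpy as np


def to_host(y, ymin, ymax, host_min, host_max):
    """Linearly map y from [ymin, ymax] onto [host_min, host_max]."""
    return (y - ymin) / (ymax - ymin) * (host_max - host_min) + host_min


y = np.array([2.0, 3.0, 4.0])

normal = to_host(y, 2.0, 4.0, 0.0, 10.0)     # smallest value lands at the bottom
reversed_ = to_host(y, 4.0, 2.0, 0.0, 10.0)  # limits swapped: smallest value on top

print(normal)
print(reversed_)
```

The twin axes only supply the tick labels; the lines themselves are always drawn in the host axis coordinates produced by this mapping.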

Answer 4 (score: 1)

plotly has a nice interactive solution called parallel_coordinates, which works well:

import plotly.express as px
df = px.data.iris()
fig = px.parallel_coordinates(df, color="species_id", labels={"species_id": "Species",
                "sepal_width": "Sepal Width", "sepal_length": "Sepal Length",
                "petal_width": "Petal Width", "petal_length": "Petal Length", },
                             color_continuous_scale=px.colors.diverging.Tealrose,
                             color_continuous_midpoint=2)
fig.show()

parallel_coordinates

Answer 5 (score: 0)

The best example I've seen so far is this one:

https://python.g-node.org/python-summerschool-2013/_media/wiki/datavis/olympics_vis.py

See the normalised_coordinates function. It is not super fast, but it works from what I've tried.

normalised_coordinates(['VAL_1', 'VAL_2', 'VAL_3'], np.array([[1230.23, 1500000, 12453.03], [930.23, 140000, 12453.03], [130.23, 120000, 1243.03]]), [1, 2, 1])

Answer 6 (score: 0)

Far from perfect, but it works and is relatively short:

import numpy as np

import matplotlib.pyplot as plt

def plot_parallel(data,labels):

    data=np.array(data)
    x=list(range(len(data[0])))
    fig, axis = plt.subplots(1, len(data[0])-1, sharey=False)


    for d in data:
        for i, a in enumerate(axis):
            temp=d[i:i+2].copy()
            temp[1]=(temp[1]-np.min(data[:,i+1]))*(np.max(data[:,i])-np.min(data[:,i]))/(np.max(data[:,i+1])-np.min(data[:,i+1]))+np.min(data[:,i])
            a.plot(x[i:i+2], temp)


    for i, a in enumerate(axis):
        a.set_xlim([x[i], x[i+1]])
        a.set_xticks([x[i], x[i+1]])
        a.set_xticklabels([labels[i], labels[i+1]], minor=False, rotation=45)
        a.set_ylim([np.min(data[:,i]),np.max(data[:,i])])


    plt.subplots_adjust(wspace=0)

    plt.show()