Seaborn pairplot: using hue

Date: 2017-10-26 12:08:50

Tags: python pandas matplotlib seaborn

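When I define hue to color my plot, map_lower calls its function more often, and with only part of the data each time, than the equivalent call without hue. Is this a bug, or am I making a mistake?

See the following code: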

import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import seaborn as sns


def corrfunc(x, y, **kws):
    r, _ = stats.pearsonr(x, y)
    print(x)
    print(y)
    print(r)

iris = sns.load_dataset("iris")
seax = sns.pairplot(iris, size=2, vars=["petal_width", "petal_length", "sepal_width"])
seax.map_lower(corrfunc)
plt.show()

If you change

seax = sns.pairplot(iris, size=2, vars=["petal_width", "petal_length", "sepal_width"])

to

seax = sns.pairplot(iris, hue="sepal_length", size=2, vars=["petal_width", "petal_length", "sepal_width"])

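the code breaks, although the plot looks fine. If you run the code without hue, corrfunc is called 3 times, once for each of the 3 lower plots. If I add hue="class" to color the plot by the class field, corrfunc is called roughly 8 times. I do not understand why coloring with hue should affect map_lower.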

2 answers:

Answer 0: (score: 1)

Maybe someday this will help someone who wants to do what I had in mind. Here is my ugly but working solution:

#!/usr/bin/env python
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import seaborn as sns

# Global variables to keep track of the data chunks. When hue is
# used to color the data points, map_lower passes the data in
# chunks, one chunk per distinct hue value

dataLength = xName = yName = xData = yData = ''


# Function to group data pairs to plot their correlation
def assemble_data_subplot(x, y, **kwargs):
    global xName, yName, xData, yData, dataLength
    if xName == '' and yName == '':
        xName = x.name
        yName = y.name
        xData = x
        yData = y
    elif xName == x.name and yName == y.name:
        # Series.append was removed in pandas 2.0; pd.concat replaces it
        xData = pd.concat([xData, x])
        yData = pd.concat([yData, y])

    if len(xData) == dataLength:
        correlate_data(xData, yData)
        xName = yName = xData = yData = ''


# Correlation function
def correlate_data(xData, yData):
    r, _ = stats.pearsonr(xData, yData)
    r = r**2
    sax = plt.gca()
    sax.annotate("$r^2$={:.2f}".format(r),
                 xy=(.02, .86),
                 xycoords=sax.transAxes)


# Main function to plot the pairwise correlation plot
def main():
    # Init global variable to set it later
    global dataLength

    # Load the iris example data set into a data frame
    df = sns.load_dataset("iris")

    # Example with hue
    g = sns.pairplot(df, size=2, hue="petal_width",
                     vars=["petal_width",
                           "petal_length",
                           "sepal_width"])

    # Get the number of data entries to check when the assembled data
    # is complete. Used in assemble_data_subplot
    dataLength = len(df)

    # Plot the r^2 value on the lower part of the pair plot
    g.map_lower(assemble_data_subplot)

    # Generate the output
    g.savefig("output.png")
    plt.show()


if __name__ == "__main__":
    main()
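
An alternative that avoids the global state could look like the sketch below. Instead of reassembling the hue chunks inside a map_lower callback, it loops over the lower-triangle axes of the grid and computes the correlation from the complete data frame. The helper name annotate_lower_r2, the small hand-made data frame and the Agg backend are assumptions made for illustration, and pandas' Series.corr (Pearson by default) stands in for stats.pearsonr.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data; "group" plays the role of the hue column
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "c": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "group": ["u", "u", "u", "u", "v", "v"],
})


def annotate_lower_r2(grid, data):
    """Write r^2, computed on the full data, into each lower-triangle panel."""
    r2 = {}
    # k=-1 selects the panels strictly below the diagonal
    for i, j in zip(*np.tril_indices_from(grid.axes, k=-1)):
        x_var, y_var = grid.x_vars[j], grid.y_vars[i]
        r = data[x_var].corr(data[y_var])  # Pearson correlation by default
        r2[(x_var, y_var)] = r ** 2
        grid.axes[i, j].annotate("$r^2$={:.2f}".format(r ** 2),
                                 xy=(.02, .86), xycoords="axes fraction")
    return r2


grid = sns.PairGrid(df, hue="group", vars=["a", "b", "c"])
grid.map_offdiag(plt.scatter)  # points are still colored per hue level
r2_values = annotate_lower_r2(grid, df)
print(r2_values)
```

Because the correlation is taken from the data frame rather than from the arguments of the callback, the result does not depend on how the hue levels split the data.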

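Answer 1: (score: 0)

Looking at the code that defines map_lower, we see the following (I left out quite a few parts to keep it concise; the omitted bits are not relevant to the answer):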

def map_lower(self, func, **kwargs):
    # irrelevant parts left out
    for k, label_k in enumerate(self.hue_names):

        # some more irrelevant parts (specifying colours and whatnot)

        func(data_k[x_var], data_k[y_var], label=label_k,
             color=color, **kwargs)

    return self

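So basically, map_lower runs func once for every unique hue value that exists (for each pair of variables).

If no hue is given, func is run only once, on all the relevant data (for each pair of variables). That is why the number of func calls differs with and without hue.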
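
The difference in call counts can be checked with a small experiment. The sketch below is only an illustration: the helper count_calls, the toy data frame and the Agg backend are assumptions, and PairGrid is used directly so that nothing except the recording callback runs on the lower panels.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so no display is needed

import pandas as pd
import seaborn as sns

# Hypothetical toy data: 6 rows, hue column "group" with two levels
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "c": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
    "group": ["u", "u", "u", "u", "v", "v"],
})


def count_calls(data, hue=None):
    """Return the chunk size seen by each call that map_lower makes."""
    sizes = []

    def record(x, y, **kwargs):
        # x and y hold only the rows of the current hue level
        sizes.append(len(x))

    grid = sns.PairGrid(data, hue=hue, vars=["a", "b", "c"])
    grid.map_lower(record)
    return sizes


no_hue = count_calls(df)
with_hue = count_calls(df, hue="group")
print("without hue:", len(no_hue), "calls, chunk sizes", no_hue)
print("with hue:   ", len(with_hue), "calls, chunk sizes", with_hue)
```

With three variables there are three lower panels, so the run without hue should record three calls that each see all six rows, while the run with two hue levels should record two calls per panel, each seeing only the rows of one level.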