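When I define hue to colour my plot, map_lower calls its function more often and loses data compared to the equivalent call without hue. Is this a bug or am I making a mistake?
See the following code: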
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import seaborn as sns
def corrfunc(x, y, **kws):
    r, _ = stats.pearsonr(x, y)
    print(x)
    print(y)
    print(r)

iris = sns.load_dataset("iris")
seax = sns.pairplot(iris, size=2, vars=["petal_width", "petal_length", "sepal_width"])
seax.map_lower(corrfunc)
plt.show()
If you change
sns.pairplot(iris, size=2, vars=["petal_width", "petal_length", "sepal_width"])
to
seax = sns.pairplot(iris, hue="sepal_length", size=2, vars=["petal_width", "petal_length", "sepal_width"])
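the code breaks, but the plot looks fine. If you run the code without hue, corrfunc is called 3 times, once for each of the 3 lower plots. If I add hue="class" to colour the plots by the field class, corrfunc is called about 8 times. I don't understand why colouring by hue should affect map_lower.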
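To make the call counts concrete, here is a simplified pure-pandas model of the dispatch. It is an assumption modelled on the map_lower excerpt quoted in the second answer, not seaborn's actual code, and simulate_map_lower is a hypothetical name. It visits the lower-triangle panels and, when hue is set, calls func once per hue level with only that level's rows:

```python
import pandas as pd


def simulate_map_lower(df: pd.DataFrame, vars_, func, hue=None):
    """Hypothetical model of how map_lower feeds data to func."""
    for i in range(1, len(vars_)):      # row index of a lower-triangle panel
        for j in range(i):              # column index, always j < i
            x_var, y_var = vars_[j], vars_[i]
            # Without hue the whole frame is one chunk; with hue, one chunk per level
            chunks = [(None, df)] if hue is None else df.groupby(hue)
            for label, chunk in chunks:
                func(chunk[x_var], chunk[y_var], label=label)


if __name__ == "__main__":
    df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 2, 1],
                       "c": [0, 1, 0, 1], "h": ["p", "q", "p", "r"]})
    sizes = []
    simulate_map_lower(df, ["a", "b", "c"], lambda x, y, **kw: sizes.append(len(x)))
    print(sizes)  # [4, 4, 4]
    sizes = []
    simulate_map_lower(df, ["a", "b", "c"], lambda x, y, **kw: sizes.append(len(x)), hue="h")
    print(sizes)  # [2, 1, 1, 2, 1, 1, 2, 1, 1]
```

With three variables there are three lower panels, so in this model a hue column with k distinct values produces 3k calls, and each call sees only part of the rows, which is why a per-call correlation no longer describes the whole panel.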
Answer 0 (score: 1)
So maybe one day this will help someone who wants to do what I had in mind. Here is my ugly but working solution:
#!/usr/bin/env python
import matplotlib.pyplot as plt
import pandas as pd
from scipy import stats
import seaborn as sns

# Global variables that keep track of the data chunks when hue is used to
# colour the data points: with hue, map_lower passes the data of each panel
# in chunks of identical hue values
dataLength = xName = yName = xData = yData = ''

# Reassemble the chunks of one subplot, then plot its correlation
def assemble_data_subplot(x, y, **kwargs):
    global xName, yName, xData, yData, dataLength
    if xName == '' and yName == '':
        # First chunk of a new subplot
        xName = x.name
        yName = y.name
        xData = x
        yData = y
    elif xName == x.name and yName == y.name:
        # Further chunk of the same subplot
        # (Series.append was removed in pandas 2.0, so use pd.concat)
        xData = pd.concat([xData, x])
        yData = pd.concat([yData, y])
    # All rows collected (also covers the no-hue case of a single chunk):
    # annotate the subplot and reset for the next one
    if len(xData) == dataLength:
        correlate_data(xData, yData)
        xName = yName = xData = yData = ''

# Correlation function
def correlate_data(xData, yData):
    r, _ = stats.pearsonr(xData, yData)
    r = r**2
    sax = plt.gca()
    sax.annotate("$r^2$={:.2f}".format(r),
                 xy=(.02, .86),
                 xycoords=sax.transAxes)

# Main function to plot the pairwise correlation plot
def main():
    # Global variable, set below once the data is loaded
    global dataLength
    # Load the data frame (the iris example dataset here)
    df = sns.load_dataset("iris")
    # Example with hue
    g = sns.pairplot(df, size=2, hue="petal_width",
                     vars=["petal_width",
                           "petal_length",
                           "sepal_width"])
    # Number of data entries, used in assemble_data_subplot to check
    # when the assembled data is complete
    dataLength = len(df)
    # Plot the r^2 value on the lower part of the pair plot
    g.map_lower(assemble_data_subplot)
    # Generate the output
    g.savefig("output.png")
    plt.show()

if __name__ == "__main__":
    main()
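The same reassembly idea can be written without module-level globals. The sketch below is a hypothetical ChunkAssembler class (not part of seaborn or of the answer itself); like assemble_data_subplot, it assumes that map_lower passes all hue chunks of one panel consecutively and that no rows are dropped, so the chunk lengths add up to len(df):

```python
import pandas as pd


class ChunkAssembler:
    """Hypothetical class-based variant of assemble_data_subplot.

    Collects the per-hue chunks passed for one (x, y) panel and calls
    on_complete(x_full, y_full) once expected_len rows have arrived.
    """

    def __init__(self, expected_len, on_complete):
        self.expected_len = expected_len
        self.on_complete = on_complete
        self._key = None
        self._xs, self._ys = [], []

    def __call__(self, x, y, **kwargs):
        key = (x.name, y.name)
        if key != self._key:  # a new panel starts: drop any leftover chunks
            self._key, self._xs, self._ys = key, [], []
        self._xs.append(x)
        self._ys.append(y)
        if sum(len(chunk) for chunk in self._xs) == self.expected_len:
            # x and y chunks were appended in the same order, so rows stay paired
            self.on_complete(pd.concat(self._xs), pd.concat(self._ys))
            self._key, self._xs, self._ys = None, [], []
```

With the answer's correlate_data, usage would be g.map_lower(ChunkAssembler(len(df), correlate_data)). Passing the expected length to the constructor replaces the global dataLength, and keying on (x.name, y.name) discards a partial buffer instead of mixing it into the next panel.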
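Answer 1 (score: 0)
Looking at the code that defines map_lower, we see the following (I have left out quite a bit to keep it short; the omitted parts are not relevant to the answer):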
def map_lower(self, func, **kwargs):
    # irrelevant parts left out
    for k, label_k in enumerate(self.hue_names):
        # some more irrelevant parts (specifying colours and whatnot)
        func(data_k[x_var], data_k[y_var], label=label_k,
             color=color, **kwargs)
    return self
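So basically, map_lower runs func once for every unique hue value (for each pair of variables), passing only the rows that have that hue value.
If no hue is given, func runs only once on all the relevant data (for each pair of variables). Hence the difference in the number of calls to func with and without hue.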
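Because func only ever sees one hue subset per call, another workaround is to ignore the subset and look up the full columns by name. This is a minimal sketch, assuming the DataFrame passed to pairplot has no missing values in the plotted columns and that the panel's axes are current when func is called (the first answer's correlate_data relies on the same thing via plt.gca()). make_full_corr_annotator is a hypothetical helper, and np.corrcoef stands in for stats.pearsonr:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def make_full_corr_annotator(df: pd.DataFrame):
    """Hypothetical helper: build a map_lower function that ignores hue chunks."""
    done = set()  # axes that already carry an r^2 label

    def annotate_r2(x, y, **kwargs):
        ax = plt.gca()  # assumes the panel's axes were made current
        if ax in done:
            return  # a later hue chunk of a panel that is already labelled
        done.add(ax)
        # Use the Series names to fetch the full columns, not just this chunk
        r = np.corrcoef(df[x.name], df[y.name])[0, 1]
        ax.annotate("$r^2$={:.2f}".format(r**2), xy=(.02, .86),
                    xycoords=ax.transAxes)

    return annotate_r2
```

Usage would be g.map_lower(make_full_corr_annotator(df)), which in this sketch draws one r^2 label per lower panel, computed on all rows, however many hue levels there are.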