Question

Suppose the data frame df has a single column (e.g. latency, a univariate sample). The exceedance function is computed and plotted as follows:

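import matplotlib.pyplot as plt

sorted_df = df.sort_values('latency')
samples = len(sorted_df)
# 1 - empirical CDF at each sorted value
exceedance = [1-(x/samples) for x in range(1, samples + 1)]
fig, ax = plt.subplots()
ax.plot(sorted_df['latency'], exceedance, 'o')  # plot the sorted values, not the unsorted column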

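Is there a simpler/more elegant way to compute and plot the exceedance function of a univariate sample using seaborn (possibly distplot)? I recently learned to use seaborn's distplot function, but I can only plot the CDF, as follows:

sns.distplot(df['latency'], hist=False, kde_kws={'cumulative':True})

I am particularly interested in a seaborn-based solution, because I plan to use this functionality with seaborn to get exceedance plots for multiple factors.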

Answer 1

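I found no predefined seaborn API/parameter to compute the exceedance, so I had to use the code listed above. However, since I am specifically interested in getting exceedance plots for several factors, and plt.plot can be combined with seaborn.FacetGrid, the code below does the job.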

import matplotlib.pyplot as plt
import seaborn as sns

def plot_exceedance(data, **kwargs):
    # data is the 'latency' Series for one facet/hue subset
    sorted_df = data.sort_values()
    samples = len(sorted_df)
    exceedance = [1-(x/samples) for x in range(1, samples + 1)]
    ax=plt.gca()
    ax.plot(sorted_df, exceedance, **kwargs)

g = sns.FacetGrid(df, row='factorA',col='factorB',hue='factorC')
g.map(plot_exceedance, 'latency')

Here factorA, factorB, and factorC are other columns in df.

Answer 2

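Since you asked for a more elegant way, the following saves you two lines of code and is faster, because the exceedance is computed with a vectorized NumPy expression instead of a Python loop.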

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

def plot_exceedance(data, **kwargs):
    sorted_data = data.sort_values()
    # Vectorized 1 - empirical CDF: 1 - k/n for k = 1..n
    exceedance = 1.-np.arange(1.,len(sorted_data) + 1.)/len(sorted_data)
    plt.plot(sorted_data, exceedance, **kwargs)

g = sns.FacetGrid(df, row='factorA',col='factorB',hue='factorC')
g.map(plot_exceedance, 'latency')
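As a quick sanity check of the vectorized formula, here is a minimal sketch that computes the exceedance values without plotting. The helper name exceedance_values and the four sample values are hypothetical, chosen only so the result is easy to verify by hand.

```python
import numpy as np

def exceedance_values(sample):
    """Return (sorted sample, 1 - empirical CDF) for a 1-D sample."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    # k/n is the empirical CDF at the k-th smallest value, so 1 - k/n is the
    # fraction of samples above it (exactly so when all values are distinct).
    exceedance = 1.0 - np.arange(1, n + 1) / n
    return x, exceedance

x, exc = exceedance_values([30.0, 10.0, 40.0, 20.0])
print(x.tolist())    # [10.0, 20.0, 30.0, 40.0]
print(exc.tolist())  # [0.75, 0.5, 0.25, 0.0]
```

The largest sample always gets exceedance 0, which matters if the curve is later drawn on a logarithmic y-axis.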

Exceedance (1-cdf) plot using seaborn and pandas
