Seaborn FacetGrid for stratified count plots?

Date: 2017-09-22 10:40:16

Tags: python pandas matplotlib jupyter-notebook seaborn

Note: a full reproduction notebook for this question can be found on GitHub.

I have a dataset containing a distribution of HTTP response codes, which I want to group by class. Sample data can be generated like this:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

mock_http_response_data = pd.DataFrame({
    'response_code':np.repeat([200, 201, 202, 204, 302, 304, 400, 404, 500, 502], 250 ),
})

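Based on the "response status", I add a column to the data called "response class". The response class holds the label for the class that each response belongs to:

  • 2xx: success
  • 3xx: warning
  • 4xx: client error
  • 5xx: server error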

The function that determines the response class is:

def determine_response_class(row):    
    response_code = row['response_code']

    if response_code >= 200 and response_code < 300:
        return 'success'
    elif response_code >= 300 and response_code < 400:
        return 'warning'
    elif response_code >= 400 and response_code < 500:
        return 'client_error'
    elif response_code >= 500 and response_code < 600:
        return 'server_error'
    else:
        return 'unknown'

The column is then added as follows:

# Add 'Response class' column to API Logs, where response class is determined by HTTP status code
mock_http_response_data['response_class'] = mock_http_response_data.apply(determine_response_class, axis='columns')
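As a side note, the same classification can be done without a row-wise apply. The following is a minimal sketch, assuming pandas' `pd.cut` with half-open bins; the helper name `classify` and the sample values are made up for illustration and are not part of the original question.

```python
import pandas as pd

def classify(codes: pd.Series) -> pd.Series:
    """Map HTTP status codes to class labels using half-open bins [200, 300), ..."""
    binned = pd.cut(
        codes,
        bins=[200, 300, 400, 500, 600],
        labels=['success', 'warning', 'client_error', 'server_error'],
        right=False,  # include the left edge, exclude the right edge
    )
    # Codes outside 200-599 fall in no bin and come back as NaN; label them 'unknown'.
    return binned.cat.add_categories('unknown').fillna('unknown')

# Hypothetical sample values, chosen to hit every branch.
sample = pd.Series([200, 304, 404, 502, 102])
print(classify(sample).tolist())
```

With the question's data this would be used as `mock_http_response_data['response_class'] = classify(mock_http_response_data['response_code'])`. The result is a categorical column rather than plain strings, which may or may not matter downstream.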

The "response status" (HTTP status code) data is plotted correctly with a basic count plot:

# Column and DataFrame names match the sample data generated above
sns.countplot(
    x='response_code',
    data=mock_http_response_data,
    color='teal',
    saturation=0.7)

[Figure: uniform status code distribution]
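The flat shape of that plot follows directly from how the sample data is built, since `np.repeat` emits every code the same number of times. A quick numeric check, as a sketch that rebuilds the sample data under the shorter, made-up name `mock`:

```python
import numpy as np
import pandas as pd

mock = pd.DataFrame({
    'response_code': np.repeat([200, 201, 202, 204, 302, 304, 400, 404, 500, 502], 250),
})

# Count occurrences of each code and list them in ascending code order.
counts = mock['response_code'].value_counts().sort_index()
print(counts)
```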

When I try to create a FacetGrid of count plots, the chart appears to work, but the labels are incorrect:

grid = sns.FacetGrid(mock_http_response_data, col='response_class')

grid.map(sns.countplot, 'response_code')


I would like the FacetGrid of count plots to have the following x-axis labels:

  • 200
  • 201
  • 202
  • 302
  • 304
  • 400
  • 404
  • 500
  • 502

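How can I create a FacetGrid of count plots so that the labels are correct and the data within each facet is sorted from high to low (e.g. in the 'success' class column)?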

1 Answer:

Answer 0 (score: 2)

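The wrong labels appear because the x-axes of the subplots are shared by default. countplot draws its bars at categorical positions 0, 1, 2, ... within each facet, so with a shared axis every plot ends up with the same x-axis tick labels as the last plot.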
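To see why a shared axis cannot label every facet correctly, it helps to list which code sits at each bar position per class. The sketch below uses only pandas and assumes that countplot orders numeric categories ascending when no `order` is passed; the `positions` dictionary is purely illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'code': np.repeat([200, 201, 202, 204, 302, 304, 400, 404, 500, 502], 250),
})
# Integer-divide by 100 to get the leading digit, then map it to a class label.
df['class'] = (df['code'] // 100).map(
    {2: 'success', 3: 'warning', 4: 'client_error', 5: 'server_error'}
)

# For each facet, pair every bar position (0, 1, 2, ...) with the code drawn there.
positions = {
    name: dict(enumerate(sorted(group['code'].unique().tolist())))
    for name, group in df.groupby('class')
}
for name, mapping in positions.items():
    print(name, mapping)
```

Position 0 stands for 200 in the 'success' facet but for 302 in the 'warning' facet, so a single shared set of tick labels can be right for at most one of them.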

You can pass the sharex=False argument to prevent the axes from being shared:

grid = sns.FacetGrid(df, col='class', sharex=False)


import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import seaborn as sns

codes = [200, 201, 202, 204, 302, 304, 400, 404, 500, 502]
p = np.random.rand(len(codes))
p = p/p.sum()
df = pd.DataFrame({ 'code': np.random.choice(codes, size=300, p=p) })

def determine_response_class(row):
    response_code = row['code']

    if response_code >= 200 and response_code < 300:
        return 'success'
    elif response_code >= 300 and response_code < 400:
        return 'warning'
    elif response_code >= 400 and response_code < 500:
        return 'client_error'
    elif response_code >= 500 and response_code < 600:
        return 'server_error'
    else:
        return 'unknown'

df['class'] = df.apply(determine_response_class, axis='columns')

grid = sns.FacetGrid(df, col='class', sharex=False)

grid.map(sns.countplot, 'code')

plt.show()

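The sorting question is a chicken-and-egg problem. To set the order of the bars you need to know the count for each of them, but those counts are only determined as part of the plotting. At this point it may be wise to keep a clear separation between data generation, analysis, and visualization. The following shows a sorted plot that does not use FacetGrid and instead computes the sorted counts in the dataframe first.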

import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import seaborn as sns

codes = [200, 201, 202, 204, 302, 304, 400, 404, 500, 502]
p = np.random.rand(len(codes))
p = p/p.sum()
df = pd.DataFrame({ 'code': np.random.choice(codes, size=300, p=p) })

def determine_response_class(row):    
    response_code = row['code']

    if response_code >= 200 and response_code < 300:
        return 'success'
    elif response_code >= 300 and response_code < 400:
        return 'warning'
    elif response_code >= 400 and response_code < 500:
        return 'client_error'
    elif response_code >= 500 and response_code < 600:
        return 'server_error'
    else:
        return 'unknown'

df['class'] = df.apply(determine_response_class, axis='columns')

# Count each (code, class) pair and sort from the most to the least frequent
df2 = df.groupby(["code","class"]).size().reset_index(name="count") \
        .sort_values(by="count", ascending=False).reset_index(drop=True)

fig, axes = plt.subplots(ncols=4, sharey=True, figsize=(8,3))
for ax,(n, group) in zip(axes, df2.groupby("class")):
    sns.barplot(x="code",y="count", data=group, ax=ax, color="C0", order=group["code"])
    ax.set_title(n)

plt.tight_layout()
plt.show()
