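Note: a full reproduction notebook for this question can be found on GitHub.
I have a dataset containing a distribution of HTTP response codes, which I want to group by class. Sample data can be generated like this: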
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
mock_http_response_data = pd.DataFrame({
    'response_code': np.repeat([200, 201, 202, 204, 302, 304, 400, 404, 500, 502], 250),
})
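Based on the response status, I added a column to the data called "response class". The response class contains a label corresponding to the class of the particular response:
The function that determines the response class is: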
def determine_response_class(row):
    response_code = row['response_code']
    if response_code >= 200 and response_code < 300:
        return 'success'
    elif response_code >= 300 and response_code < 400:
        return 'warning'
    elif response_code >= 400 and response_code < 500:
        return 'client_error'
    elif response_code >= 500 and response_code < 600:
        return 'server_error'
    else:
        return 'unknown'
and the column is added as follows:
# Add 'Response class' column to API Logs, where response class is determined by HTTP status code
mock_http_response_data['response_class'] = mock_http_response_data.apply(determine_response_class, axis='columns')
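As a side note, the same labelling can probably be done without a row-wise apply. The following is a minimal sketch, assuming the class boundaries from the function above; the names bins and labels are introduced here for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Same mock data as in the question.
mock_http_response_data = pd.DataFrame({
    'response_code': np.repeat([200, 201, 202, 204, 302, 304, 400, 404, 500, 502], 250),
})

# Half-open bins [200, 300), [300, 400), ... matching the if/elif ranges.
bins = [200, 300, 400, 500, 600]
labels = ['success', 'warning', 'client_error', 'server_error']

response_class = pd.cut(
    mock_http_response_data['response_code'],
    bins=bins,
    labels=labels,
    right=False,  # include the left edge, exclude the right edge
)

# Codes outside 200-599 fall in no bin and come back as NaN;
# map them to 'unknown', as the else branch of the function does.
mock_http_response_data['response_class'] = (
    response_class.astype(object).fillna('unknown')
)

print(mock_http_response_data['response_class'].value_counts())
```

This should give the same column as determine_response_class for the mock data, while operating on the whole column at once.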
The "response status" (HTTP status code) data plots correctly with a basic countplot:
# Uses the mock data defined above so the snippet runs on its own
sns.countplot(
    x='response_code',
    data=mock_http_response_data,
    color='teal',
    saturation=0.7)
When I try to create a FacetGrid of countplots, the plot appears to work, but the labels are incorrect:
grid = sns.FacetGrid(mock_http_response_data, col='response_class')
grid.map(sns.countplot, 'response_code')
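I would like the FacetGrid of countplots to have the following x-axis labels:
How can I create a FacetGrid of countplots so that the labels are correct and the data in each facet is sorted from high to low (e.g. in the 'success' class column)?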
Answer 0: (score: 2)
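The wrong labels appear because, by default, the x-axes of the subplots are shared, so every plot ends up with the same x-axis as the last one.
You can use the sharex=False argument to prevent the axes from being shared: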
grid = sns.FacetGrid(df, col='class', sharex=False)
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import seaborn as sns
codes = [200, 201, 202, 204, 302, 304, 400, 404, 500, 502]
p = np.random.rand(len(codes))
p = p/p.sum()
df = pd.DataFrame({ 'code': np.random.choice(codes, size=300, p=p) })
def determine_response_class(row):
    response_code = row['code']
    if response_code >= 200 and response_code < 300:
        return 'success'
    elif response_code >= 300 and response_code < 400:
        return 'warning'
    elif response_code >= 400 and response_code < 500:
        return 'client_error'
    elif response_code >= 500 and response_code < 600:
        return 'server_error'
    else:
        return 'unknown'
df['class'] = df.apply(determine_response_class, axis='columns')
grid = sns.FacetGrid(df, col='class', sharex=False)
grid.map(sns.countplot, 'code')
plt.show()
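The sorting part is now a chicken-and-egg problem. To set the order of the bars you need to know the count for each of them, and those counts are only determined as part of the plotting. At this point it may be wise to stick to a clear separation between data generation, analysis, and visualization. The following shows a sorted plot without using FacetGrid, by first computing the sorted counts in the dataframe.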
import pandas as pd
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import seaborn as sns
codes = [200, 201, 202, 204, 302, 304, 400, 404, 500, 502]
p = np.random.rand(len(codes))
p = p/p.sum()
df = pd.DataFrame({ 'code': np.random.choice(codes, size=300, p=p) })
def determine_response_class(row):
    response_code = row['code']
    if response_code >= 200 and response_code < 300:
        return 'success'
    elif response_code >= 300 and response_code < 400:
        return 'warning'
    elif response_code >= 400 and response_code < 500:
        return 'client_error'
    elif response_code >= 500 and response_code < 600:
        return 'server_error'
    else:
        return 'unknown'
df['class'] = df.apply(determine_response_class, axis='columns')
# Count each (code, class) pair and sort by count, highest first
df2 = df.groupby(["code", "class"]).size().reset_index(name="count") \
        .sort_values(by="count", ascending=False).reset_index(drop=True)
# One subplot per class; the y-axis is shared, the x-axes are independent
fig, axes = plt.subplots(ncols=4, sharey=True, figsize=(8, 3))
for ax, (n, group) in zip(axes, df2.groupby("class")):
    sns.barplot(x="code", y="count", data=group, ax=ax, color="C0", order=group["code"])
    ax.set_title(n)
plt.tight_layout()
plt.show()
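The ordering step can also be kept entirely on the analysis side and inspected before anything is drawn. The following is a minimal sketch under a few assumptions: order_by_class and class_names are hypothetical names introduced here, and the class is derived from the hundreds digit of the code as a shortcut for the function above.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
codes = [200, 201, 202, 204, 302, 304, 400, 404, 500, 502]
p = np.random.rand(len(codes))
p = p / p.sum()
df = pd.DataFrame({'code': np.random.choice(codes, size=300, p=p)})

# Hypothetical shortcut for this sketch: the hundreds digit selects the class.
class_names = {2: 'success', 3: 'warning', 4: 'client_error', 5: 'server_error'}
df['class'] = (df['code'] // 100).map(class_names).fillna('unknown')

# value_counts() sorts the codes of each class by frequency, highest first,
# so its index is the plotting order for that class.
order_by_class = {
    name: group['code'].value_counts().index.tolist()
    for name, group in df.groupby('class')
}

for name, order in order_by_class.items():
    print(name, order)
```

Each list could then be passed as the order argument of sns.countplot or sns.barplot when drawing the subplot for that class, which keeps the sorting decision separate from the plotting code.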