Question

I have code that plots an sns stripplot nicely:

  f, ax = plt.subplots(figsize=(15,12))
  sns.stripplot(data = cars, x='price', y='model', jitter=.5)
  plt.show()

However, there are too many car models, so I would like to visualize only the top n most frequently occurring car models in the dataset. Also, is it possible to apply a lambda calculation or something similar to 'model' or 'price' without creating a separate dataframe?

If there is a better visualization library for this, suggestions are welcome.

Answer 1

You can use value_counts() to find the most frequently occurring values in a column. Here I select the 2 most frequent models:

most_occurring_values = cars['model'].value_counts().head(2).index

Then you can filter the original dataframe to keep only the rows that contain the most frequent models:

cars_subset = cars[cars['model'].isin(most_occurring_values)]

Finally, plot the data using that subset:

f, ax = plt.subplots(figsize=(15,12))
sns.stripplot(data = cars_subset, x='price', y='model', jitter=.5)
plt.show()
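Since the question also asks about a lambda and about avoiding a separate dataframe, here is a minimal sketch of the same filter written inline. It relies on pandas accepting a callable in .loc; the toy `cars` data and the variable names `n` and `top_rows` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the `cars` dataframe from the question.
cars = pd.DataFrame({
    "model": ["golf", "golf", "golf", "polo", "polo", "fiesta"],
    "price": [9000, 9500, 11000, 7000, 7400, 6500],
})

n = 2  # how many of the most frequent models to keep

# A callable passed to .loc receives the dataframe itself and returns a
# boolean mask, so the top-n filter can be written as a single expression.
top_rows = cars.loc[
    lambda df: df["model"].isin(df["model"].value_counts().head(n).index)
]

print(sorted(top_rows["model"].unique()))
```

The expression on the right-hand side could presumably be passed straight to the data argument of sns.stripplot instead of being assigned to a name first.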

Answer 2

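According to the official documentation:

order, hue_order : lists of strings, optional. Order to plot the categorical levels in, otherwise the levels are inferred from the data objects.

To select the top three models, you can do the following: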

sns.stripplot(data = cars, x='price', y='model', jitter=.5, order=cars.model.value_counts().iloc[:3].index)
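To see what actually gets passed as order, the expression can be evaluated on its own. This is only a sketch on made-up data; the helper name `top_n_order` is hypothetical, and the stripplot call is left as a comment so the snippet runs with pandas alone.

```python
import pandas as pd

# Hypothetical toy data standing in for the `cars` dataframe from the question.
cars = pd.DataFrame({
    "model": ["golf"] * 4 + ["polo"] * 3 + ["fiesta"] * 2 + ["corsa"],
    "price": [9000, 9500, 11000, 10200, 7000, 7400, 7100, 6500, 6800, 5900],
})

def top_n_order(df, column, n):
    """List the n most frequent values of `column`, most frequent first."""
    return df[column].value_counts().iloc[:n].index.tolist()

order = top_n_order(cars, "model", 3)
print(order)

# Same call as in the answer, with the precomputed list:
# sns.stripplot(data=cars, x='price', y='model', jitter=.5, order=order)
```

Because value_counts() sorts by descending count, the categories are listed from most to least frequent.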

sns stripplot with only the top n categories