Question

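I'm having trouble writing a query in pandas. I have a DataFrame (gsub) with these columns:

(item_name, order_id, quantity, item_price1)

The task is to get the quantity sold of the most expensive item. When I write the query like this:

df.groupby('item_name')['item_price1','quantity'].agg(['max','count'])

it works fine. But when I try to sort that result with sort_values to find the most expensive item, it ends with an error:

KeyError: 'max'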

What is the correct way to do this?

Answer 1

Moving my comment to an answer:

When you do the following:

agg = df.groupby('item_name')['item_price1','quantity'].agg(['max','count'])

you end up with a multi-level column index, which in this case (agg.columns) is:

MultiIndex([('item_price1',   'max'),
            ('item_price1', 'count'),
            (   'quantity',   'max'),
            (   'quantity', 'count')],
           )

To sort on it, you need to name a specific column by its full (column, statistic) tuple, e.g.:

agg.sort_values(by=('item_price1', 'max'), ascending=False)
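As a rough, self-contained illustration of the tuple key, here is a minimal sketch. The sample rows are made up; only the column names come from the question. It also selects the columns with a list (double brackets), on the assumption that a recent pandas version is in use, since those reject the tuple-style `['item_price1','quantity']` selection.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "item_name": ["Burrito", "Burrito", "Salad", "Chips", "Chips", "Chips"],
    "order_id": [1, 2, 2, 3, 4, 5],
    "quantity": [1, 2, 1, 1, 3, 1],
    "item_price1": [8.5, 9.0, 11.25, 2.5, 2.5, 3.0],
})

# Select the columns with a list (double brackets) before aggregating.
agg = df.groupby("item_name")[["item_price1", "quantity"]].agg(["max", "count"])

# The columns form a two-level MultiIndex, so 'max' alone is not a valid key.
# Address one column by its full (column, statistic) tuple instead.
top = agg.sort_values(by=("item_price1", "max"), ascending=False)

print(top.index.tolist())  # ['Salad', 'Burrito', 'Chips']
```

With this sample data, the first row of `top` is the item with the highest maximum price, and `top[("quantity", "max")]` gives the matching quantity column.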

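As a side note: you're using .head(10) to limit the output after fully sorting the data, but .nlargest may be a better fit when the number of rows you want is small compared to the whole, e.g.: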

agg.nlargest(10, ('item_price1', 'max'))
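To make the comparison concrete, the sketch below runs both approaches on a small made-up frame (the rows are hypothetical; the column names are the ones from the question) and shows that they pick the same items when there are no ties in the sort column.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "item_name": ["Burrito", "Burrito", "Salad", "Chips", "Chips", "Chips"],
    "order_id": [1, 2, 2, 3, 4, 5],
    "quantity": [1, 2, 1, 1, 3, 1],
    "item_price1": [8.5, 9.0, 11.25, 2.5, 2.5, 3.0],
})

agg = df.groupby("item_name")[["item_price1", "quantity"]].agg(["max", "count"])

key = ("item_price1", "max")  # full tuple key into the MultiIndex columns

# Approach 1: sort every row, then keep the first two.
via_sort = agg.sort_values(by=key, ascending=False).head(2)

# Approach 2: ask directly for the two rows with the largest values.
via_nlargest = agg.nlargest(2, key)

print(via_sort.index.tolist())      # ['Salad', 'Burrito']
print(via_nlargest.index.tolist())  # ['Salad', 'Burrito']
```

On a frame this small the difference is not measurable; any benefit of nlargest would only show up on much larger data, so it is worth timing both on the real DataFrame.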
