Question

I have the following data:

import pandas as pd
fruit = pd.DataFrame({'fruit': ['apple', 'orange', 'apple', 'blueberry'],
                      'colour': ['red', 'orange', 'green', 'black']})

costs = pd.DataFrame({'fruit': ['apple', 'orange', 'blueberry'],
                      'cost': [1.7, 1.4, 2.1]})

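I'd like a copy of the fruit table sorted by the cost from the costs table, but without including the cost column. What's the best way to do this? It's fine if the tables are joined in an intermediate step; I'm mainly concerned about long-term memory waste.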

Answer 1

I would do a left merge and then argsort:

In [12]: fruit.merge(costs, how="left")["cost"].argsort()
Out[12]:
0    1
1    0
2    2
3    3
Name: cost, dtype: int64

Note: if you were using a different index for fruit, it would be ignored and replaced with range(0, len(fruit)).
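To make the note about the index concrete, here is a minimal sketch. The string labels 'w' to 'z' are hypothetical and chosen only for illustration; the point is that merging on a column returns a frame numbered from 0, whatever labels fruit had.

```python
import pandas as pd

# Hypothetical string labels on `fruit`, chosen only for illustration.
fruit = pd.DataFrame({'fruit': ['apple', 'orange', 'apple', 'blueberry'],
                      'colour': ['red', 'orange', 'green', 'black']},
                     index=['w', 'x', 'y', 'z'])
costs = pd.DataFrame({'fruit': ['apple', 'orange', 'blueberry'],
                      'cost': [1.7, 1.4, 2.1]})

merged = fruit.merge(costs, how="left", on="fruit")

# Merging on a column discards the left frame's labels and numbers the rows 0..n-1.
original_labels = list(fruit.index)
merged_labels = list(merged.index)
print(original_labels, merged_labels)
```

Because the merged rows are numbered by position, the argsort result is a set of positions into fruit, not labels.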

In [13]: fruit.iloc[fruit.merge(costs, how="left")["cost"].argsort()]
Out[13]:
   colour      fruit
1  orange     orange
0     red      apple
2   green      apple
3   black  blueberry

This reorders using iloc (by position) rather than loc (by label), because argsort returns positions.
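Put together, the left-merge-then-argsort approach might look like the sketch below on a current pandas version. The explicit on="fruit" and the stable sort kind are assumptions added here so that rows with equal cost keep their original relative order; they are not part of the original answer.

```python
import pandas as pd

fruit = pd.DataFrame({'fruit': ['apple', 'orange', 'apple', 'blueberry'],
                      'colour': ['red', 'orange', 'green', 'black']})
costs = pd.DataFrame({'fruit': ['apple', 'orange', 'blueberry'],
                      'cost': [1.7, 1.4, 2.1]})

# Left merge yields one row per row of `fruit`, in the same order.
merged_cost = fruit.merge(costs, how="left", on="fruit")["cost"]

# Positions that would sort the costs; mergesort is stable, so ties keep their order.
order = merged_cost.to_numpy().argsort(kind="mergesort")

# Reorder by position. Only `order` outlives the merge; the merged frame is discarded.
sorted_fruit = fruit.iloc[order]
print(sorted_fruit)
```

The merged frame exists only as a temporary, which fits the question's concern about long-term memory: the result holds just the original fruit columns.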

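Note: it is important to use a left merge, because a plain (inner) merge can change the row order (!!). A left merge is also more efficient.

An alternative, cleaner but less efficient method:

In [21]: fruit.merge(costs).sort("cost").loc[:, fruit.columns]
Out[21]:
   colour      fruit
2  orange     orange
0     red      apple
1   green      apple
3   black  blueberry

Note: DataFrame.sort was later deprecated and then removed from pandas in favour of sort_values, so on current versions write sort_values("cost") instead of sort("cost").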
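On a current pandas version, the merge-then-sort alternative could be sketched as follows. Two choices here are assumptions, not part of the original answer: how="left" is used so the resulting row labels line up with the rows of fruit (which has a default integer index), and kind="mergesort" keeps tied costs in their original order.

```python
import pandas as pd

fruit = pd.DataFrame({'fruit': ['apple', 'orange', 'apple', 'blueberry'],
                      'colour': ['red', 'orange', 'green', 'black']})
costs = pd.DataFrame({'fruit': ['apple', 'orange', 'blueberry'],
                      'cost': [1.7, 1.4, 2.1]})

# Merge, sort by the temporary cost column, then keep only the original columns.
alt = (fruit.merge(costs, how="left", on="fruit")
            .sort_values("cost", kind="mergesort")
            .loc[:, fruit.columns])
print(alt)
```

This version sorts the whole merged frame rather than a single column of costs, which is why it reads more cleanly but does more work.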

Answer 2

Why not merge the tables and then drop the column you don't need?

# sort_index(by=...) has been removed from pandas; sort_values is the current spelling
pd.merge(fruit, costs).sort_values('cost').drop('cost', axis=1)
