My CSV looks like this:
+-----+---------+-----------+------------+
| ID | version | Name | State |
+-----+---------+-----------+------------+
| 101 | 1 | Nut | In-Transit |
| 101 | 1 | Nut | Cancelled |
| 101 | 1 | Nut | Delivered |
| 101 | 2 | Nut 2.0 | In-Transit |
| 102 | 1 | Screw | Shipped |
| 102 | 1 | Screw | In-Transit |
| 102 | 2 | Screw 2.0 | Shipped |
| 102 | 2 | Screw 2.0 | Cancelled |
+-----+---------+-----------+------------+
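Now, for each combination of ID and version, I want to keep only the row whose State has the highest priority among all the states available for that combination (based on the priority order below).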
My custom order (highest priority first): Delivered, In-Transit, Shipped, Cancelled
Expected output:
+-----+---------+-----------+------------+
| ID | version | Name | State |
+-----+---------+-----------+------------+
| 101 | 1 | Nut | Delivered |
| 101 | 2 | Nut 2.0 | In-Transit |
| 102 | 1 | Screw | In-Transit |
| 102 | 2 | Screw 2.0 | Shipped |
+-----+---------+-----------+------------+
I tried the code below, but it didn't work. I'm new to Python and I'm not sure how to fix this.
import pandas as pd
mydata = pd.read_csv('C:/Mypython/Newyork',encoding = "ISO-8859-1")
mydata['state'] = pd.Categorical(mydata['state'], ["Delivered","In-Transit","Shipped","Cancelled"])
mydate.sort_values('state').drop_duplicates(['ID','VERSION'],keep='first')
Answer 0 (score: 1)
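It works fine for me; it seems you are only missing the step that assigns the result back to a variable. Also note that column names are case-sensitive, so use State and version exactly as they appear in the CSV (and mydata rather than mydate):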
mydata['State'] = pd.Categorical(mydata['State'],
                                 ["Delivered", "In-Transit", "Shipped", "Cancelled"],
                                 ordered=True)
# keep='first' is the default, so it can be omitted
mydata = mydata.sort_values('State').drop_duplicates(['ID', 'version'])
print(mydata)
    ID  version       Name       State
2  101        1        Nut   Delivered
3  101        2    Nut 2.0  In-Transit
5  102        1      Screw  In-Transit
6  102        2  Screw 2.0     Shipped
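A self-contained sketch of this answer, assuming the CSV contents from the question are typed in by hand with pd.DataFrame instead of being loaded with read_csv. The names priority and best are only for illustration:

```python
import pandas as pd

# Data from the question, typed in by hand (assumption: this matches the CSV).
mydata = pd.DataFrame({
    "ID":      [101, 101, 101, 101, 102, 102, 102, 102],
    "version": [1, 1, 1, 2, 1, 1, 2, 2],
    "Name":    ["Nut", "Nut", "Nut", "Nut 2.0",
                "Screw", "Screw", "Screw 2.0", "Screw 2.0"],
    "State":   ["In-Transit", "Cancelled", "Delivered", "In-Transit",
                "Shipped", "In-Transit", "Shipped", "Cancelled"],
})

# Highest priority first; ordered=True makes the category order meaningful.
priority = ["Delivered", "In-Transit", "Shipped", "Cancelled"]
mydata["State"] = pd.Categorical(mydata["State"], categories=priority, ordered=True)

# Sorting puts the highest-priority state of each group first, and
# drop_duplicates then keeps that first row per (ID, version).
# The result is assigned to a new variable instead of being discarded.
best = mydata.sort_values("State").drop_duplicates(["ID", "version"])
print(best)
```

The original row labels are kept, so best still points back to rows 2, 3, 5 and 6 of the input.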
If you also want the output sorted by ID and version, sort by multiple columns:
mydata['State'] = pd.Categorical(mydata['State'],
                                 ["Delivered", "In-Transit", "Shipped", "Cancelled"],
                                 ordered=True)
mydata = mydata.sort_values(['ID', 'version', 'State']).drop_duplicates(['ID', 'version'])
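As an alternative that does not depend on the sort order at all (this is not from either answer, just one possible approach), you can rank each row by its category code and pick the row with the smallest code in each group using groupby plus idxmin. This sketch assumes every State value appears in the priority list: a value missing from the list becomes NaN with code -1 and would wrongly be picked as the best. The names rank, best_idx and best are hypothetical:

```python
import pandas as pd

# Data from the question, typed in by hand (assumption: this matches the CSV).
mydata = pd.DataFrame({
    "ID":      [101, 101, 101, 101, 102, 102, 102, 102],
    "version": [1, 1, 1, 2, 1, 1, 2, 2],
    "Name":    ["Nut", "Nut", "Nut", "Nut 2.0",
                "Screw", "Screw", "Screw 2.0", "Screw 2.0"],
    "State":   ["In-Transit", "Cancelled", "Delivered", "In-Transit",
                "Shipped", "In-Transit", "Shipped", "Cancelled"],
})

priority = ["Delivered", "In-Transit", "Shipped", "Cancelled"]
mydata["State"] = pd.Categorical(mydata["State"], categories=priority, ordered=True)

# .cat.codes is each row's position in `priority` (0 = highest priority).
# Values not in `priority` would get -1, so this assumes all states are listed.
rank = mydata["State"].cat.codes

# For every (ID, version) group, idxmin returns the index label of the row
# with the smallest code, i.e. the highest-priority state.
best_idx = rank.groupby([mydata["ID"], mydata["version"]]).idxmin()

# Select those rows; groupby sorts its keys, so the result is ordered by ID, version.
best = mydata.loc[best_idx.to_numpy()].reset_index(drop=True)
print(best)
```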
Answer 1 (score: 1)
Use pd.Categorical with ordered=True to create an ordered categorical column, then call sort_values on that column, group by ID and version, and aggregate with first:
mydata['State'] = pd.Categorical(mydata['State'], ["Delivered", "In-Transit", "Shipped", "Cancelled"], ordered=True)
df = mydata.sort_values('State').groupby(['ID', 'version'], as_index=False).first()
Result:
ID version Name State
0 101 1 Nut Delivered
1 101 2 Nut 2.0 In-Transit
2 102 1 Screw In-Transit
3 102 2 Screw 2.0 Shipped
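A self-contained version of this answer, again assuming the question's data is typed in by hand rather than read from the CSV. One caveat: GroupBy.first() takes the first non-null value of each column separately, so if a column can contain missing values the output row may mix values from different input rows. With no missing values, as here, it simply returns the first row of each group:

```python
import pandas as pd

# Data from the question, typed in by hand (assumption: this matches the CSV).
mydata = pd.DataFrame({
    "ID":      [101, 101, 101, 101, 102, 102, 102, 102],
    "version": [1, 1, 1, 2, 1, 1, 2, 2],
    "Name":    ["Nut", "Nut", "Nut", "Nut 2.0",
                "Screw", "Screw", "Screw 2.0", "Screw 2.0"],
    "State":   ["In-Transit", "Cancelled", "Delivered", "In-Transit",
                "Shipped", "In-Transit", "Shipped", "Cancelled"],
})

mydata["State"] = pd.Categorical(
    mydata["State"],
    ["Delivered", "In-Transit", "Shipped", "Cancelled"],
    ordered=True,
)

# Highest-priority rows come first after sorting; first() then takes the
# first (non-null) value of each column within every (ID, version) group.
# as_index=False keeps ID and version as ordinary columns.
df = (mydata.sort_values("State")
            .groupby(["ID", "version"], as_index=False)
            .first())
print(df)
```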