I have a table named material:
+--------+-----+-------------------+----------------+-----------+
| ID | REV | name | Description | curr |
+--------+-----+-------------------+----------------+-----------+
| 211-32 | 001 | Screw 1.0 | Used in MAT 1 | READY |
| 211-32 | 002 | Screw 2 plus | can be Used-32 | WITHDRAWN |
| 212-41 | 001 | Bolt H1 | Light solid | READY |
| 212-41 | 002 | BOLT H2+Form | Heavy solid | READY |
| 101-24 | 001 | HexHead 1-A | NOR-1 | READY |
| 101-24 | 002 | HexHead Spl | NOR-22 | READY |
| 423-98 | 001 | Nut Repair spare | NORM1 | READY |
| 423-98 | 002 | Nut Repair Part-C | NORM2 | WITHDRAWN |
| 423-98 | 003 | Nut SP-C | NORM2+NORM1 | NULL |
| 654-01 | 001 | Bar | Specific only | WITHDRAWN |
| 654-01 | 002 | Bar rod-S | Designed+Spe | WITHDRAWN |
| 654-01 | 003 | Bar OPG | Hard spec | NULL |
+--------+-----+-------------------+----------------+-----------+
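Each ID can have multiple revisions. I want to take the latest revision (that is, the highest of 001, 002, 003, and so on). However, if the latest revision has curr set to NULL (as a string) or WITHDRAWN, I have to take the previous revision and its corresponding values. If that revision's curr is also NULL or WITHDRAWN, I have to go back one more revision. If every revision of an ID has the same problem, that ID can be ignored. So the expected output is: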
+--------+-----+------------------+---------------+-------+
| ID | REV | name | Description | curr |
+--------+-----+------------------+---------------+-------+
| 211-32 | 001 | Screw 1.0 | Used in MAT 1 | READY |
| 212-41 | 002 | BOLT H2+Form | Heavy solid | READY |
| 101-24 | 002 | HexHead Spl | NOR-22 | READY |
| 423-98 | 001 | Nut Repair spare | NORM1 | READY |
+--------+-----+------------------+---------------+-------+
I am new to Python. I tried the code below, but it does not work. Any suggestions would be highly appreciated.
import pandas as pd
import numpy as np
mydata = pd.read_csv('C:/Myfolder/Python/myfile.csv')
mydata.sort_values(['ID','REV'], ascending=[True, False]).drop_duplicates('',keep=last)
Answer 0 (score: 2)
You can use isin to select the rows whose curr is neither NULL nor WITHDRAWN, then apply sort_values followed by drop_duplicates to keep the highest remaining REV for each ID.
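The code that originally accompanied this answer is not available here, so the following is only a minimal sketch of the approach it describes, not the answerer's own code. It assumes the table from the question is rebuilt by hand as a DataFrame called mydata, with REV stored as a zero-padded string so that sorting the strings matches numeric order.

```python
import pandas as pd

# Sample data copied from the question; REV is kept as a string so the
# zero padding ("001") survives.
rows = [
    ("211-32", "001", "Screw 1.0", "Used in MAT 1", "READY"),
    ("211-32", "002", "Screw 2 plus", "can be Used-32", "WITHDRAWN"),
    ("212-41", "001", "Bolt H1", "Light solid", "READY"),
    ("212-41", "002", "BOLT H2+Form", "Heavy solid", "READY"),
    ("101-24", "001", "HexHead 1-A", "NOR-1", "READY"),
    ("101-24", "002", "HexHead Spl", "NOR-22", "READY"),
    ("423-98", "001", "Nut Repair spare", "NORM1", "READY"),
    ("423-98", "002", "Nut Repair Part-C", "NORM2", "WITHDRAWN"),
    ("423-98", "003", "Nut SP-C", "NORM2+NORM1", "NULL"),
    ("654-01", "001", "Bar", "Specific only", "WITHDRAWN"),
    ("654-01", "002", "Bar rod-S", "Designed+Spe", "WITHDRAWN"),
    ("654-01", "003", "Bar OPG", "Hard spec", "NULL"),
]
mydata = pd.DataFrame(rows, columns=["ID", "REV", "name", "Description", "curr"])

# Step 1: drop every revision whose status is the string "NULL" or "WITHDRAWN".
valid = mydata[~mydata["curr"].isin(["NULL", "WITHDRAWN"])]

# Step 2: sort so the highest REV comes last within each ID, then keep that row.
result = valid.sort_values(["ID", "REV"]).drop_duplicates("ID", keep="last")
print(result)
```

Because the filter runs before the de-duplication, an ID such as 654-01, whose revisions are all NULL or WITHDRAWN, simply disappears from the result, which matches the "ignore it" rule in the question.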
Answer 1 (score: 2)
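We can create a helper column, take its maximum per ID, and return the index of that row.
The first step is to filter out the values we want to ignore.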
df1 = df.loc[
df[~df["curr"].isin(["WITHDRAWN", "NULL"])]
.assign(key=df["REV"].astype(int))
.groupby("ID")["key"]
.idxmax()
]
ID REV name Description curr
6 101-24 002 HexHead Spl NOR-22 READY
1 211-32 001 Screw 1.0 Used in MAT 1 READY
4 212-41 002 BOLT H2+Form Heavy solid READY
7 423-98 001 Nut Repair spare NORM1 READY
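One thing that may matter when the data comes from a CSV file, as in the question: with its default settings, pd.read_csv treats the text NULL as a missing value and parses 001 as the integer 1. The isin filter would then no longer catch the NULL rows. The sketch below illustrates this under the assumption that the file looks like the small excerpt in csv_text, which is a hypothetical stand-in for the asker's myfile.csv.

```python
import io
import pandas as pd

# Hypothetical excerpt of the CSV file; the real file is not shown in the question.
csv_text = """ID,REV,name,Description,curr
423-98,001,Nut Repair spare,NORM1,READY
423-98,002,Nut Repair Part-C,NORM2,WITHDRAWN
423-98,003,Nut SP-C,NORM2+NORM1,NULL
654-01,001,Bar,Specific only,WITHDRAWN
654-01,003,Bar OPG,Hard spec,NULL
"""

# With the defaults, "NULL" is parsed as NaN and REV "001" becomes the integer 1.
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df["curr"].isna().sum(), default_df["REV"].dtype)

# dtype=str keeps the zero padding; keep_default_na=False keeps "NULL" as text.
df = pd.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)

# Same idea as the answer: filter, build an integer key, pick the max per ID.
df1 = df.loc[
    df[~df["curr"].isin(["WITHDRAWN", "NULL"])]
    .assign(key=lambda d: d["REV"].astype(int))
    .groupby("ID")["key"]
    .idxmax()
]
print(df1)
```

If changing the read_csv call is not an option, an alternative would be to filter with isna() in addition to isin.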
Answer 2 (score: 1)
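I think you should first remove the NULL and WITHDRAWN rows from the table.
mydata = mydata[mydata['curr'] == 'READY']  # keep only the READY rows
Then you can sort and take the row with the highest REV value for each ID.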
mydata = mydata.sort_values(['ID','REV']).drop_duplicates('ID',keep='last')