Here is my data. I want to filter it down to the latest version for each Id:
Id Score Version
1 67 One
1 89 Three
2 78 Two
2 70 One
This is what I want, because Three > Two > One:
Id Score Version
1 89 Three
2 78 Two
What I did was:
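# Extract the version label (One, Two or Three) from the version column
versions = data.scorecard_version.str.extract('(One|Two|Three)', expand=False)
# One indicator column per version label
dummies = pd.get_dummies(versions)
df = pd.concat([df, dummies], axis=1)
# Weight each indicator by its rank to get a numeric version
df['versions'] = df['One']*1 + df['Two']*2 + df['Three']*3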
and then filtered on the maximum value, but I am looking for a better solution.
Answer 0 (score: 1)
You can map the version labels to numbers, sort by them, and then drop duplicates:
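import pandas as pd

df = pd.DataFrame([[1, 67, 'one'], [1, 89, 'three'],
                   [2, 78, 'two'], [2, 70, 'one']], columns=['Id', 'Score', 'Version'])
# Map each version label to a sortable number
d = {'one': 1, 'two': 2, 'three': 3}
df['vers'] = df['Version'].map(d)
# Highest version first, keep the first row per Id, then restore the original order
df = df.sort_values('vers', ascending=False).drop_duplicates('Id').sort_index()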
Output:
   Id  Score Version  vers
1   1     89   three     3
2   2     78     two     2
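If you would rather not add a helper column to the frame, a variant of the same idea is to rank the labels in a separate Series and pick the highest-ranked row per Id with `groupby(...).idxmax()`. This is only a minimal sketch: the names `order`, `rank`, `latest_idx` and `latest` are illustrative, and the capitalised labels and their ordering are assumed from the question's sample data.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 67, 'One'], [1, 89, 'Three'], [2, 78, 'Two'], [2, 70, 'One']],
    columns=['Id', 'Score', 'Version'],
)

# Assumed ordering of the labels, from oldest to newest.
order = ['One', 'Two', 'Three']
rank = df['Version'].map({name: i for i, name in enumerate(order)})

# For each Id, find the index label of the row with the highest rank.
latest_idx = rank.groupby(df['Id']).idxmax()

# Select those rows; df itself is left unchanged.
latest = df.loc[latest_idx].reset_index(drop=True)
print(latest)
```

Compared with sorting and dropping duplicates, this leaves `df` untouched. Note that any label missing from `order` maps to NaN, so it may be worth checking for that before calling `idxmax`.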