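I have a data frame such as:

col-a  col-b
1      None
1      Failed
1      Passed
2      None
2      Passed
3      Inconclusive
3      Passed

and a hierarchy of terms:

Failed > Inconclusive > Passed > None

How can I keep, for each value of col-a, only the row whose col-b ranks highest in this hierarchy?
Thanks!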
Answer 0 (score: 2)
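You can build a dictionary that ranks the terms, create a helper column from it with Series.map, sort by both columns with DataFrame.sort_values, and then keep the first row of each group with DataFrame.drop_duplicates: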
d = {'Failed': 0, 'Inconclusive': 1, 'Passed': 2, None: 3}
df['new'] = df['col-b'].map(d)
# keyword form: the positional axis argument in drop('new', 1) is rejected by pandas 2.x
df = df.sort_values(['col-a', 'new']).drop_duplicates('col-a').drop(columns='new')
print(df)
   col-a         col-b
1      1        Failed
4      2        Passed
5      3  Inconclusive
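For a self-contained version of the sort-and-deduplicate approach, here is a minimal sketch. The sample frame and the names rank and result are assumptions for illustration. None is deliberately left out of the dictionary here, so those rows map to NaN, which sort_values places last by default.

```python
import pandas as pd

# Sample data (assumed): None is a real missing value, not the string 'None'
df = pd.DataFrame({
    "col-a": [1, 1, 1, 2, 2, 3, 3],
    "col-b": [None, "Failed", "Passed", None, "Passed", "Inconclusive", "Passed"],
})

# Lower number = higher priority in the hierarchy
rank = {"Failed": 0, "Inconclusive": 1, "Passed": 2}

result = (
    df.assign(new=df["col-b"].map(rank))  # helper column; None rows become NaN
      .sort_values(["col-a", "new"])      # NaN ranks sort last within each col-a
      .drop_duplicates("col-a")           # keep the first (best-ranked) row per col-a
      .drop(columns="new")                # remove the helper column again
)
print(result)
```

Using assign instead of df['new'] = ... keeps the original frame unmodified, which can be convenient when the ranking is only needed for this one selection.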
Another idea, with DataFrameGroupBy.idxmin:
d = {'Failed':0,'Inconclusive':1, 'Passed':2, None: 3}
df = df.loc[df['col-b'].map(d).groupby(df['col-a']).idxmin()]
print (df)
   col-a         col-b
1      1        Failed
4      2        Passed
5      3  Inconclusive
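The idxmin variant can be sketched end to end as follows. The sample frame and the names rank, scores and best are assumptions. The fillna step is also an addition: it gives unmapped values (here None) the lowest priority explicitly, so a group that contains only None rows would still return a row.

```python
import pandas as pd

# Sample data (assumed): None is a real missing value
df = pd.DataFrame({
    "col-a": [1, 1, 1, 2, 2, 3, 3],
    "col-b": [None, "Failed", "Passed", None, "Passed", "Inconclusive", "Passed"],
})

# Lower number = higher priority in the hierarchy
rank = {"Failed": 0, "Inconclusive": 1, "Passed": 2}

# Values missing from the dictionary (None) get the largest score
scores = df["col-b"].map(rank).fillna(len(rank))

# For each col-a group, the index label of the row with the smallest score
best = scores.groupby(df["col-a"]).idxmin()

result = df.loc[best]
print(result)
```

Unlike the sorting approach, this one never adds a helper column to the frame, and the original index labels are preserved.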
Answer 1 (score: 2)
h = {'Failed': 1, 'Inconclusive': 2, 'Passed': 3, 'None': 4}  # 'None' is a string here
(
    df.assign(b=df['col-b'].map(h))
      .groupby(by='col-a')
      .apply(lambda x: x.sort_values(by=['b']).head(1))
      .reset_index(drop=True)
      .drop(columns='b')  # keyword form; drop('b', 1) is rejected by pandas 2.x
)
   col-a         col-b
0      1        Failed
1      2        Passed
2      3  Inconclusive
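The same idea can probably be expressed without groupby.apply, whose handling of the grouping column has been changing in recent pandas releases. The following is a sketch under assumptions, not the answerer's code: the sample frame stores 'None' as a string (matching the dictionary above), and result is a name chosen for illustration.

```python
import pandas as pd

# Sample data (assumed): 'None' is the string 'None', as the dictionary h expects
df = pd.DataFrame({
    "col-a": [1, 1, 1, 2, 2, 3, 3],
    "col-b": ["None", "Failed", "Passed", "None", "Passed", "Inconclusive", "Passed"],
})

h = {"Failed": 1, "Inconclusive": 2, "Passed": 3, "None": 4}

result = (
    df.assign(b=df["col-b"].map(h))        # numeric rank per row
      .sort_values(["col-a", "b"])         # best-ranked row first within each group
      .groupby("col-a", as_index=False)    # keep col-a as a regular column
      .first()                             # first row of each group after sorting
      .drop(columns="b")                   # remove the helper column
)
print(result)
```

Sorting once and taking the first row per group avoids calling a Python lambda for every group, which tends to matter on frames with many groups.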
Answer 2 (score: 1)
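Use
DataFrame.drop() - drops specified labels from rows or columns.
GroupBy.first() - computes the first entry of each group.
DataFrame.reset_index() - resets the index, or a level of it.
For example: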
import pandas as pd
df = pd.DataFrame({'col-a': [1,1,1,2,2,3,3],
                   'col-b': ['None','Failed','Passed','None','Passed','Inconclusive','Passed']})
df = df.drop(df[df['col-b'] == 'None'].index).groupby('col-a').first().reset_index()
# or
# m = df['col-b'].apply(lambda x: x == 'None')
# df = df[~m].groupby('col-a').first().reset_index()
print(df)
Or mask and group by, if None is of type NoneType:
df = pd.DataFrame({'col-a': [1,1,1,2,2,3,3],
                   'col-b': [None,'Failed','Passed',None,'Passed','Inconclusive','Passed']})
m = df['col-b'].apply(lambda x: x is None)
df = df[~m].groupby('col-a').first().reset_index()
print(df)
Output:
   col-a         col-b
0      1        Failed
1      2        Passed
2      3  Inconclusive
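Note that dropping None and taking the first row of each group follows row order, so it matches the hierarchy only when the rows already happen to be ordered that way, as they are in the sample data. The sketch below illustrates this with a reordered frame and shows one way to make the result order-independent by ranking with an ordered pd.Categorical first. The reordered data and the names hierarchy, by_order and by_rank are assumptions for illustration.

```python
import pandas as pd

# Assumed variation of the sample data: in group 1, Passed now comes before Failed
df = pd.DataFrame({
    "col-a": [1, 1, 1, 2, 2, 3, 3],
    "col-b": [None, "Passed", "Failed", None, "Passed", "Inconclusive", "Passed"],
})

# Drop None, then take the first row per group: this depends on row order
kept = df.dropna(subset=["col-b"])
by_order = kept.groupby("col-a").first().reset_index()

# Encode the hierarchy as an ordered categorical; codes are 0, 1, 2 in that order
hierarchy = ["Failed", "Inconclusive", "Passed"]
order = pd.Categorical(kept["col-b"], categories=hierarchy, ordered=True)

by_rank = (
    kept.assign(order=order.codes)       # integer position in the hierarchy
        .sort_values(["col-a", "order"]) # best-ranked row first within each group
        .groupby("col-a")
        .first()
        .reset_index()
        .drop(columns="order")
)

print(by_order)  # group 1 yields Passed, because it appears first
print(by_rank)   # group 1 yields Failed, following the hierarchy
```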
Answer 3 (score: 0)
If I understand the question correctly, you can use map.
import pandas as pd
df = pd.DataFrame({'col-a': [1,1,1,2,2,3,3],
                   'col-b': [None,'Failed','Passed',None,'Passed','Inconclusive','Passed']})
df['rang'] = df['col-b'].map({'Failed':1, 'Passed':2, 'Inconclusive':3})
df:
   col-a         col-b  rang
0      1          None   NaN
1      1        Failed   1.0
2      1        Passed   2.0
3      2          None   NaN
4      2        Passed   2.0
5      3  Inconclusive   3.0
6      3        Passed   2.0
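The rang column only ranks the rows; selecting one row per group is still left to do. One possible way to finish, sketched here under assumptions, is to compare each rank with the minimum rank of its group via groupby(...).transform("min"). Note that this sketch ranks Inconclusive ahead of Passed, following the hierarchy stated in the question, which differs from the mapping in the snippet above. The names rang_map, best and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col-a": [1, 1, 1, 2, 2, 3, 3],
    "col-b": [None, "Failed", "Passed", None, "Passed", "Inconclusive", "Passed"],
})

# Ranks follow the question's hierarchy: Failed > Inconclusive > Passed (assumed intent)
rang_map = {"Failed": 1, "Inconclusive": 2, "Passed": 3}
df["rang"] = df["col-b"].map(rang_map)  # None is not in the map, so it becomes NaN

# Smallest rank within each col-a group, broadcast back to every row (NaN is skipped)
best = df.groupby("col-a")["rang"].transform("min")

# Keep the rows that hold their group's best rank; NaN rows never compare equal
result = df[df["rang"] == best].drop(columns="rang")
print(result)
```

If several rows in a group share the best rank, this keeps all of them; appending drop_duplicates("col-a") would reduce that to one row per group.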