Python: group by a column and select by hierarchy

Time: 2019-09-05 09:19:18

Tags: python pandas pandas-groupby

I have a DataFrame such as:

col-a   col-b
1       None
1       Failed
1       Passed
2       None
2       Passed
3       Inconclusive
3       Passed

and a hierarchy of terms:

Failed > Inconclusive > Passed > None

How can I get, for each value of col-a, only the row whose col-b ranks highest in this hierarchy?
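
For reference, the sample table can be rebuilt as below so that the answers can be run directly. This is only a sketch: it assumes the missing results are real None objects rather than the string 'None' (the answers differ on this point), and the name `hierarchy` is introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: missing results are real None objects, not the string 'None'.
df = pd.DataFrame({
    'col-a': [1, 1, 1, 2, 2, 3, 3],
    'col-b': [None, 'Failed', 'Passed', None, 'Passed', 'Inconclusive', 'Passed'],
})

# Hierarchy from the question, highest priority first (hypothetical helper name).
hierarchy = ['Failed', 'Inconclusive', 'Passed', None]

print(df)
```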

Thanks!

4 answers:

Answer 0 (score: 2)

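You can build a dictionary of ranks and apply it with Series.map to create a helper column, then sort by both columns with DataFrame.sort_values and keep the first row of each group with DataFrame.drop_duplicates: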

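# Lower number = higher priority in the hierarchy
d = {'Failed': 0, 'Inconclusive': 1, 'Passed': 2, None: 3}
df['new'] = df['col-b'].map(d)
# Sort by group and rank, keep the first row per group, then drop the helper column
df = df.sort_values(['col-a', 'new']).drop_duplicates('col-a').drop(columns='new')
print(df)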
   col-a         col-b
1      1        Failed
4      2        Passed
5      3  Inconclusive

Another idea, using idxmin on the grouped ranks (SeriesGroupBy.idxmin):

d = {'Failed': 0, 'Inconclusive': 1, 'Passed': 2, None: 3}
# idxmin returns the index label of the lowest rank within each col-a group
df = df.loc[df['col-b'].map(d).groupby(df['col-a']).idxmin()]
print(df)
   col-a         col-b
1      1        Failed
4      2        Passed
5      3  Inconclusive
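
A related variant, not taken from this answer, is to encode the hierarchy as an ordered Categorical instead of a dictionary of ranks. The following is a minimal sketch under the assumption that the missing results are real None objects; the names `order`, `ranked` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'col-a': [1, 1, 1, 2, 2, 3, 3],
    'col-b': [None, 'Failed', 'Passed', None, 'Passed', 'Inconclusive', 'Passed'],
})

# Categories listed from highest to lowest priority. None is not a category,
# so it stays a missing value and is sorted last by default.
order = ['Failed', 'Inconclusive', 'Passed']

ranked = df.copy()
ranked['col-b'] = pd.Categorical(ranked['col-b'], categories=order, ordered=True)

result = (
    ranked.sort_values(['col-a', 'col-b'])  # sorts col-b by category order
          .drop_duplicates('col-a')         # keep the best-ranked row per group
          .astype({'col-b': object})        # back to plain Python strings
)
print(result)
```

The advantage of this form is that the hierarchy lives in one list, and no helper column has to be created and dropped afterwards.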

Answer 1 (score: 2)

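# Ranks for the hierarchy; here the missing result is the string 'None'
h = {'Failed': 1, 'Inconclusive': 2, 'Passed': 3, 'None': 4}

(
    df.assign(b=df['col-b'].map(h))
    .sort_values(by=['col-a', 'b'])   # best-ranked row first within each group
    .groupby(by='col-a')
    .head(1)                          # keep that first row per group
    .reset_index(drop=True)
    .drop(columns='b')                # remove the helper rank column
)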

   col-a         col-b
0      1        Failed
1      2        Passed
2      3  Inconclusive
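
One thing to keep in mind with a rank dictionary: Series.map returns NaN for any label that is missing from the dictionary, so a misspelled or unexpected status would quietly end up ranked last. A possible guard is sketched below, assuming string labels as in this answer; the helper name `rank_labels` is hypothetical.

```python
import pandas as pd

h = {'Failed': 1, 'Inconclusive': 2, 'Passed': 3, 'None': 4}

def rank_labels(labels: pd.Series, ranks: dict) -> pd.Series:
    """Map labels to ranks, raising if a non-missing label has no rank."""
    mapped = labels.map(ranks)
    # Labels that are present in the data but absent from the dictionary.
    unknown = labels[mapped.isna() & labels.notna()].unique()
    if len(unknown):
        raise ValueError(f"labels without a rank: {list(unknown)}")
    return mapped

df = pd.DataFrame({
    'col-a': [1, 1, 1, 2, 2, 3, 3],
    'col-b': ['None', 'Failed', 'Passed', 'None', 'Passed', 'Inconclusive', 'Passed'],
})

# Select the lowest-ranked (highest-priority) row of each group.
best = df.loc[rank_labels(df['col-b'], h).groupby(df['col-a']).idxmin()]
print(best)
```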

Answer 2 (score: 1)

Use drop together with groupby and first.

For example:

import pandas as pd

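# Here the missing results are the string 'None'
df = pd.DataFrame({'col-a': [1, 1, 1, 2, 2, 3, 3],
                   'col-b': ['None', 'Failed', 'Passed', 'None', 'Passed', 'Inconclusive', 'Passed']})

# Drop the 'None' rows, then take the first remaining row of each group
df = df.drop(df[df['col-b'] == 'None'].index).groupby('col-a').first().reset_index()
# or, equivalently, with a boolean mask:
# m = df['col-b'].apply(lambda x: x == 'None')
# df = df[~m].groupby('col-a').first().reset_index()
print(df)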

Or use a boolean mask with groupby if None is of type NoneType (the Python None object rather than the string 'None'):

df = pd.DataFrame({'col-a': [1,1,1,2,2,3,3],
               'col-b': [None,'Failed','Passed',None,'Passed','Inconclusive','Passed']})
m = df['col-b'].apply(lambda x: x is None)
df = df[~m].groupby('col-a').first().reset_index()
print(df)

Output:

   col-a         col-b
0      1        Failed
1      2        Passed
2      3  Inconclusive
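
This approach takes the first remaining row of each group, so it agrees with the hierarchy only because of the order in which the sample rows appear. A quick check, assuming the same data with the two rows of group 3 swapped, and comparing against a rank-based selection (the names `by_order` and `by_rank` are illustrative):

```python
import pandas as pd

# Same data as above, but 'Passed' now precedes 'Inconclusive' in group 3.
df = pd.DataFrame({
    'col-a': [1, 1, 1, 2, 2, 3, 3],
    'col-b': [None, 'Failed', 'Passed', None, 'Passed', 'Passed', 'Inconclusive'],
})

# Order-based selection: first non-None row per group.
by_order = df[df['col-b'].notna()].groupby('col-a').first().reset_index()

# Rank-based selection: lowest rank per group, independent of row order.
rank = df['col-b'].map({'Failed': 0, 'Inconclusive': 1, 'Passed': 2})
by_rank = df.loc[rank.groupby(df['col-a']).idxmin()].reset_index(drop=True)

print(by_order)  # group 3 yields 'Passed'
print(by_rank)   # group 3 yields 'Inconclusive'
```

If the input is not guaranteed to be pre-sorted by priority, a rank-based selection is the safer choice.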

Answer 3 (score: 0)

If I understand the question correctly, you can use map:

import pandas as pd

df = pd.DataFrame({'col-a': [1,1,1,2,2,3,3], 
                   'col-b': [None,'Failed','Passed',None,'Passed','Inconclusive','Passed']})

# Ranks follow the hierarchy Failed > Inconclusive > Passed; None has no entry and maps to NaN
df['rang'] = df['col-b'].map({'Failed': 1, 'Inconclusive': 2, 'Passed': 3})

df:

   col-a         col-b  rang
0      1          None   NaN
1      1        Failed   1.0
2      1        Passed   3.0
3      2          None   NaN
4      2        Passed   3.0
5      3  Inconclusive   2.0
6      3        Passed   3.0
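
The answer stops after adding the rank column. One possible way to finish it is sketched below. It uses ranks that follow the hierarchy from the question (Failed, then Inconclusive, then Passed), sorts by rank and keeps the first row of each group. The name `result` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col-a': [1, 1, 1, 2, 2, 3, 3],
                   'col-b': [None, 'Failed', 'Passed', None, 'Passed', 'Inconclusive', 'Passed']})

# Ranks follow Failed > Inconclusive > Passed; None maps to NaN.
df['rang'] = df['col-b'].map({'Failed': 1, 'Inconclusive': 2, 'Passed': 3})

result = (
    df.sort_values('rang')        # NaN ranks are placed last by default
      .groupby('col-a').head(1)   # first (best-ranked) row of each group
      .sort_index()               # restore the original row order
      .drop(columns='rang')       # remove the helper column
)
print(result)
```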