Selecting specific rows of a DataFrame based on 2 columns in python pandas

Time: 2017-05-28 09:33:14

Tags: python pandas

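I loaded data from Excel into a pandas DataFrame. I now want to select only the rows whose ASSESSMENT ID is the maximum ASSESSMENT ID for each APPID, and to do so for every UI SEQ NUMBER of that APPID.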

APPID   APPNAME ASSESSMENT ID   UI SEQ NUMBER   QUESTION    ANSWER TEXT .   
1   appname 2493    11  Question    No .   
1   appname 13808   11  Question    Ctry of domicile .   
1   appname 13808   11  Question    Name .   
1   appname 35316   11  Question    Ctry of domicile .       
1   appname 35316   11  Question    Name .   
1   appname 35316   11  Question    Nationality .       
1   appname 2493    12  Question    Corp name .   
1   appname 2493    12  Question    Cr Br Scr .   
1   appname 2493    12  Question    Inc And Assests .   
1   appname 2493    12  Question    Int, Ext Reg Reports .   
1   appname 13808   12  Question    Corp name .   
1   appname 35316   12  Question    Corp name .   
1   appname 2493    13  Question    No .   
1   appname 13808   13  Question    No .   
1   appname 35316   13  Question    No .   
1   appname 2493    14  Question    No .   
1   appname 13808   14  Question    firms Pos .   
1   appname 35316   14  Question    firms Pos .   

The result would be:

APPID   APPNAME ASSESSMENT ID   UI SEQ NUMBER   QUESTION    ANSWER TEXT .   
1   appname 35316   11  Question    Ctry of domicile .   
1   appname 35316   11  Question    Name .   
1   appname 35316   11  Question    Nationality .   
1   appname 35316   12  Question    Corp name .   
1   appname 35316   13  Question    No .   
1   appname 35316   14  Question    firms Pos .   

1 answer:

Answer 0 (score: 1)

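I think you need boolean indexing with a mask created by apply: within each (APPID, UI SEQ NUMBER) group, the mask is True for the rows whose ASSESSMENT ID equals the group maximum.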

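# group_keys=False keeps the original row index on the mask; on pandas 2.x the default would add the group keys to the index and the mask would no longer align with df
mask = df.groupby(['APPID', 'UI SEQ NUMBER'], group_keys=False)['ASSESSMENT ID'].apply(lambda x: x == x.max())
df1 = df[mask]
print(df1)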
    APPID  APPNAME  ASSESSMENT ID  UI SEQ NUMBER  QUESTION       ANSWER TEXT.
3       1  appname          35316             11  Question  Ctry of domicile.
4       1  appname          35316             11  Question              Name.
5       1  appname          35316             11  Question       Nationality.
11      1  appname          35316             12  Question         Corp name.
14      1  appname          35316             13  Question                No.
17      1  appname          35316             14  Question         firms Pos.
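
The same mask can also be built without apply, by broadcasting each group's maximum back onto its rows with transform('max'). What follows is only a minimal sketch, not the answer's original method: the DataFrame is a small hand-typed subset of the question's sample data, and the helper name rows_with_group_max is hypothetical.

```python
import pandas as pd

# Small subset of the question's sample data, typed in by hand for illustration.
df = pd.DataFrame({
    "APPID": [1, 1, 1, 1, 1, 1],
    "ASSESSMENT ID": [2493, 13808, 35316, 35316, 2493, 35316],
    "UI SEQ NUMBER": [11, 11, 11, 11, 12, 12],
    "ANSWER TEXT": ["No", "Name", "Ctry of domicile", "Name", "Corp name", "Corp name"],
})


def rows_with_group_max(frame, keys, value):
    """Hypothetical helper: keep every row whose `value` equals its group's maximum."""
    # transform("max") returns a Series aligned with frame's index,
    # holding the group maximum on every row of that group.
    group_max = frame.groupby(keys)[value].transform("max")
    return frame[frame[value] == group_max]


df1 = rows_with_group_max(df, ["APPID", "UI SEQ NUMBER"], "ASSESSMENT ID")
print(df1)
```

Because transform returns a result indexed like the input, the comparison lines up row by row and no index handling is needed. Passing only ["APPID"] as the keys would instead compare against the maximum per APPID, which is how the question is literally worded.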

Alternatively, if you only need one row per group rather than every row that ties for the maximum, use idxmax:

df1 = df.loc[df.groupby(['APPID', 'UI SEQ NUMBER'])['ASSESSMENT ID'].idxmax()]
print(df1)
    APPID  APPNAME  ASSESSMENT ID  UI SEQ NUMBER  QUESTION       ANSWER TEXT.
3       1  appname          35316             11  Question  Ctry of domicile.
11      1  appname          35316             12  Question         Corp name.
14      1  appname          35316             13  Question                No.
17      1  appname          35316             14  Question         firms Pos.
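
To make the difference between the two approaches concrete, here is a small sketch on made-up data (the values are assumptions chosen so that one group contains a tie). idxmax returns a single index label per group, the first row that holds the maximum, while the boolean mask keeps every tied row.

```python
import pandas as pd

# Made-up data: rows 1 and 2 tie for the maximum in group (1, 11).
df = pd.DataFrame({
    "APPID": [1, 1, 1, 1],
    "UI SEQ NUMBER": [11, 11, 11, 12],
    "ASSESSMENT ID": [2493, 35316, 35316, 35316],
})

grouped = df.groupby(["APPID", "UI SEQ NUMBER"])["ASSESSMENT ID"]

# idxmax: one label per group, pointing at the first occurrence of the maximum.
first_max = df.loc[grouped.idxmax()]

# Boolean mask: every row equal to its group's maximum, ties included.
all_max = df[df["ASSESSMENT ID"] == grouped.transform("max")]

print(first_max.index.tolist())  # [1, 3]
print(all_max.index.tolist())    # [1, 2, 3]
```

In the question's data several rows share the maximum ASSESSMENT ID within a group (for example three rows for UI SEQ NUMBER 11), so the mask-based variant is the one that reproduces the expected result shown in the question.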