Pandas DataFrame - select rows where one column's values contain a string and another column's values start with specific strings

Asked: 2018-07-17 05:36:23

Tags: python database pandas

I want to select the rows where state contains the word Traded and trading_book does not start with the letters 'E', 'L', or 'N'.

import pandas as pd

Test_Data = [('originating_system_id', ['RBCL', 'RBCL', 'RBCL','RBCL']),
             ('rbc_security_type1', ['CORP', 'CORP','CORP','CORP']),
             ('state', ['Traded', 'Traded Away','Traded','Traded Away']),
             ('trading_book', ['LCAAAAA','NUBBBBB','EDFGSFG','PDFEFGR'])
             ]
# DataFrame.from_items was removed in pandas 1.0; building from a dict keeps the column order
dfTest_Data = pd.DataFrame(dict(Test_Data))
display(dfTest_Data)

originating_system_id   rbc_security_type1     state        trading_book
        RBCL                   CORP            Traded          LCAAAAA
        RBCL                   CORP            Traded Away     NUBBBBB
        RBCL                   CORP            Traded          EDFGSFG
        RBCL                   CORP            Traded Away     PDFEFGR

Desired output:

originating_system_id   rbc_security_type1     state        trading_book
        RBCL                   CORP            Traded Away     PDFEFGR

This is what I tried:

prefixes = ['E','L','N']
df_Traded_Away_User = dfTest_Data[
                                    dfTest_Data[~dfTest_Data['trading_book'].str.startswith(tuple(prefixes))]  &
                                    (dfTest_Data['state'].str.contains('Traded')) 
                                ][['originating_system_id','rbc_security_type1','state','trading_book']]
display(df_Traded_Away_User)

But I get:

ValueError: Must pass DataFrame with boolean values only

1 Answer:

Answer 0 (score: 3)

I suggest creating each boolean mask separately for better readability, then chaining them with &:

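prefixes = ['E','L','N']

# m1: True where trading_book does NOT start with any of the prefixes
m1 = ~dfTest_Data['trading_book'].str.startswith(tuple(prefixes))
# m2: True where state contains the substring 'Traded'
m2 = dfTest_Data['state'].str.contains('Traded')

cols = ['originating_system_id','rbc_security_type1','state','trading_book']
# Select rows where both masks are True, keeping only the listed columns
df_Traded_Away_User = dfTest_Data.loc[m1 & m2, cols]
print(df_Traded_Away_User)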
  originating_system_id rbc_security_type1        state trading_book
3                  RBCL               CORP  Traded Away      PDFEFGR
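
For context on the error in the question: the first condition there is wrapped in an extra dfTest_Data[...], so it evaluates to a filtered DataFrame instead of a boolean Series, and the combined expression is no longer a valid row mask. The following is a minimal, self-contained sketch of the mask approach above, assuming a recent pandas version where str.startswith accepts a tuple. The names df, starts_with_prefix, contains_traded, and result are illustrative and not from the original post, and regex=False is an added choice so that 'Traded' is matched as a literal substring.

```python
import pandas as pd

# Same sample data as in the question, built directly from a dict
df = pd.DataFrame({
    'originating_system_id': ['RBCL', 'RBCL', 'RBCL', 'RBCL'],
    'rbc_security_type1': ['CORP', 'CORP', 'CORP', 'CORP'],
    'state': ['Traded', 'Traded Away', 'Traded', 'Traded Away'],
    'trading_book': ['LCAAAAA', 'NUBBBBB', 'EDFGSFG', 'PDFEFGR'],
})

prefixes = ('E', 'L', 'N')

# Each condition is a boolean Series aligned with df's index
starts_with_prefix = df['trading_book'].str.startswith(prefixes)
# regex=False (an assumption added here) treats 'Traded' as plain text
contains_traded = df['state'].str.contains('Traded', regex=False)

# ~ negates the first mask; & combines the two element-wise
result = df.loc[~starts_with_prefix & contains_traded]
print(result)
```

Keeping each condition as a named Series makes it easy to inspect a mask on its own (for example, print(starts_with_prefix)) before combining it with the others.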