Remove the common part of two dataframes under a condition (pandas)

Date: 2019-10-14 12:14:40

Tags: python pandas

I have two dataframes, `male` and `female`:

import pandas as pd

# Plain lists keep 'a' and 'c' as integers; np.array would turn every value into a string.
male = pd.DataFrame([[777, 'male', 9], [999, 'male', 9], [999, 'male', 9]],
                    columns=['a', 'b', 'c'])

female = pd.DataFrame([[119, 'female', 9], [777, 'female', 9],
                       [777, 'female', 9], [999, 'female', 9]],
                      columns=['a', 'b', 'c'])


male:
     a     b  c
0  777  male  9
1  999  male  9
2  999  male  9

female:
     a       b  c
0  119  female  9
1  777  female  9
2  777  female  9
3  999  female  9

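I need to remove the common part from both of them, taking the number of rows into account and comparing only columns a and c. For example, if a row's values (a and c) equal those of a row in the second dataframe, both rows should be deleted, but only those 2 rows, even if more rows match.

I tried using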

df=pd.concat([male,female]).drop_duplicates(subset=['a','c'])
print(df)
     a       b  c
0  777    male  9
1  999    male  9
0  119  female  9

My expected output is:
     a       b  c
2  999    male  9
0  119  female  9
2  777  female  9

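In other words, I only need to delete rows that also exist in the second dataframe, whereas drop_duplicates() removes everything that occurs more than once. I want to drop duplicates only between the dataframes, not within them.

2 Answers:

Answer 0: (score: 2)

See if this works for you.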

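df = pd.concat([male, female])
# g numbers the repeats of each identical (a, b, c) row: 0, 1, 2, ...
df['g'] = df.groupby(['a', 'b', 'c']).cumcount()
# Keep the first row per (a, c, g), then only the last row left per (a, c).
df1 = (df.drop_duplicates(subset=['a', 'c', 'g'])
         .drop_duplicates(subset=['a', 'c'], keep='last')
         .drop('g', axis=1))
print(df1)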

Output

     a       b  c
2  999    male  9
0  119  female  9
2  777  female  9
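
Note that the final `drop_duplicates(..., keep='last')` keeps at most one row per (a, c) pair, so several surplus rows in one frame would collapse into one. A possible variation, sketched here under the assumption that the n-th occurrence of an (a, c) pair in one frame should cancel exactly the n-th occurrence in the other, numbers the occurrences per frame and uses an outer merge with `indicator=True`. The helper name `with_occurrence` and the suffixes are made up for illustration, and the original row indexes are not preserved.

```python
import pandas as pd

male = pd.DataFrame(
    [[777, "male", 9], [999, "male", 9], [999, "male", 9]],
    columns=["a", "b", "c"],
)
female = pd.DataFrame(
    [[119, "female", 9], [777, "female", 9], [777, "female", 9], [999, "female", 9]],
    columns=["a", "b", "c"],
)

keys = ["a", "c"]


def with_occurrence(df):
    # Hypothetical helper: number repeated (a, c) pairs 0, 1, 2, ... within this frame only.
    return df.assign(g=df.groupby(keys).cumcount())


# A row counts as "common" when the same (a, c, g) triple exists in both frames,
# so each row can cancel at most one row of the other frame.
merged = with_occurrence(male).merge(
    with_occurrence(female),
    on=keys + ["g"],
    how="outer",
    suffixes=("_male", "_female"),
    indicator=True,
)

# Rows found on one side only are the ones that survive.
unmatched = merged[merged["_merge"] != "both"]

# Rebuild a single 'b' column from whichever side the row came from.
result = (
    unmatched.assign(b=unmatched["b_male"].fillna(unmatched["b_female"]))
    [["a", "b", "c"]]
    .reset_index(drop=True)
)
print(result)
```

On the sample data this leaves one male row (999) and two female rows (119 and 777); the row order of an outer merge can differ between pandas versions, so sort the result if the order matters.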

Answer 1: (score: 0)

If you want to remove the rows from each of the two dataframes separately, here is my slightly different approach:

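
A minimal sketch of one way the matched rows could be removed from each dataframe separately in pandas (not necessarily what this answer had in mind). It assumes that rows pair up one to one on (a, c): the n-th occurrence of a pair in one frame cancels the n-th occurrence in the other. The function name `drop_paired_rows` and the temporary `_occ` column are hypothetical names chosen for illustration.

```python
import pandas as pd


def drop_paired_rows(left, right, keys):
    """Return both frames without the rows that pair up one to one on `keys`."""
    # Occurrence number of each key combination, counted within its own frame.
    left_occ = left.groupby(keys).cumcount()
    right_occ = right.groupby(keys).cumcount()
    # Label every row with (key values..., occurrence number).
    left_idx = pd.MultiIndex.from_frame(left[keys].assign(_occ=left_occ))
    right_idx = pd.MultiIndex.from_frame(right[keys].assign(_occ=right_occ))
    # Keep only the rows whose label has no partner in the other frame.
    return left[~left_idx.isin(right_idx)], right[~right_idx.isin(left_idx)]


male = pd.DataFrame(
    [[777, "male", 9], [999, "male", 9], [999, "male", 9]],
    columns=["a", "b", "c"],
)
female = pd.DataFrame(
    [[119, "female", 9], [777, "female", 9], [777, "female", 9], [999, "female", 9]],
    columns=["a", "b", "c"],
)

male_rest, female_rest = drop_paired_rows(male, female, ["a", "c"])
print(pd.concat([male_rest, female_rest]))
```

Because the original frames are only filtered, the surviving rows keep their original index labels: row 2 of `male` and rows 0 and 2 of `female` on the sample data.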