I have two data frames, `male` and `female`:
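import numpy as np
import pandas as pd

# Note: np.array with mixed numbers and strings stores every value as a string
male = pd.DataFrame(np.array([[777, 'male', 9], [999, 'male', 9], [999, 'male', 9]]),
                    columns=['a', 'b', 'c'])
female = pd.DataFrame(np.array([[119, 'female', 9], [777, 'female', 9],
                                [777, 'female', 9], [999, 'female', 9]]),
                      columns=['a', 'b', 'c'])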
male:
a b c
0 777 male 9
1 999 male 9
2 999 male 9
female:
a b c
0 119 female 9
1 777 female 9
2 777 female 9
3 999 female 9
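I need to remove the part they have in common from both of them, taking the number of matching rows into account and comparing only columns a and c. For example, if a row's values (a and c) equal those of a row in the second data frame, delete both rows (even if more rows match, delete only those 2 rows, one from each frame).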
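To make the pairing rule concrete, here is a small sketch that only counts the (a, c) pairs from the two frames above. It uses `collections.Counter` from the standard library instead of pandas, and the variable names are chosen for illustration only. Subtracting one counter from the other cancels rows one for one and keeps whatever is left over.

```python
from collections import Counter

# (a, c) pairs taken from the two frames in the question
male_keys = [(777, 9), (999, 9), (999, 9)]
female_keys = [(119, 9), (777, 9), (777, 9), (999, 9)]

male_counts = Counter(male_keys)
female_counts = Counter(female_keys)

# Counter subtraction drops zero and negative counts,
# so only the rows without a partner in the other frame remain
male_left = male_counts - female_counts
female_left = female_counts - male_counts

print(dict(male_left))    # {(999, 9): 1}
print(dict(female_left))  # {(119, 9): 1, (777, 9): 1}
```

One 999 row survives on the male side, and one 119 row and one 777 row survive on the female side, which is the expected output stated in the question.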
I tried:
df=pd.concat([male,female]).drop_duplicates(subset=['a','c'])
print(df)
a b c
0 777 male 9
1 999 male 9
0 119 female 9
My expected output is:
a b c
2 999 male 9
0 119 female 9
2 777 female 9
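In other words, I only need to delete rows that also exist in the second data frame, whereas drop_duplicates() removes everything that occurs more than once. I only want to remove duplicates between the data frames.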
Answer 0 (score: 2)
See if this works for you.
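df = pd.concat([male, female])
# Number repeated rows within each (a, b, c) group: 0 for the first copy, 1 for the second, ...
df['g'] = df.groupby(['a', 'b', 'c'])['b'].cumcount()
# Drop the later copy of each (a, c, g) triple, then keep only the last row per (a, c)
df1 = (df.drop_duplicates(subset=['a', 'c', 'g'])
         .drop_duplicates(subset=['a', 'c'], keep='last')
         .drop('g', axis=1))
print(df1)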
Output:
a b c
2 999 male 9
0 119 female 9
2 777 female 9
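A possible generalisation of the same idea, given as a sketch and not as this answer's own code. The chained `drop_duplicates` calls above give the expected result for this data, but `keep='last'` always leaves exactly one row per (a, c). A key that occurs equally often in both frames would therefore still keep one row, and a key with several unmatched copies would lose all but one. Numbering the occurrences within each frame on the compared columns and passing `keep=False` cancels rows strictly one for one. The helper name `remove_common` and the temporary column `_occ` are invented for illustration, and the frames are rebuilt here with integer columns.

```python
import pandas as pd

male = pd.DataFrame({'a': [777, 999, 999], 'b': ['male'] * 3, 'c': [9] * 3})
female = pd.DataFrame({'a': [119, 777, 777, 999], 'b': ['female'] * 4, 'c': [9] * 4})

def remove_common(left, right, keys):
    """Cancel rows pairwise: the n-th copy of a key in `left` removes the n-th copy in `right`."""
    # Occurrence number of each key within its own frame (0, 1, 2, ...)
    left = left.assign(_occ=left.groupby(keys).cumcount())
    right = right.assign(_occ=right.groupby(keys).cumcount())
    combined = pd.concat([left, right])
    # (keys, _occ) is unique inside each frame, so any duplicate is a cross-frame pair;
    # keep=False removes both members of every such pair
    return combined.drop_duplicates(subset=keys + ['_occ'], keep=False).drop(columns='_occ')

result = remove_common(male, female, ['a', 'c'])
print(result)
```

For the data in the question this leaves the male row 999 (index 2) and the female rows 119 (index 0) and 777 (index 2).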
Answer 1 (score: 0)
If you want to remove the rows from each of the two data frames separately, here is my somewhat different approach:
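As a rough illustration of removing the matched rows from each frame separately, one option is to tag every row with its (a, c) values plus its occurrence number and filter each frame against the other's tags. This is a sketch under assumptions and not necessarily what this answer intended: the helper `drop_matched` and the level name `_occ` are hypothetical, and the frames use integer columns.

```python
import pandas as pd

male = pd.DataFrame({'a': [777, 999, 999], 'b': ['male'] * 3, 'c': [9] * 3})
female = pd.DataFrame({'a': [119, 777, 777, 999], 'b': ['female'] * 4, 'c': [9] * 4})

def drop_matched(df, other, keys):
    """Return the rows of `df` that have no one-for-one partner in `other`."""
    # Tag each row with (key values..., occurrence number within its own frame)
    own = pd.MultiIndex.from_frame(df[keys].assign(_occ=df.groupby(keys).cumcount()))
    theirs = pd.MultiIndex.from_frame(other[keys].assign(_occ=other.groupby(keys).cumcount()))
    # A row is cancelled when the other frame holds the same tag
    return df[~own.isin(theirs)]

male_only = drop_matched(male, female, ['a', 'c'])
female_only = drop_matched(female, male, ['a', 'c'])
print(male_only)
print(female_only)
```

Both results keep their original index labels, so `male_only` holds row 2 of `male` and `female_only` holds rows 0 and 2 of `female`.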
//Repository
@Query(value = "select m.codUseridPstme as codUseridPstme from OCMCharts o" +
" join o.ocmUnitsByChartsId u join u.ocmPostsByUnitsId p " +
"join p.ocmPostmemsByPostsId m where lower(trim(o.codCodeChrt)) =lower(trim(:chartCode))" +
" and p.flgIsmanagerPost=true " +
"and p.staPoststatePost='ACTIVE' and m.staMemberstatusPstme='ACTIVE' " +
"and u.codUnitcodeUnts in (select u.codUnitcodeUnts from OCMCharts o " +
"join o.ocmUnitsByChartsId u "+
"join u.ocmPostsByUnitsId p " +
"join p.ocmPostmemsByPostsId m " +
"where lower(trim(o.codCodeChrt)) =lower(trim(:chartCode)) and u.staStatusUnts='ACTIVE' and m.codUseridPstme=:userName)")
List<OCMUserNameDTO> getUnitManagerListByChartCodeAndUnitCodesIn(@Param("chartCode") String chartCode, @Param("userName") String userName);
//DTO
public interface OCMUserNameDTO {
String getCodUseridPstme();
}
//Console Query
select
ocmpostmem3_.COD_USERID_PSTME as col_0_0_
from
OCM.OCM_CHARTS ocmcharts0_
inner join
OCM.OCM_UNITS ocmunitsby1_
on ocmcharts0_.CHARTS_ID=ocmunitsby1_.CHRT_CHARTS_ID
inner join
OCM.OCM_POSTS ocmpostsby2_
on ocmunitsby1_.UNITS_ID=ocmpostsby2_.UNTS_UNITS_ID
inner join
OCM.OCM_POSTMEM ocmpostmem3_
on ocmpostsby2_.POSTS_ID=ocmpostmem3_.POST_POSTS_ID
where
lower(trim(ocmcharts0_.COD_CODE_CHRT))=lower(trim(?))
and ocmpostsby2_.FLG_ISMANAGER_POST=1
and ocmpostsby2_.STA_POSTSTATE_POST='ACTIVE'
and ocmpostmem3_.STA_MEMBERSTATUS_PSTME='ACTIVE'
and (
ocmunitsby1_.COD_UNITCODE_UNTS in (
select
ocmunitsby5_.COD_UNITCODE_UNTS
from
OCM.OCM_CHARTS ocmcharts4_
inner join
OCM.OCM_UNITS ocmunitsby5_
on ocmcharts4_.CHARTS_ID=ocmunitsby5_.CHRT_CHARTS_ID
inner join
OCM.OCM_POSTS ocmpostsby6_
on ocmunitsby5_.UNITS_ID=ocmpostsby6_.UNTS_UNITS_ID
inner join
OCM.OCM_POSTMEM ocmpostmem7_
on ocmpostsby6_.POSTS_ID=ocmpostmem7_.POST_POSTS_ID
where
lower(trim(ocmcharts4_.COD_CODE_CHRT))=lower(trim(?))
and ocmunitsby5_.STA_STATUS_UNTS='ACTIVE'
and ocmpostmem7_.COD_USERID_PSTME=?
)
)