Pandas: group by multiple columns and drop rows based on multiple conditions

Time: 2019-03-23 06:21:05

Tags: python pandas

I have a DataFrame like the following:

imagename,locationName,brandname,x,y,w,h,xdiff,ydiff
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,0,490,177,82,0,0
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,1,491,182,78,1,1
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,3,450,94,45,2,-41
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,5,451,95,48,2,1
95-20180407-215120-235505-00050.jpg,DUGOUT,VIVO,167,319,36,38,162,-132
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,446,349,99,90,279,30
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,455,342,84,93,9,-7
95-20180407-215120-235505-00050.jpg,Shirt,GOIBIBO,559,212,70,106,104,-130

It is a CSV dump. From it, I want to group by imagename and brandname, and if the values in both xdiff and ydiff are less than 10, drop the second row.

For example, of the first two rows I want to drop the second one; similarly, of rows 3 and 4 I want to drop row 4.

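I can do this quickly in R with dplyr's group_by, lag and lead functions. However, I'm not sure how to combine the equivalent functions in Python to achieve this. Here is what I have tried so far: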
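For reference, dplyr's lag() and lead() correspond roughly to a grouped shift() in pandas: shift(1) pulls the value from the previous row of the same group and shift(-1) from the next one. A minimal sketch on a trimmed copy of the data above (only imagename, brandname and x are kept); the column names x_lag and x_lead are illustrative, not part of the question:

```python
import pandas as pd

img = '95-20180407-215120-235505-00050.jpg'
# Trimmed copy of the question's data: only the columns this example needs
df = pd.DataFrame({
    'imagename': [img] * 8,
    'brandname': ['SAMSUNG', 'SAMSUNG', 'DHFL', 'DHFL', 'VIVO', 'DHFL', 'DHFL', 'GOIBIBO'],
    'x': [0, 1, 3, 5, 167, 446, 455, 559],
})

keys = ['imagename', 'brandname']

# dplyr::lag(x): value from the previous row of the same group (NaN for a group's first row)
df['x_lag'] = df.groupby(keys)['x'].shift(1)
# dplyr::lead(x): value from the next row of the same group (NaN for a group's last row)
df['x_lead'] = df.groupby(keys)['x'].shift(-1)

print(df[['brandname', 'x', 'x_lag', 'x_lead']])
```

The shifted Series is aligned to the original index, so it can be assigned straight back as a column, just like a mutate() after group_by() in R.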

df[df.groupby(['imagename','brandname']).xdiff.transform() <= 10]

I'm not sure which function to call inside transform, or how to bring ydiff into the condition.
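A sketch of how the transform attempt could be completed, under two assumptions read off the expected output: "less than 10" means an absolute difference (row 3 has ydiff = -41 and is kept), and each row is compared with the previous row of the same imagename/brandname group rather than with the precomputed xdiff/ydiff columns, which are relative to the previous row of the whole file. Taking the per-group diff() of x and y inside transform plays the role of x - lag(x) in R, and the two conditions are combined with &:

```python
import pandas as pd

img = '95-20180407-215120-235505-00050.jpg'
# Trimmed copy of the question's data: only the columns the filter needs
df = pd.DataFrame({
    'imagename': [img] * 8,
    'brandname': ['SAMSUNG', 'SAMSUNG', 'DHFL', 'DHFL', 'VIVO', 'DHFL', 'DHFL', 'GOIBIBO'],
    'x': [0, 1, 3, 5, 167, 446, 455, 559],
    'y': [490, 491, 450, 451, 319, 349, 342, 212],
})

keys = ['imagename', 'brandname']
grouped = df.groupby(keys)

# Difference from the previous row of the same group (like x - lag(x));
# the first row of each group gets NaN
dx = grouped['x'].transform(lambda s: s.diff())
dy = grouped['y'].transform(lambda s: s.diff())

# A row is a near-duplicate when both |dx| and |dy| are below 10.
# NaN < 10 evaluates to False, so the first row of every group is always kept.
near_dup = dx.abs().lt(10) & dy.abs().lt(10)

result = df[~near_dup]
print(result)
```

grouped['x'].diff() is an equivalent shortcut for the transform call. Each row is compared with its immediate predecessor in the group even when that predecessor is itself dropped, so a chain of closely spaced boxes would be reduced to its first row.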

The expected output is:

imagename,locationName,brandname,x,y,w,h,xdiff,ydiff
95-20180407-215120-235505-00050.jpg,Shirt,SAMSUNG,0,490,177,82,0,0
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,3,450,94,45,2,-41
95-20180407-215120-235505-00050.jpg,DUGOUT,VIVO,167,319,36,38,162,-132
95-20180407-215120-235505-00050.jpg,Shirt,DHFL,446,349,99,90,279,30
95-20180407-215120-235505-00050.jpg,Shirt,GOIBIBO,559,212,70,106,104,-130

1 answer:

Answer 0 (score: 1)

You can take each group's sub-frame and apply the condition to it with apply:

# Variant that only checks xdiff:
#df.groupby(['imagename','brandname'],group_keys=False).apply(lambda x: x.iloc[range(0,len(x),2)] if x['xdiff'].lt(10).any() else x)
# In every group with some xdiff < 10 and some ydiff < 10, keep only every other row (positions 0, 2, 4, ...)
df.groupby(['imagename','brandname'],group_keys=False).apply(lambda x: x.iloc[range(0,len(x),2)] if (x['xdiff'].lt(10).any() and x['ydiff'].lt(10).any()) else x)

Out:

    imagename   locationName    brandname   x   y   w   h   xdiff   ydiff
2   95-20180407-215120-235505-00050.jpg Shirt   DHFL    3   450 94  45  2   -41
5   95-20180407-215120-235505-00050.jpg Shirt   DHFL    446 349 99  90  279 30
7   95-20180407-215120-235505-00050.jpg Shirt   GOIBIBO 559 212 70  106 104 -130
0   95-20180407-215120-235505-00050.jpg Shirt   SAMSUNG 0   490 177 82  0   0
4   95-20180407-215120-235505-00050.jpg DUGOUT  VIVO    167 319 36  38  162 -132
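The answer's lambda keeps every other row (positions 0, 2, 4, ...) of any group that contains some xdiff < 10 and some ydiff < 10, and leaves other groups untouched. A sketch of the same rule without apply, using transform('any') to broadcast the per-group test back to each row and cumcount() for the row's position inside its group. It also keeps the original row order, whereas the output above comes back ordered by brandname (appending .sort_index() to the apply version has the same effect). The data is a trimmed copy of the question's frame, and the helper column names x_small/y_small are illustrative:

```python
import pandas as pd

img = '95-20180407-215120-235505-00050.jpg'
# Trimmed copy of the question's data: only the columns the rule needs
df = pd.DataFrame({
    'imagename': [img] * 8,
    'brandname': ['SAMSUNG', 'SAMSUNG', 'DHFL', 'DHFL', 'VIVO', 'DHFL', 'DHFL', 'GOIBIBO'],
    'xdiff': [0, 1, 2, 2, 162, 279, 9, 104],
    'ydiff': [0, 1, -41, 1, -132, 30, -7, -130],
})

keys = ['imagename', 'brandname']
tmp = df.assign(x_small=df['xdiff'].lt(10), y_small=df['ydiff'].lt(10))
g = tmp.groupby(keys)

# Same test as the answer's lambda, repeated on every row of its group:
# does the group contain any xdiff < 10 and any ydiff < 10?
flagged = (g['x_small'].transform('any') & g['y_small'].transform('any')).astype(bool)

# 0-based position of each row within its group, like range(0, len(x)) in the answer
pos = g.cumcount()

# Keep even positions in flagged groups, and every row of the other groups
result = df[~flagged | (pos % 2 == 0)]
print(result)
```

Because this rule is positional rather than a comparison between neighbouring rows, it reproduces the expected output for this sample but may keep or drop different rows on other data.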