Looking for a fast, elegant way to bin on 2 columns in Pandas
Here is my dataframe:
filename height width
0 shopfronts_23092017_3_285.jpg 750.0 560.0
1 shopfronts_200.jpg 4395.0 6020.0
2 shopfronts_25092017_eateries_98.jpg 414.0 621.0
3 shopfronts_101.jpg 480.0 640.0
4 shopfronts_138.jpg 3733.0 8498.0
5 shopfronts_25092017_eateries_95.jpg 187.0 250.0
6 shopfronts_25092017_neon_33.jpg 100.0 200.0
7 shopfronts_322.jpg 682.0 1024.0
8 shopfronts_171.jpg 800.0 600.0
9 shopfronts_23092017_3_35.jpg 120.0 210.0
I need to bin the records based on 2 columns, height and width (the image resolution).
I am looking for something like this:
filename height width group
0 shopfronts_23092017_3_285.jpg 750.0 560.0 g3
1 shopfronts_200.jpg 4395.0 6020.0 g4
2 shopfronts_25092017_eateries_98.jpg 414.0 621.0 others
3 shopfronts_101.jpg 480.0 640.0 others
4 shopfronts_138.jpg 3733.0 8498.0 g4
5 shopfronts_25092017_eateries_95.jpg 187.0 250.0 g1
6 shopfronts_25092017_neon_33.jpg 100.0 200.0 g1
7 shopfronts_322.jpg 682.0 1024.0 others
8 shopfronts_171.jpg 800.0 600.0 g3
9 shopfronts_23092017_3_35.jpg 120.0 210.0 g1
where
g1: <= 400x300
g2: (400x300, 640x480]
g3: (640x480, 800x600]
g4: > 800x600
others: records that don't meet the requirement (e.g. records 7, 2, 3: either height or width falls into the categories defined, but not both)
I'd like to get frequency counts using the group column. If this isn't the best approach and there is a better one, please let me know.
Answer 0 (score: 3)
You can use a double pd.cut, i.e.:
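import numpy as np
import pandas as pd

# Bin height and width separately, using the same labels for both
bins = [0, 400, 640, 800, np.inf]
df['group'] = pd.cut(df['height'].values, bins, labels=["g1", "g2", "g3", "g4"])
nbin = [0, 300, 480, 600, np.inf]
t = pd.cut(df['width'].values, nbin, labels=["g1", "g2", "g3", "g4"])
# Keep the label only where both bins agree; otherwise mark the row as 'others'
df['group'] = np.where(df['group'] == t, df['group'], 'others')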
filename height width group
0 shopfronts_23092017_3_285.jpg 750.0 560.0 g3
1 shopfronts_200.jpg 4395.0 6020.0 g4
2 shopfronts_25092017_eateries_98.jpg 414.0 621.0 others
3 shopfronts_101.jpg 480.0 640.0 others
4 shopfronts_138.jpg 3733.0 8498.0 g4
5 shopfronts_25092017_eateries_95.jpg 187.0 250.0 g1
6 shopfronts_25092017_neon_33.jpg 100.0 200.0 g1
7 shopfronts_322.jpg 682.0 1024.0 others
8 shopfronts_171.jpg 800.0 600.0 g3
9 shopfronts_23092017_3_35.jpg 120.0 210.0 g1
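The question also asks for frequency counts, which the answer stops short of. The following is a minimal, self-contained sketch of the same double pd.cut idea with a value_counts call at the end. The sample data is copied from the question; the names labels, h, w and counts are illustrative only, and converting the bins to object dtype before comparing is an assumption of this sketch, not something the answer does.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "filename": [
        "shopfronts_23092017_3_285.jpg", "shopfronts_200.jpg",
        "shopfronts_25092017_eateries_98.jpg", "shopfronts_101.jpg",
        "shopfronts_138.jpg", "shopfronts_25092017_eateries_95.jpg",
        "shopfronts_25092017_neon_33.jpg", "shopfronts_322.jpg",
        "shopfronts_171.jpg", "shopfronts_23092017_3_35.jpg",
    ],
    "height": [750.0, 4395.0, 414.0, 480.0, 3733.0, 187.0, 100.0, 682.0, 800.0, 120.0],
    "width": [560.0, 6020.0, 621.0, 640.0, 8498.0, 250.0, 200.0, 1024.0, 600.0, 210.0],
})

labels = ["g1", "g2", "g3", "g4"]

# Bin each dimension on its own, then convert to plain objects so the
# two label columns can be compared element by element.
h = pd.cut(df["height"], [0, 400, 640, 800, np.inf], labels=labels).astype(object)
w = pd.cut(df["width"], [0, 300, 480, 600, np.inf], labels=labels).astype(object)

# Keep the label where both dimensions agree; everything else
# (including values outside the bins, which become NaN) is 'others'.
df["group"] = np.where(h == w, h, "others")

# Frequency count per group
counts = df["group"].value_counts()
print(counts)
```

Groups that never occur (g2 in this sample) simply do not appear in the counts.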
Answer 1 (score: 2)
Use np.where:
In [4510]: df['group'] = np.where((df.height <= 400) & (df.width <= 300),
      ...:                        'g1',
      ...:                        np.where((df.height <= 640) & (df.width <= 480),
      ...:                                 'g2',
      ...:                                 np.where((df.height <= 800) & (df.width <= 600),
      ...:                                          'g3',
      ...:                                          np.where((df.height > 800) & (df.width > 600),
      ...:                                                   'g4',
      ...:                                                   'others'))))
In [4511]: df
Out[4511]:
filename height width group
0 shopfronts_23092017_3_285.jpg 750.0 560.0 g3
1 shopfronts_200.jpg 4395.0 6020.0 g4
2 shopfronts_25092017_eateries_98.jpg 414.0 621.0 others
3 shopfronts_101.jpg 480.0 640.0 others
4 shopfronts_138.jpg 3733.0 8498.0 g4
5 shopfronts_25092017_eateries_95.jpg 187.0 250.0 g1
6 shopfronts_25092017_neon_33.jpg 100.0 200.0 g1
7 shopfronts_322.jpg 682.0 1024.0 others
8 shopfronts_171.jpg 800.0 600.0 g3
9 shopfronts_23092017_3_35.jpg 120.0 210.0 g1
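The nested np.where calls can probably be flattened with np.select, which takes a list of conditions and returns the choice for the first one that matches. This is a sketch of that alternative rather than the answerer's own code. The conditions are the same as in the answer, the names conditions and choices are illustrative, and the sample frame carries only the height and width columns from the question.

```python
import numpy as np
import pandas as pd

# Height and width copied from the question (filenames omitted for brevity)
df = pd.DataFrame({
    "height": [750.0, 4395.0, 414.0, 480.0, 3733.0, 187.0, 100.0, 682.0, 800.0, 120.0],
    "width": [560.0, 6020.0, 621.0, 640.0, 8498.0, 250.0, 200.0, 1024.0, 600.0, 210.0],
})

# Conditions are checked in order; the first match wins,
# mirroring the nesting order of the np.where version.
conditions = [
    (df["height"] <= 400) & (df["width"] <= 300),
    (df["height"] <= 640) & (df["width"] <= 480),
    (df["height"] <= 800) & (df["width"] <= 600),
    (df["height"] > 800) & (df["width"] > 600),
]
choices = ["g1", "g2", "g3", "g4"]

# Rows matching none of the conditions fall back to 'others'
df["group"] = np.select(conditions, choices, default="others")

# Frequency count per group
print(df["group"].value_counts())
```

Note that these conditions only check upper bounds for g1 to g3, so a 100x400 image would be labelled g2 here, whereas the double pd.cut approach would label it others. Both give the same result on the sample data.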