Strange behavior of pandas query

Time: 2019-03-16 09:33:24

Tags: pandas python-3.6

I have downloaded olist_geolocation_dataset from Kaggle (https://www.kaggle.com/olistbr/brazilian-ecommerce#olist_geolocation_dataset.csv) and am doing a first analysis.

My code is as follows:

import pandas as pd

geolocation = pd.read_csv('olist_geolocation_dataset.csv')
# Count rows per (lat, lng) pair and show the most frequent coordinates
df = geolocation.groupby(['geolocation_lat', 'geolocation_lng'], as_index=False)['geolocation_state'].count()
df.sort_values('geolocation_state', ascending=False).head()


geolocation.query('geolocation_lat == -23.495901')


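My question is: given that the value passed in the filter condition exists in the data, why does the query return an empty DataFrame?

1 answer:

Answer 0: (score: 0)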

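The problem is that the values are floats, so because of precision issues an exact equality comparison can fail even when the displayed value looks identical. Use numpy.isclose with boolean indexing instead:

import numpy as np

# Keep rows whose latitude is approximately equal to the target value
out = geolocation[np.isclose(geolocation['geolocation_lat'], -23.495901)]
print(out.head())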
       geolocation_zip_code_prefix  geolocation_lat  geolocation_lng  \
19112                         2020       -23.495993       -46.635616   
19118                         2020       -23.495960       -46.634081   
19129                         2020       -23.495861       -46.636183   
19161                         2044       -23.495681       -46.618947   
19167                         2084       -23.495675       -46.599478   

      geolocation_city geolocation_state  
19112        sao paulo                SP  
19118        sao paulo                SP  
19129        sao paulo                SP  
19161        sao paulo                SP  
19167        sao paulo                SP
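
Note that the rows above are not all the same coordinate: with its default tolerances (rtol=1e-05, atol=1e-08), numpy.isclose accepts anything within roughly 0.000235 of -23.495901. The following is a minimal sketch of how the tolerance could be tightened. The three-row frame is hypothetical sample data made up for illustration (not taken from the Kaggle file), and the choice of atol=1e-6 is an assumption based on pandas displaying six decimal places by default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the first latitude carries one more digit
# than the six decimals pandas displays by default.
geolocation = pd.DataFrame({
    "geolocation_lat": [-23.4959011, -23.495993, -23.545621],
    "geolocation_state": ["SP", "SP", "SP"],
})

target = -23.495901

# Exact equality: no stored value is identical to the target, so this is empty.
exact = geolocation[geolocation["geolocation_lat"] == target]

# Default tolerances: |a - b| <= atol + rtol * |b|, about 2.35e-4 here,
# which also matches the nearby but different latitude -23.495993.
loose = geolocation[np.isclose(geolocation["geolocation_lat"], target)]

# Absolute tolerance only (assumed 1e-6): matches just the value that
# differs from the target beyond the sixth decimal place.
tight = geolocation[
    np.isclose(geolocation["geolocation_lat"], target, rtol=0, atol=1e-6)
]

print(len(exact), len(loose), len(tight))  # prints: 0 2 1
```

Setting rtol=0 makes the comparison depend only on the absolute difference, which is usually easier to reason about for coordinates than a tolerance that scales with the magnitude of the value.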