I downloaded olist_geolocation_dataset from Kaggle (https://www.kaggle.com/olistbr/brazilian-ecommerce#olist_geolocation_dataset.csv) and am doing a first analysis of it.
My code is as follows:
import pandas as pd

geolocation = pd.read_csv('olist_geolocation_dataset.csv')
df = geolocation.groupby(['geolocation_lat', 'geolocation_lng'], as_index=False)['geolocation_state'].count()
df.sort_values('geolocation_state', ascending=False).head()
geolocation.query('geolocation_lat == -23.495901')
My question is: given that the value passed in the filter condition exists in the data, why does the query return an empty DataFrame?
Answer 0 (score: 0)
The problem is that the values are floats, so an exact equality comparison fails because of precision. Use numpy.isclose with boolean indexing instead:
import numpy as np

out = geolocation[np.isclose(geolocation['geolocation_lat'], -23.495901)]
print(out.head())
       geolocation_zip_code_prefix  geolocation_lat  geolocation_lng  \
19112                         2020       -23.495993       -46.635616
19118                         2020       -23.495960       -46.634081
19129                         2020       -23.495861       -46.636183
19161                         2044       -23.495681       -46.618947
19167                         2084       -23.495675       -46.599478

       geolocation_city  geolocation_state
19112         sao paulo                 SP
19118         sao paulo                 SP
19129         sao paulo                 SP
19161         sao paulo                 SP
19167         sao paulo                 SP
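
Note that the rows above differ from -23.495901 from the fourth decimal place onward. That is consistent with the default tolerances of numpy.isclose (rtol=1e-05, atol=1e-08), which accept any value within atol + rtol * |target|, roughly 2.35e-4 here. If only values that match the target at six decimals are wanted, a tighter tolerance can be passed. Below is a minimal sketch on a small made-up DataFrame (the latitude values and the atol=5e-7 threshold are assumptions chosen for illustration, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, not rows from the real dataset.
# The first value has more digits than the six decimals pandas displays.
geolocation = pd.DataFrame({
    'geolocation_lat': [-23.49590112345, -23.495993, -23.495675, -23.60],
})

target = -23.495901

# Exact comparison: no stored value is exactly equal to the 6-digit literal.
exact = geolocation[geolocation['geolocation_lat'] == target]

# Default tolerances: |a - b| <= atol + rtol * |b|, about 2.35e-4 for this target.
loose = geolocation[np.isclose(geolocation['geolocation_lat'], target)]

# rtol=0 with a small absolute tolerance (assumed value) keeps only
# values that agree with the target when rounded to six decimals.
tight = geolocation[np.isclose(geolocation['geolocation_lat'], target, rtol=0, atol=5e-7)]

print(len(exact), len(loose), len(tight))  # prints: 0 3 1
```

Which tolerance is appropriate depends on how precisely the coordinates need to match, so treat the threshold as something to tune rather than a fixed recommendation.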