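So, I came across an interesting bar chart.
I found the underlying data (linked in the code comments below), and I'm trying to recreate how it groups the records by range (I used pd.cut for that) and by country.
Here is the code I have tried so far. It raises errors; the failing lines are annotated with the error each one produces.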
import pandas as pd
## csv file in zip http://ec.europa.eu/eurostat/cache/GISCO/geodatafiles/GEOSTAT-grid-POP-1K-2011-V2-0-1.zip
url="C:/Users/Simon/Downloads/GEOSTAT-grid-POP-1K-2011-V2-0-1/Version 2_0_1/GEOSTAT_grid_POP_1K_2011_V2_0_1.csv"
whole=pd.read_csv(url, low_memory=False)
populationDensity=whole[['TOT_P','CNTR_CODE']]
## trying to replicate graph here http://www.centreforcities.org/wp-content/uploads/2018/04/18-04-16-Square-kilometre-units-of-land-by-population.png
## which aggregates the records by brackets
# https://stackoverflow.com/questions/25010215/pandas-groupby-how-to-compute-counts-in-ranges#answer-25010952
ranges = [0,10000,15000,20000,25000,30000,35000,40000,45000,1000000]
bins=pd.cut(populationDensity['TOT_P'],ranges)
#print(bins)
## the following fails with error :
## AttributeError: Cannot access callable attribute 'groupby' of 'DataFrameGroupBy' objects, try using the 'apply' method
#print (populationDensity.groupby(['CNTR_CODE']).groupby(bins).count())
## the following fails with error :
## TypeError: 'Series' objects are mutable, thus they cannot be hashed
print (populationDensity.groupby(['CNTR_CODE'],pd.cut(populationDensity['TOT_P'],ranges)).count())
#relevant https://stackoverflow.com/questions/21441259/pandas-groupby-range-of-values#answer-21441621
I have only just started using pandas. If anyone knows what I'm doing wrong, please say so; otherwise I'll try again tomorrow.
Answer 0 (score: 1)
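Change:
print (populationDensity.groupby(['CNTR_CODE'],pd.cut(populationDensity['TOT_P'],ranges)).count())
to
print (populationDensity.groupby(['CNTR_CODE', pd.cut(populationDensity['TOT_P'],ranges)]).count())
The only difference is where the list's closing bracket sits: the column label and the pd.cut Series now go inside one list. This works because the by parameter of groupby accepts several column names, a mix of column names and Series, or several Series, as long as they are passed together in a single list: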
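by : mapping, function, label, or list of labels
Used to determine the groups for the groupby. If by is a function, it's called on each value of the object's index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series' values are first aligned; see .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.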
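To try this without downloading the full GEOSTAT file, here is a minimal sketch on made-up data. The toy DataFrame, its values, and the names cells, brackets, counts and table are assumptions for illustration only; the column names and the ranges list come from the question. It passes observed=True so that only brackets that actually occur are kept, and uses size() to count rows per group, which is one reasonable way to get the counts rather than the only one.

```python
import pandas as pd

# Hypothetical toy data standing in for the GEOSTAT grid (one row per 1 km cell).
cells = pd.DataFrame({
    "CNTR_CODE": ["ES", "ES", "ES", "FR", "FR", "UK"],
    "TOT_P": [500, 12000, 47000, 9000, 21000, 16000],
})

ranges = [0, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 1000000]

# pd.cut returns a Series aligned with cells.index; renaming it gives the
# resulting index level a clear name ("bracket" is an illustrative choice).
brackets = pd.cut(cells["TOT_P"], ranges).rename("bracket")

# One list holding a column label and a Series: both are used as group keys.
counts = cells.groupby(["CNTR_CODE", brackets], observed=True).size()
print(counts)

# Optional: pivot to a country x bracket table, a convenient shape for a bar chart.
table = counts.unstack(fill_value=0)
print(table)
```

From here, something like table.T.plot.bar() should get close to the layout of the original chart, assuming matplotlib is installed.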