我正在尝试使用分块对大型数据集进行分组。
什么有效:
chunks = pd.read_stata('data.dta', chunksize = 50000, columns = ['year', 'race', 'app'])
pieces = [chunk.groupby(['race'])['app'].agg(['sum']) for chunk in chunks]
agg = pd.concat(pieces.groupby(level = 0).sum()
什么行不通(错误:Categorical objects has no attribute flags
)
chunks = pd.read_stata('data.dta', chunksize = 50000, columns = ['year', 'race', 'app'])
pieces = [chunk.groupby(['year', 'race'])['app'].agg(['sum']) for chunk in chunks]
agg = pd.concat(pieces.groupby(['year', 'race']).sum()
在year
中添加时我想到的是什么?
pieces
:
2013 Asian 9325
Black 2655
AmInd 118
Hisp 6371
White 16825
Other 2446
Unknown 3502
Foreign 7280
Name: app, dtype: float64, year race
2013 Asian 8884
Black 2969
AmInd 72
Hisp 3760
White 18926
Other 1843
Unknown 3262
Foreign 8183
Name: app, dtype: float64, year race
2013 Asian 6429
Black 2176
AmInd 89
Hisp 3804
White 13903
Other 1752
Unknown 2760
Foreign 6825
2014 Asian 1522
Black 738
AmInd 23
Hisp 1133
White 4243
Other 437
Unknown 316
Foreign 1997
Name: app, dtype: float64, year race