I have a DataFrame like this:
Year  Age  Count
1999    0     80
        1     80
        2     80
        3     80
        4     90
        5    100
        ...
2000    0     60
        ...
I want to group the ages into ranges such as [0,5), [5,10), ... and get the total Count for each range, so the above would become:
Year  Age  Count
1999  0-4    410
      5-9    ...
      ...
2000  0-4    ...
      ...
Is there a simple way to do this with groupby and sum?
Answer 0 (score: 0)
You can use pd.cut() (as @MaxU suggested) to create an intermediate Age_Range column:
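import pandas as pd
# Edges 0, 5, 10, ...; the last edge must be greater than the oldest age
cut_points = range(0, df.Age.max() + 6, 5)
df['Age_Range'] = pd.cut(df.Age, cut_points, right=False)  # bins [0, 5), [5, 10), ...
df.groupby(['Year', 'Age_Range'], observed=True)['Count'].sum()
range() creates the cut points 0, 5, 10, ... in steps of 5. The stop value is df.Age.max() + 6 rather than + 5 so that the last edge is always strictly greater than the oldest age. With + 5, a maximum age of 5 would produce only the edges [0, 5], and age 5 would fall outside every bin. Passing right=False makes each bin include its left edge and exclude its right one. This matches [0,5), [5,10), ... and keeps age 0 from being dropped, which would happen with the default right-closed bin (0, 5]. observed=True limits the result to the Year/Age_Range pairs that actually occur in the data.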
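Below is a runnable sketch of the whole approach applied to the sample rows from the question. It makes two assumptions: Year is filled in on every row (the listing above shows it only on the first row of each year), and only the rows that were actually shown are used, with the `...` rows left out. The `labels` argument to pd.cut is not part of the answer above. It is included here only to produce the "0-4", "5-9" style labels the question asks for. The names `step`, `labels` and `totals` are illustrative.

```python
import pandas as pd

# Sample rows from the question (assumption: Year repeated on every row)
df = pd.DataFrame({
    "Year":  [1999, 1999, 1999, 1999, 1999, 1999, 2000],
    "Age":   [0, 1, 2, 3, 4, 5, 0],
    "Count": [80, 80, 80, 80, 90, 100, 60],
})

step = 5
# Edges 0, 5, 10, ...; the last edge is strictly greater than the oldest age
cut_points = list(range(0, int(df.Age.max()) + step + 1, step))
# One "lo-hi" label per bin, e.g. "0-4", "5-9"
labels = [f"{lo}-{hi - 1}" for lo, hi in zip(cut_points[:-1], cut_points[1:])]

# right=False gives left-closed bins [0, 5), [5, 10), ...
df["Age_Range"] = pd.cut(df.Age, cut_points, right=False, labels=labels)

# observed=True keeps only (Year, Age_Range) pairs present in the data
totals = df.groupby(["Year", "Age_Range"], observed=True)["Count"].sum()
print(totals)
```

For 1999 this gives 410 for "0-4" (80 + 80 + 80 + 80 + 90) and 100 for "5-9". For 2000 it gives 60 for "0-4", which agrees with the first row of the expected output in the question.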