I'm having a problem with my data. I'm trying to group the data into specified bins and take the average of a cumulative sum:
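import pandas as pd
import numpy as np

# 200 years of data with two random integer variables in [0, 20)
df = pd.DataFrame(data={'year': np.arange(1800, 2000, 1),
                        'var1': np.random.randint(0, 20, 200),
                        'var2': np.random.randint(0, 20, 200)})
# Right-closed bins (0, 1], (1, 2], ..., (18, 19]
thresholds = np.arange(0, 20, 1)
bins = pd.cut(df.var2, thresholds)
# Count rows per (year, bin); observed=False keeps empty bins as 0-count rows
grouped = df.groupby(['year', bins], observed=False).count()
grouped = grouped.fillna(0)
# Running total of the var2 counts, grouped by the value in the var1 count column
grouped = grouped.assign(Num_Events=grouped.groupby('var1').var2.cumsum())
# Move the bins from the row index into the columns
grouped = grouped.unstack()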
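One detail of the binning step above: `pd.cut` with `thresholds = np.arange(0, 20, 1)` builds right-closed intervals (0, 1], ..., (18, 19], so a value of exactly 0 falls outside every bin, becomes NaN and is dropped by the groupby. The small check below uses hypothetical sample values (0, 1 and 19) chosen only to hit the edges; passing `include_lowest=True` is one way to keep the 0s, if that is wanted.

```python
import numpy as np
import pandas as pd

thresholds = np.arange(0, 20, 1)
# Hypothetical sample values covering both edges of the threshold range
values = pd.Series([0, 1, 19], name="var2")

# Default: intervals are closed on the right, so 0 is not in (0, 1] and becomes NaN
binned = pd.cut(values, thresholds)
print(binned)

# include_lowest=True widens the first interval slightly so that 0 is kept
binned_incl = pd.cut(values, thresholds, include_lowest=True)
print(binned_incl)
```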
For each bin (i.e., each column), I want to take the average of grouped['Num_Events'] over all the calendar years in the index. grouped['Num_Events'].head() looks like:
var2 (0, 1] (1, 2] (2, 3] (3, 4] ... (15, 16] (16, 17] (17, 18] (18, 19]
year ...
1800 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
1801 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
1802 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 2.0
1803 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
1804 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
The output I want looks like:
var2 (0, 1] (1, 2] (2, 3] (3, 4] ... (15, 16] (16, 17] (17, 18] (18, 19]
year ...
1800 <avg bin [0,1]> <avg bin [1,2]> <avg bin [2,3]> <avg bin [3,4]> ... <avg bin [15,16]> <avg bin [16,17]> <avg bin [17,18]> <avg bin [18,19]>
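A minimal sketch of the per-bin average described above, assuming the goal is the plain arithmetic mean of each `Num_Events` column over every year in the index. `DataFrame.mean()` averages down the rows by default, giving one value per bin, and `.to_frame().T` turns that Series into a single-row frame shaped like the desired output. The fixed random seed and the row label `'mean'` are arbitrary choices made here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (seed fixed only so reruns are reproducible)
np.random.seed(0)
df = pd.DataFrame(data={'year': np.arange(1800, 2000, 1),
                        'var1': np.random.randint(0, 20, 200),
                        'var2': np.random.randint(0, 20, 200)})
thresholds = np.arange(0, 20, 1)
bins = pd.cut(df.var2, thresholds)
grouped = df.groupby(['year', bins], observed=False).count()
grouped = grouped.fillna(0)
grouped = grouped.assign(Num_Events=grouped.groupby('var1').var2.cumsum())
grouped = grouped.unstack()

num_events = grouped['Num_Events']              # rows: years, columns: bins
per_bin_mean = num_events.mean()                # column-wise mean over all years
result = per_bin_mean.to_frame(name='mean').T   # one row, one column per bin
print(result)
```

If a different row label is preferred (for example the first year, as in the mock-up above), it can be passed to `to_frame(name=...)` instead.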
Thanks!