Suppose I have the following data on the number of past trades, which I group by year:
import pandas as pd
import numpy as np
dates = pd.date_range('19990101', periods=6000)
df = pd.DataFrame(np.random.randint(0,50,size=(6000,2)), index = dates)
df.columns = ['winners','losers']
grouped = df.groupby(lambda x: x.year)
print(grouped.sum())
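How can I add a column to this grouped data that shows the percentage of winners for each year? And another column that shows the longest run of consecutive losing trades in each year?
I tried to follow the example in "Understanding groupby in pandas", but could not work out how to do it by year in my case.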
Answer 0 (score: 1)
First create a new DataFrame, then build the required columns from the winners and losers:
new_df = pd.DataFrame()
new_df ['winners'] = df.groupby(df.index.year, as_index=True)['winners'].sum()
new_df ['losers'] = df.groupby(df.index.year, as_index=True)['losers'].sum()
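Then you can compute the percentages from the winners and losers columns, both of which come back indexed by year. In the full listing for this answer, winners_Percent and losers_Percent are each year's total divided by the total across all years, expressed as a fraction between 0 and 1.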
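The winners_Percent column in this answer measures each year's share of all winners over the whole period. If what is wanted is instead the share of winners among that year's own trades, plus the longest losing run per year, the following is a minimal sketch rather than a definitive solution. It rests on two assumptions that are not in the original post: the column names win_pct and max_losing_streak are made up for illustration, and, because the data holds daily counts rather than individual trades, a "losing day" is taken to mean a day with more losers than winners. A fixed seed is used only so the result is repeatable.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the question, with a fixed seed for repeatability.
rng = np.random.RandomState(0)
dates = pd.date_range('19990101', periods=6000)
df = pd.DataFrame(rng.randint(0, 50, size=(6000, 2)),
                  index=dates, columns=['winners', 'losers'])

# Year of each row, aligned with df's index so it can be used as a group key.
years = pd.Series(df.index.year, index=df.index)

# Yearly totals, one row per year.
summary = df.groupby(years).sum()

# Share of winners among all trades of the same year, in percent.
total = summary['winners'] + summary['losers']
summary['win_pct'] = 100 * summary['winners'] / total

# ASSUMPTION: a losing day is a day with more losers than winners.
losing = (df['losers'] > df['winners']).astype(int)

# A new run starts whenever the losing flag differs from the previous day.
run_id = (losing != losing.shift()).cumsum()

# Running length of the current losing run (0 on non-losing days).
# Grouping by year as well restarts the count at each year boundary.
streak = losing.groupby([years, run_id]).cumsum()

# Longest losing run observed in each year.
summary['max_losing_streak'] = streak.groupby(years).max()

print(summary)
```

Grouping the running count by both year and run means a losing run that spans New Year is split in two; drop the year from that groupby if a run should be counted in full and credited to the year in which it ends.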
Putting this answer's steps together:
import pandas as pd
import numpy as np
dates = pd.date_range('19990101', periods=6000)
df = pd.DataFrame( np.random.randint(0,50,size=(6000,2)), index = dates)
df.columns = ['winners','losers']
new_df = pd.DataFrame()
new_df ['winners'] = df.groupby(df.index.year, as_index=True)['winners'].sum()
new_df ['losers'] = df.groupby(df.index.year, as_index=True)['losers'].sum()
new_df['winners_Percent'] = new_df['winners']/new_df['winners'].sum()
new_df['losers_Percent'] = new_df['losers']/new_df['losers'].sum()
Output: