Group by year and generate new columns

Time: 2015-09-02 17:32:55

Tags: python pandas

Suppose I have the following data on the number of past trades, which I group by year:

import pandas as pd
import numpy as np

dates = pd.date_range('19990101', periods=6000)
df = pd.DataFrame(np.random.randint(0,50,size=(6000,2)), index = dates)
df.columns = ['winners','losers']
grouped = df.groupby(lambda x: x.year)
print(grouped.sum())

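How can I add a column to this grouped data that shows the percentage of winners for each year? And another column that shows the maximum number of consecutive losing trades in each year?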

I tried to follow the example in "Understanding groupby in pandas", but could not work out how to do it by year in my case.

1 answer:

Answer 0: (score: 1)

First create a new DataFrame, then build the required columns from winners and losers:

new_df = pd.DataFrame()
new_df['winners'] = df.groupby(df.index.year)['winners'].sum()
new_df['losers'] = df.groupby(df.index.year)['losers'].sum()
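
As a side note, the two per-column groupby calls can probably be collapsed into a single call that selects both columns at once. The sketch below rebuilds the question's random data so that it runs on its own; the name `yearly` is only illustrative and does not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the question (the values are random).
dates = pd.date_range('19990101', periods=6000)
df = pd.DataFrame(np.random.randint(0, 50, size=(6000, 2)),
                  index=dates, columns=['winners', 'losers'])

# One groupby over both columns; the result is indexed by calendar year.
# `yearly` is an illustrative name, not part of the original answer.
yearly = df.groupby(df.index.year)[['winners', 'losers']].sum()
print(yearly.head())
```

This should give the same two columns as building new_df one column at a time.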

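You can then compute the winners and losers percentages from the winners and losers columns (the grouped sums come back indexed by year). Dividing each year's sum by the column total gives that year's share of all winners, or of all losers, over the whole period.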

You can do it like this:

import pandas as pd
import numpy as np

dates = pd.date_range('19990101', periods=6000)
df = pd.DataFrame(np.random.randint(0, 50, size=(6000, 2)), index=dates)
df.columns = ['winners', 'losers']
new_df = pd.DataFrame()
# Yearly totals, indexed by calendar year
new_df['winners'] = df.groupby(df.index.year)['winners'].sum()
new_df['losers'] = df.groupby(df.index.year)['losers'].sum()
# Each year's share of the total over all years (a fraction between 0 and 1)
new_df['winners_Percent'] = new_df['winners'] / new_df['winners'].sum()
new_df['losers_Percent'] = new_df['losers'] / new_df['losers'].sum()
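
The winners_Percent and losers_Percent columns above give each year's share of the totals over all years. If what is wanted is instead the win rate within each year, one possible reading is winners divided by winners plus losers for that year. The sketch below assumes that definition; the names `yearly` and `win_rate_pct` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same synthetic data as in the question (the values are random).
dates = pd.date_range('19990101', periods=6000)
df = pd.DataFrame(np.random.randint(0, 50, size=(6000, 2)),
                  index=dates, columns=['winners', 'losers'])

# Yearly totals of winning and losing trades.
yearly = df.groupby(df.index.year)[['winners', 'losers']].sum()

# Assumed definition: winning trades as a percentage of all trades that year.
yearly['win_rate_pct'] = (
    100 * yearly['winners'] / (yearly['winners'] + yearly['losers'])
)
print(yearly)
```

With uniformly random counts, the rate should hover around 50 for every year.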

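
The second part of the question, the longest losing streak per year, is not covered above. The data hold daily counts rather than individual trades, so a streak has to be defined first. The sketch below assumes a "losing day" is a day with more losers than winners and counts the longest run of such days within each calendar year. Both that definition and the helper `longest_streak` are assumptions for illustration, not something taken from the original posts.

```python
import numpy as np
import pandas as pd

def longest_streak(flags):
    """Length of the longest run of consecutive True values in a boolean Series."""
    flags = flags.astype(bool)
    # A new run starts wherever the value differs from the previous one;
    # the cumulative sum of those change points gives each run its own id.
    run_id = (flags != flags.shift()).cumsum()
    # Summing the flags inside each run gives its length (0 for runs of False).
    run_lengths = flags.groupby(run_id).sum()
    return int(run_lengths.max()) if len(run_lengths) else 0

# Same synthetic data as in the question (the values are random).
dates = pd.date_range('19990101', periods=6000)
df = pd.DataFrame(np.random.randint(0, 50, size=(6000, 2)),
                  index=dates, columns=['winners', 'losers'])

# Assumption: a "losing day" is one with more losers than winners.
losing_day = df['losers'] > df['winners']

# Longest run of losing days, computed separately for each calendar year.
max_losing_streak = losing_day.groupby(losing_day.index.year).apply(longest_streak)
print(max_losing_streak)
```

Because the runs are computed inside each year's group, a streak that crosses New Year is split between the two years. If individual trade records were available, the same helper could be applied to a per-trade boolean "loss" column instead.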