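How do I add a 'Sum' column to a pandas groupby DataFrame?
I want to add a 'Sum' inner column next to 'Bearish' and 'Bullish' in the grouped DataFrame.
Then I want to add two more columns:
%Bearish = Bearish / Sum * 100
%Bullish = Bullish / Sum * 100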
# pd.TimeGrouper was removed in pandas 1.0; pd.Grouper is its replacement
group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='H'),'sentiment']).count()
group_df = group_df.unstack()
                    message
sentiment           Bearish Bullish
created
2017-08-01 23:00:00     2.0     2.0
2017-08-02 00:00:00     1.0     3.0
2017-08-02 01:00:00     NaN     4.0
Answer 0 (score: 1)
You can use concat with a new DataFrame:
import pandas as pd

idx = pd.date_range('2017-08-01 23:13:00', periods=12, freq='12min')
df = pd.DataFrame({'message':[1,1,2,2,2,2,2,2,3,3,3,3],
                   'sentiment':['Bearish'] * 5 + ['Bullish'] * 7 }, index=idx)
print(df)
                     message sentiment
2017-08-01 23:13:00        1   Bearish
2017-08-01 23:25:00        1   Bearish
2017-08-01 23:37:00        2   Bearish
2017-08-01 23:49:00        2   Bearish
2017-08-02 00:01:00        2   Bearish
2017-08-02 00:13:00        2   Bullish
2017-08-02 00:25:00        2   Bullish
2017-08-02 00:37:00        2   Bullish
2017-08-02 00:49:00        3   Bullish
2017-08-02 01:01:00        3   Bullish
2017-08-02 01:13:00        3   Bullish
2017-08-02 01:25:00        3   Bullish
# pd.Grouper replaces pd.TimeGrouper, which was removed in pandas 1.0
group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='H'),'sentiment']).count()
# select ['message'] to avoid a MultiIndex in the columns
group_df = group_df['message'].unstack()
# divide each column by its column total and convert to percent
# add a prefix to the new column names - https://stackoverflow.com/q/45453508/2901002
df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%')
print(df1)
                     %Bearish   %Bullish
2017-08-01 23:00:00      80.0        NaN
2017-08-02 00:00:00      20.0  57.142857
2017-08-02 01:00:00       NaN  42.857143
# use a new name so the original df stays available for the variant further down
result = pd.concat([group_df, df1], axis=1)
print(result)
                     Bearish  Bullish  %Bearish   %Bullish
2017-08-01 23:00:00      4.0      NaN      80.0        NaN
2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
2017-08-02 01:00:00      NaN      3.0       NaN  42.857143
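Note that group_df.div(group_df.sum()) divides each column by its own total over all hours, which is where the 80.0 and 20.0 figures above come from. If the goal is instead the per-hour share described in the question (Bearish / Sum * 100, with Sum = Bearish + Bullish for that hour), one possible sketch follows. It assumes pd.Grouper in place of pd.TimeGrouper and spells the hourly bin as '60min'; the names counts, pct and row_share are made up for illustration.

```python
import pandas as pd

# Same toy data as in the answer above
idx = pd.date_range('2017-08-01 23:13:00', periods=12, freq='12min')
df = pd.DataFrame({'message': [1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3],
                   'sentiment': ['Bearish'] * 5 + ['Bullish'] * 7}, index=idx)

# Hourly counts per sentiment ('60min' is an hourly bin; assumption: pd.Grouper
# stands in for the removed pd.TimeGrouper)
counts = (df.groupby([pd.Grouper(freq='60min'), 'sentiment'])['message']
            .count()
            .unstack())

# Row-wise total per hour; sum() skips NaN by default
counts['Sum'] = counts.sum(axis=1)

# Share of each sentiment within its hour; axis=0 aligns 'Sum' on the row index
pct = (counts[['Bearish', 'Bullish']]
       .div(counts['Sum'], axis=0)
       .mul(100)
       .add_prefix('%'))

row_share = pd.concat([counts, pct], axis=1)
print(row_share)
```

For the 2017-08-02 00:00:00 row this gives Sum = 5.0, %Bearish = 20.0 and %Bullish = 80.0. Hours with no messages for a sentiment stay NaN; calling fillna(0) on counts first would turn them into 0 instead.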
Or, if you need GroupBy.size:
# pd.Grouper replaces pd.TimeGrouper, which was removed in pandas 1.0
group_df = df[['sentiment','message']].groupby([pd.Grouper(freq='H'),'sentiment']).size()
group_df = group_df.unstack()
df1 = group_df.div(group_df.sum()).mul(100).add_prefix('%')
print(df1)
                     %Bearish   %Bullish
2017-08-01 23:00:00      80.0        NaN
2017-08-02 00:00:00      20.0  57.142857
2017-08-02 01:00:00       NaN  42.857143
# use a new name so the original df is not overwritten
result = pd.concat([group_df, df1], axis=1)
print(result)
                     Bearish  Bullish  %Bearish   %Bullish
2017-08-01 23:00:00      4.0      NaN      80.0        NaN
2017-08-02 00:00:00      1.0      4.0      20.0  57.142857
2017-08-02 01:00:00      NaN      3.0       NaN  42.857143
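The count and size variants give identical results here because the sample data has no missing values. As a rough illustration of where they differ, the sketch below uses a hypothetical frame (toy, not part of the original data) with one missing message: count tallies only non-NaN values, while size tallies every row in the group.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one 'message' value is missing
toy = pd.DataFrame({'sentiment': ['Bearish', 'Bearish', 'Bullish'],
                    'message': [1.0, np.nan, 3.0]})

by_sentiment = toy.groupby('sentiment')['message']

n_non_null = by_sentiment.count()  # non-NaN values per group
n_rows = by_sentiment.size()       # all rows per group, NaN included

print(n_non_null)
print(n_rows)
```

Here the Bearish group has a count of 1 but a size of 2, so which one to use depends on whether rows with a missing message should contribute to the totals.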