Count values from a Pandas DataFrame into a pivot table by 2 categories

Time: 2018-04-16 12:30:41

Tags: python pandas pivot

Here is a reference I found that does something similar, but not exactly what I need.

What I have:
The DataFrame below, in this format:

    Tweets                                                   Classified     FreqWord
     calm director day science meetings nasal talk cutting edge remote sensing research drought veg fluorescence calm love    Positive    drought
     love thought drought   Positive    drought
     reign mother kerr funny none tried make come back drought  Positive    drought
     wonder could help thai market b post reuters drought devastates south europe crops Negative    drought
     wonder could help thai market b post reuters drought devastates south europe crops Negative    crops
     wonder could help thai market b post reuters drought devastates south europe crops Negative    crops
     wonder could help thai market b post reuters drought devastates south europe crops Negative    business
     every child safe drinking water thank uk aid providing suppo ensure children rights drought    Positive    drought
     every child safe drinking water thank uk aid providing suppo ensure children rights drought    Positive    water


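What I need:
The DataFrame as a pivot table, with Classified as the index, FreqWord as the columns, and as values the number of tweets in each class that have that frequent word. In short, something like the following: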

Classified  drought  crops  business  water
Positive          4      0         0      1
Negative          1      2         1      0

Also note:
The full dataset has many more FreqWord and Classified values than shown here.

1 Answer:

Answer 0 (score: 2)

You can do this with pd.crosstab:

import pandas as pd
pd.crosstab(df.Classified, df.FreqWord)  # counts each (Classified, FreqWord) pair

Output:

FreqWord    business  crops  drought  water
Classified                                 
Negative           1      2        1      0
Positive           0      0        4      1
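A self-contained sketch of the crosstab call, assuming only the Classified and FreqWord columns matter (the Tweets text is left out because it does not affect the counts). The nine pairs below are copied from the sample rows in the question; crosstab counts each pair and fills absent combinations with 0.

```python
import pandas as pd

# Classified/FreqWord pairs from the nine sample rows in the question;
# the Tweets column is omitted because crosstab only needs these two.
df = pd.DataFrame({
    "Classified": ["Positive", "Positive", "Positive", "Negative", "Negative",
                   "Negative", "Negative", "Positive", "Positive"],
    "FreqWord": ["drought", "drought", "drought", "drought", "crops",
                 "crops", "business", "drought", "water"],
})

# Rows = Classified, columns = FreqWord, cells = number of rows with that pair.
counts = pd.crosstab(df.Classified, df.FreqWord)
print(counts)
```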

Or with get_dummies:

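import pandas as pd
df_out = pd.get_dummies(df[['Classified','FreqWord']], columns=['FreqWord'], dtype=int)\
           .set_index('Classified').groupby(level=0, sort=False).sum()  # .sum(level=0) was removed in pandas 2.0
df_out.columns = df_out.columns.str.split('_', n=1).str[1]  # drop the "FreqWord_" prefix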

Output:

            business  crops  drought  water
Classified                                 
Positive           0      0        4      1
Negative           1      2        1      0
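DataFrame.sum(level=0) was deprecated in pandas 1.3 and removed in 2.0. A runnable sketch of the get_dummies route on current pandas, using the same sample pairs, replaces it with groupby(level=0, sort=False).sum(); sort=False keeps the classes in first-seen order, matching the output above. dtype=int is an added assumption so the indicator columns sum as integers rather than booleans.

```python
import pandas as pd

# Same Classified/FreqWord pairs as the sample rows in the question.
df = pd.DataFrame({
    "Classified": ["Positive", "Positive", "Positive", "Negative", "Negative",
                   "Negative", "Negative", "Positive", "Positive"],
    "FreqWord": ["drought", "drought", "drought", "drought", "crops",
                 "crops", "business", "drought", "water"],
})

# One 0/1 indicator column per FreqWord value, named "FreqWord_<word>".
dummies = pd.get_dummies(df[["Classified", "FreqWord"]], columns=["FreqWord"], dtype=int)

# Sum the indicators per class; sort=False keeps first-seen order (Positive, Negative).
df_out = dummies.set_index("Classified").groupby(level=0, sort=False).sum()

# Keep only the text after the first "_" so the column names are just the words.
df_out.columns = df_out.columns.str.split("_", n=1).str[1]
print(df_out)
```

Splitting with n=1 keeps a word that itself contains "_" intact, which the unbounded split('_').str[1] would truncate.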

And if you want Classified back as a regular column, you can reset_index:

df_out.reset_index()

  Classified  business  crops  drought  water
0   Positive         0      0        4      1
1   Negative         1      2        1      0
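Since the question asks for a pivot table, another route (not part of the original answer) is to count rows per (Classified, FreqWord) pair with groupby(...).size() and pivot FreqWord into columns with unstack(fill_value=0). A minimal sketch on the same sample pairs, including the optional reset_index:

```python
import pandas as pd

# Same Classified/FreqWord pairs as the sample rows in the question.
df = pd.DataFrame({
    "Classified": ["Positive", "Positive", "Positive", "Negative", "Negative",
                   "Negative", "Negative", "Positive", "Positive"],
    "FreqWord": ["drought", "drought", "drought", "drought", "crops",
                 "crops", "business", "drought", "water"],
})

# Count rows per (Classified, FreqWord) pair, then move FreqWord into columns;
# pairs that never occur become 0 instead of NaN.
table = df.groupby(["Classified", "FreqWord"]).size().unstack(fill_value=0)

# Optional: turn the Classified index back into an ordinary column.
flat = table.reset_index()
print(flat)
```

Like crosstab, groupby sorts both axes by default, so Negative comes before Positive here.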