Here is a reference I found that does something similar, but not exactly what I need.
What I have:
The dataframe below. Format:
Tweets Classified FreqWord
calm director day science meetings nasal talk cutting edge remote sensing research drought veg fluorescence calm love Positive drought
love thought drought Positive drought
reign mother kerr funny none tried make come back drought Positive drought
wonder could help thai market b post reuters drought devastates south europe crops Negative drought
wonder could help thai market b post reuters drought devastates south europe crops Negative crops
wonder could help thai market b post reuters drought devastates south europe crops Negative crops
wonder could help thai market b post reuters drought devastates south europe crops Negative business
every child safe drinking water thank uk aid providing suppo ensure children rights drought Positive drought
every child safe drinking water thank uk aid providing suppo ensure children rights drought Positive water
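What I need:
The dataframe as a pivot table, where the index is Classified, the columns are FreqWord, and each value is the number of tweets with that classification for that frequent word. In short, something like the following.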
Classified drought crops business water
Positive 5 0 0 1
Negative 1 2 1 0
Also note:
I have many more "FreqWord" and "Classified" values in this dataset.
Answer 0 (score: 2)
You can do it like this:
pd.crosstab(df.Classified, df.FreqWord)
Output:
FreqWord business crops drought water
Classified
Negative 1 2 1 0
Positive 0 0 4 1
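For anyone who wants to try this directly, here is a minimal self-contained sketch. It assumes the sample data from the question, rebuilt by hand with only the Classified and FreqWord columns (the Tweets text is left out because it does not affect the counts); the variable name counts is just illustrative.

```python
import pandas as pd

# Sample rows from the question, reduced to the two columns that matter.
df = pd.DataFrame({
    "Classified": ["Positive", "Positive", "Positive", "Negative", "Negative",
                   "Negative", "Negative", "Positive", "Positive"],
    "FreqWord": ["drought", "drought", "drought", "drought", "crops",
                 "crops", "business", "drought", "water"],
})

# crosstab counts how often each (Classified, FreqWord) pair occurs;
# pairs that never occur are filled with 0.
counts = pd.crosstab(df["Classified"], df["FreqWord"])
print(counts)
```

Since crosstab picks up every distinct value in both columns, this should work unchanged when the dataset has more frequent words and classes.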
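Or with get_dummies:
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent.
# sort=False keeps the classes in order of first appearance.
df_out = pd.get_dummies(df[['Classified','FreqWord']], columns=['FreqWord'])\
.set_index('Classified').groupby(level=0, sort=False).sum()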
df_out.columns = df_out.columns.str.split('_').str[1]
Output:
business crops drought water
Classified
Positive 0 0 4 1
Negative 1 2 1 0
And, if you wish, you can reset_index:
df_out.reset_index()
Classified business crops drought water
0 Positive 0 0 4 1
1 Negative 1 2 1 0
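A third route, not part of the original answer, is to group on both columns and unstack. This is only a sketch under the same assumption as before: the sample data is rebuilt by hand with just the Classified and FreqWord columns, and counts_alt and flat are illustrative names.

```python
import pandas as pd

# Sample rows from the question, reduced to the two columns that matter.
df = pd.DataFrame({
    "Classified": ["Positive", "Positive", "Positive", "Negative", "Negative",
                   "Negative", "Negative", "Positive", "Positive"],
    "FreqWord": ["drought", "drought", "drought", "drought", "crops",
                 "crops", "business", "drought", "water"],
})

# size() counts rows per (Classified, FreqWord) pair; unstack() moves
# FreqWord into the columns, and fill_value=0 covers missing pairs.
counts_alt = df.groupby(["Classified", "FreqWord"]).size().unstack(fill_value=0)

# Optional: turn the Classified index back into a regular column.
flat = counts_alt.reset_index()
print(flat)
```

This produces the same counts as crosstab; which one to use is mostly a matter of taste.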