I have a list of words that I want to count, shown below.
word_list = ['one','two','three']
I also have a pandas DataFrame with a column of text, shown below.
TEXT | USER | ID
-------------------------------------------|---------|------
"Perhaps she'll be the one for me." | User 1 | 100
"Is it two or one?" | User 1 | 100
"Mayhaps it be three afterall..." | User 2 | 150
"Three times and it's a charm." | User 2 | 150
"One fish, two fish, red fish, blue fish." | User 2 | 150
"There's only one cat in the hat." | User 3 | 200
"One does not simply code into pandas." | User 3 | 200
"Two nights later..." | User 1 | 100
"Quoth the Raven... nevermore." | User 2 | 150
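My desired output is shown below. Using the data in the "TEXT" column, I want to count the number of unique users whose text contains each word in word_list. After counting the unique users, I also want to sum the follower counts (the ID column) for the unique users counted for that word, so that each user's ID is added only once per word.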
Word  | Unique User Count | ID Sum
one   | 3                 | 450
two   | 2                 | 250
three | 1                 | 150
Is there a way to do this in Python 2.7?
Answer 0: (score: 1)
I broke it down into steps:
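import numpy as np
import pandas as pd
df.columns=['TEXT','USER','ID']
# One boolean column per word: True where the lowercased text contains the word as a substring
df[word_list]=df.TEXT.str.lower().apply(lambda x : pd.Series([x.find(y) for y in word_list])).ne(-1)
# Turn the flags into 1/NaN so that stack() keeps only the (USER, ID, word) combinations that matched
df1=df[['USER','one','two','three','ID']].set_index(['USER','ID']).astype(int).replace({0:np.nan})
# For each word, count the distinct users and sum the distinct ID values
Target=df1.stack().reset_index().groupby('level_2').agg({'USER':lambda x : len(set(x)),'ID':lambda x : sum(set(x))})
# Fix the column order before renaming (dict order is not guaranteed in Python 2.7)
Target=Target[['USER','ID']].reset_index()
Target.columns=['Word','Unique User Count','ID Sum']
Target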
Out[97]:
Word Unique User Count ID Sum
0 one 3 450
1 three 1 150
2 two 2 250
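
For comparison, here is a minimal alternative sketch, not the method used above. It rebuilds the sample data from the question so that it runs on its own, and it makes two assumptions that differ from the answer's code. First, words are matched on word boundaries, so a word like "someone" would not count as "one". Second, ID is summed once per unique USER rather than once per distinct ID value. The names pattern, hits, users, rows and result are only illustrative.

```python
import re
import pandas as pd

word_list = ['one', 'two', 'three']

# Sample data copied from the question
df = pd.DataFrame({
    'TEXT': [
        "Perhaps she'll be the one for me.",
        "Is it two or one?",
        "Mayhaps it be three afterall...",
        "Three times and it's a charm.",
        "One fish, two fish, red fish, blue fish.",
        "There's only one cat in the hat.",
        "One does not simply code into pandas.",
        "Two nights later...",
        "Quoth the Raven... nevermore.",
    ],
    'USER': ['User 1', 'User 1', 'User 2', 'User 2', 'User 2',
             'User 3', 'User 3', 'User 1', 'User 2'],
    'ID': [100, 100, 150, 150, 150, 200, 200, 100, 150],
})

rows = []
for word in word_list:
    # Assumption: match whole words only, ignoring case
    pattern = r'\b' + re.escape(word) + r'\b'
    hits = df[df['TEXT'].str.contains(pattern, case=False, regex=True)]
    # Keep one row per user so each user's ID is summed only once
    users = hits.drop_duplicates(subset='USER')
    rows.append({'Word': word,
                 'Unique User Count': len(users),
                 'ID Sum': int(users['ID'].sum())})

result = pd.DataFrame(rows, columns=['Word', 'Unique User Count', 'ID Sum'])
```

On the sample data this gives the same counts and sums as the desired output in the question, with the rows in word_list order rather than alphabetical order. The two approaches would diverge if two different users shared the same ID value, or if a listed word appeared only inside a longer word.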