I have a DataFrame with UserID and SharedNews columns, and I want to count how many shared news items each user has. Here is my code:
import pandas as pd
import numpy as np
...
def aggr_new_userlevel_shares_dataset():
    new_userlevel_shares_df = new_userlevel_shares_dataset()
    id_shared_df = new_userlevel_shares_df[["UserID","PostTitle"]].values
    array_shared = []
    for row in id_shared_df:
        array_shared.append([row[0],sharedNews(row[1])])
    shared_df = pd.DataFrame(array_shared,columns = ["UserIDTemp","SharedNews"])
    concat_df = pd.concat([new_userlevel_shares_df,shared_df],axis = 1)
    concat_df.drop("UserIDTemp",axis = 1,inplace = True)
    print("before sum:")
    print(concat_df)
    concat_df = concat_df.groupby(["UserID"],sort = False).agg({"SharedNews",np.sum}).reset_index()
    print("after sum:")
    print(concat_df)

def sharedNews(post_title):
    countSharedNews = 0
    keywords = ['via', 'shared \'s', 'shared a', 'commented on', 'likes', 'published']
    for i in keywords:
        if (i in post_title and "photo" not in post_title) and (i in post_title and "video" not in post_title):
            countSharedNews = 1
    return countSharedNews
However, it fails with the following error:
Traceback (most recent call last):
  File "F:/MyDocument/F/My Document/Training/Python/PyCharmProject/FaceBookCrawl/FB_group_user_hierarchicalClustering.py", line 747, in <module>
    aggr_new_userlevel_shares_dataset()
  File "F:/MyDocument/F/My Document/Training/Python/PyCharmProject/FaceBookCrawl/FB_group_user_hierarchicalClustering.py", line 710, in aggr_new_userlevel_shares_dataset
    concat_df = concat_df.groupby(["UserID"],sort = False).agg({"SharedNews",np.sum}).reset_index()
...
AttributeError: 'SeriesGroupBy' object has no attribute 'SharedNews'
Could you tell me why this happens and how to fix it?
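
One likely cause, offered as a sketch rather than a confirmed diagnosis: `{"SharedNews",np.sum}` uses a comma, so Python builds a set of two items instead of a dict that maps the column name to the aggregation. pandas then appears to treat the string "SharedNews" as the name of an aggregation method, which would explain the AttributeError. The example below uses a small made-up DataFrame (the values are hypothetical, standing in for `concat_df`) and passes a dict with a colon instead.

```python
import pandas as pd

# Hypothetical sample data standing in for concat_df from the question.
df = pd.DataFrame({
    "UserID": [3, 1, 3, 2, 1],
    "SharedNews": [1, 0, 1, 1, 1],
})

# A dict (colon, not comma) maps the column name to its aggregation.
per_user = (
    df.groupby("UserID", sort=False)
      .agg({"SharedNews": "sum"})
      .reset_index()
)

# Alternative: select the column first, then sum it per group.
per_user_alt = (
    df.groupby("UserID", sort=False)["SharedNews"]
      .sum()
      .reset_index()
)

print(per_user)
```

With `sort = False`, the groups keep the order in which each UserID first appears, so either form yields one row per user with the summed SharedNews count.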