Python: counting per user with DataFrame agg

Time: 2017-07-16 17:51:10

Tags: python pandas

I have a DataFrame containing UserID and SharedNews, and I want to count how many shared news items each user has. Here is my code:

import pandas as pd
import numpy as np
...

def aggr_new_userlevel_shares_dataset():
    new_userlevel_shares_df = new_userlevel_shares_dataset()
    id_shared_df = new_userlevel_shares_df[["UserID","PostTitle"]].values
    array_shared = []

    for row in id_shared_df:
        array_shared.append([row[0],sharedNews(row[1])])

    shared_df = pd.DataFrame(array_shared,columns = ["UserIDTemp","SharedNews"])
    concat_df = pd.concat([new_userlevel_shares_df,shared_df],axis = 1)
    concat_df.drop("UserIDTemp",axis = 1,inplace = True)
    print("before sum:")
    print(concat_df)

    concat_df = concat_df.groupby(["UserID"],sort = False).agg({"SharedNews",np.sum}).reset_index()
    print("after sum:")
    print(concat_df)

def sharedNews(post_title):
    countSharedNews = 0
    keywords = ['via', 'shared \'s', 'shared a', 'commented on', 'likes', 'published']
    for i in keywords:
        if (i in post_title and "photo" not in post_title) and (i in post_title and "video" not in post_title):
            countSharedNews = 1
    return countSharedNews 
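The per-row flag can also be computed without the temporary array_shared list, the pd.concat, and the drop of UserIDTemp. Below is a minimal sketch, assuming PostTitle holds plain strings; shared_news_flag, KEYWORDS, and the sample frame are hypothetical names and data, not part of the question. The function folds the doubled `i in post_title` test in sharedNews into one check and should return the same 0/1 values. One side note: pd.concat(..., axis=1) aligns rows on index labels, so if the frame returned by new_userlevel_shares_dataset() does not have a default 0..n-1 index, its rows would not line up with shared_df.

```python
import pandas as pd

# Keywords copied from the question's sharedNews(); "shared 's" is the same
# string as the question's 'shared \'s'.
KEYWORDS = ["via", "shared 's", "shared a", "commented on", "likes", "published"]


def shared_news_flag(post_title: str) -> int:
    """Hypothetical rewrite of sharedNews: 1 if the title contains any keyword
    and mentions neither "photo" nor "video", else 0."""
    if "photo" in post_title or "video" in post_title:
        return 0
    return int(any(keyword in post_title for keyword in KEYWORDS))


# Hypothetical sample data standing in for new_userlevel_shares_dataset().
# The non-default index is deliberate: a frame built from a plain list gets
# index 0..n-1 and would not align with this one under pd.concat(axis=1).
posts = pd.DataFrame(
    {
        "UserID": [101, 102, 101, 103],
        "PostTitle": [
            "Alice shared a link",
            "Bob shared a photo",
            "Alice via Reuters",
            "Carol posted an update",
        ],
    },
    index=[10, 11, 12, 13],
)

# Series.apply keeps the original index, so assigning the result as a new
# column needs no temporary frame, concat, or helper-column drop.
posts["SharedNews"] = posts["PostTitle"].apply(shared_news_flag)
print(posts)
```

Assigning the column directly keeps each flag attached to its own row regardless of how the source frame is indexed.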

However, it fails with:

Traceback (most recent call last):
  File "F:/MyDocument/F/My Document/Training/Python/PyCharmProject/FaceBookCrawl/FB_group_user_hierarchicalClustering.py", line 747, in <module>
    aggr_new_userlevel_shares_dataset()
  File "F:/MyDocument/F/My Document/Training/Python/PyCharmProject/FaceBookCrawl/FB_group_user_hierarchicalClustering.py", line 710, in aggr_new_userlevel_shares_dataset
    concat_df = concat_df.groupby(["UserID"],sort = False).agg({"SharedNews",np.sum}).reset_index()

    ...
AttributeError: 'SeriesGroupBy' object has no attribute 'SharedNews'

Could you tell me why this happens and how to fix it?
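The failing line passes {"SharedNews",np.sum} to agg. Because the two items are separated by a comma rather than a colon, Python builds a set, not a dict mapping a column name to a function. pandas appears to treat that set as a list of aggregations and to look up the string "SharedNews" as a method on SeriesGroupBy, which is consistent with the AttributeError above. A minimal sketch of the difference and of the corrected call follows; the small concat_df here is hypothetical data, and the string "sum" stands in for np.sum as the aggregation.

```python
import numpy as np
import pandas as pd

# Comma: Python builds a set of two items (a string and a function).
spec_as_written = {"SharedNews", np.sum}
# Colon: a dict mapping the column name to an aggregation.
spec_intended = {"SharedNews": "sum"}
print(type(spec_as_written).__name__, type(spec_intended).__name__)  # set dict

# Hypothetical per-post flags, as they would look before the groupby.
concat_df = pd.DataFrame(
    {"UserID": [101, 102, 101, 103], "SharedNews": [1, 0, 1, 0]}
)

# Sum the SharedNews flags per user; sort=False keeps first-appearance order.
per_user = (
    concat_df.groupby(["UserID"], sort=False)
    .agg({"SharedNews": "sum"})
    .reset_index()
)
print(per_user)
```

When only one column is aggregated, concat_df.groupby("UserID", sort=False)["SharedNews"].sum().reset_index() gives the same result without a dict.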

0 answers