Question

df_headlines =

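I want to group by the date column, count how many times -1, 0 and 1 occur on each date, and then use whichever value has the highest count as the daily_score.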
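For reference, here is a minimal sketch of the counting-and-picking step described above. The original df_headlines was not shown, so the dates and scores below are hypothetical sample data. pd.crosstab builds a date-by-score table of counts, and idxmax(axis=1) picks, for each date, the score column with the highest count.

```python
import pandas as pd

# Hypothetical sample data (the real df_headlines was not shown in the question).
df_headlines = pd.DataFrame({
    "date": ["4/13/20", "4/13/20", "4/13/20", "4/14/20", "4/14/20", "5/13/20"],
    "score": [-1, -1, 1, 0, 0, 1],
})

# Rows: dates; columns: score values (-1, 0, 1); cells: occurrences per date.
counts = pd.crosstab(df_headlines["date"], df_headlines["score"])

# For each date, the score (column label) with the highest count.
# On a tie, idxmax returns the first such column, i.e. the smallest score.
daily_score = counts.idxmax(axis=1)
print(daily_score)
```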

I started with groupby:

df_group = df_headlines.groupby('date')

This returns a groupby object, and I'm not sure how to use it for what I described above.

Can I iterate over it like this?
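A GroupBy object does no work until an aggregation is called on it. As a sketch (again with hypothetical sample data), calling size() gives the number of rows per date, and value_counts() on the score column gives the count of each score per date, which unstack() turns into one column per score value.

```python
import pandas as pd

# Hypothetical sample data standing in for df_headlines.
df_headlines = pd.DataFrame({
    "date": ["4/13/20", "4/13/20", "4/13/20", "4/14/20", "4/14/20", "5/13/20"],
    "score": [-1, -1, 1, 0, 0, 1],
})
df_group = df_headlines.groupby("date")

# Number of headlines per date.
rows_per_date = df_group.size()

# Count of each score value within each date, indexed by (date, score).
score_counts = df_group["score"].value_counts()

# Move the score level into columns; dates missing a score get 0.
table = score_counts.unstack(fill_value=0)
print(table)
```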

for index, row in df_group.iterrows():
    daily_pos = []
    daily_neg = []
    daily_neu = []
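The loop above will not run as written: iterrows() is a DataFrame method, and a GroupBy object has no such method. A GroupBy is iterable itself, though, yielding (key, sub-DataFrame) pairs. The sketch below (hypothetical sample data) iterates the groups and, only if row-level access is really needed, calls iterrows() on each sub-DataFrame.

```python
import pandas as pd

# Hypothetical sample data standing in for df_headlines.
df_headlines = pd.DataFrame({
    "date": ["4/13/20", "4/13/20", "4/13/20", "4/14/20", "4/14/20", "5/13/20"],
    "score": [-1, -1, 1, 0, 0, 1],
})
df_group = df_headlines.groupby("date")

seen = []
# Each iteration yields the group key (a date) and the rows for that date.
for date, group in df_group:
    # group is an ordinary DataFrame, so iterrows() works on it.
    for _, row in group.iterrows():
        seen.append((date, row["score"]))

print(seen)
```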

Answer 1

As Ch3steR hinted, you can iterate over your groups like this:

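for name, group in df_headlines.groupby('date'):
    # Count how many rows on this date have each score value
    daily_pos = len(group[group['score'] == 1])
    daily_neg = len(group[group['score'] == -1])
    daily_neu = len(group[group['score'] == 0])
    # Print inside the loop so every date is reported, not just the last one
    print(name, daily_pos, daily_neg, daily_neu)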

On each iteration, the variable name holds a value from the date column (e.g. 4/13/20, 4/14/20, 5/13/20), and the variable group holds a DataFrame of all rows whose date equals name.
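To go from these counts to the daily_score the question asks for, one option is to keep the same loop and pick the score with the highest count inside it. This is a sketch on hypothetical sample data; the daily_scores dict and the result DataFrame are illustrative names, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data standing in for df_headlines.
df_headlines = pd.DataFrame({
    "date": ["4/13/20", "4/13/20", "4/13/20", "4/14/20", "4/14/20", "5/13/20"],
    "score": [-1, -1, 1, 0, 0, 1],
})

daily_scores = {}
for name, group in df_headlines.groupby("date"):
    # Count each sentiment value within this date's rows.
    daily_pos = (group["score"] == 1).sum()
    daily_neg = (group["score"] == -1).sum()
    daily_neu = (group["score"] == 0).sum()
    # Keep the score value with the highest count; on a tie, max() returns
    # the first key in insertion order (1, then -1, then 0).
    counts = {1: daily_pos, -1: daily_neg, 0: daily_neu}
    daily_scores[name] = max(counts, key=counts.get)

# One row per date with its winning score.
result = pd.DataFrame({"date": list(daily_scores),
                       "daily_score": list(daily_scores.values())})
print(result)
```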

Answer 2

Try:

df_headlines.groupby("date")["score"].agg(lambda s: s.value_counts().idxmax())

No loop needed: you get the most common score in each group.
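A self-contained sketch of the loop-free approach, on hypothetical sample data. Note that nlargest(1) or max() per group returns the largest score value, not the most frequent one, so a frequency-based aggregation is needed; the sample includes a date where the two differ. If two scores tie, s.mode().iloc[0] is a deterministic alternative that returns the smallest tied value.

```python
import pandas as pd

# Hypothetical sample data standing in for df_headlines.
df_headlines = pd.DataFrame({
    "date": ["4/13/20", "4/13/20", "4/13/20", "4/14/20", "4/14/20", "5/13/20"],
    "score": [-1, -1, 1, 0, 0, 1],
})

# value_counts() counts each score within a date; idxmax() returns the
# score with the highest count.
daily_score = (
    df_headlines.groupby("date")["score"]
    .agg(lambda s: s.value_counts().idxmax())
)

# For contrast: the largest score per date (what nlargest(1) would report).
largest = df_headlines.groupby("date")["score"].max()

print(daily_score)
print(largest)
```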

Iterating over a pandas DataFrame or groupby object
