Question

I have a DataFrame df that looks like this:

df
     ID        date          values
0     0     2017-01-05         55
1     0     2017-01-08         55
2     0     2017-01-09         33
3     1     2017-01-05         27
4     1     2017-01-08         78
5     1     2017-01-09         78

I want to get the most frequent value for each month and each ID, so the expected result is:

df1
     ID    YearMonth   value
0    0      2017-01      55
1    1      2017-01      78

Answer 1

This solution improves on the answer given in the comments. It matches your expected output more closely.

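# Assumes df['date'] is already datetime64; if not, run pd.to_datetime on it first.
(df.groupby(['ID', df['date'].dt.to_period('M')])['values']
   .apply(lambda x: x.mode()[0])          # first (smallest) mode of each group
   .reset_index()
   .rename(columns={'date': 'YearMonth'})
)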
   ID YearMonth  values
0   0   2017-01      55
1   1   2017-01      78
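
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and wraps the same groupby in a helper; the name `most_frequent_per_month` is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [0, 0, 0, 1, 1, 1],
    "date": ["2017-01-05", "2017-01-08", "2017-01-09"] * 2,
    "values": [55, 55, 33, 27, 78, 78],
})
df["date"] = pd.to_datetime(df["date"])  # strings -> datetime64


def most_frequent_per_month(frame):
    """Hypothetical helper: most frequent `values` per (ID, calendar month)."""
    return (
        frame.groupby(["ID", frame["date"].dt.to_period("M")])["values"]
        .agg(lambda s: s.mode().iloc[0])       # first mode of each group
        .reset_index()
        .rename(columns={"date": "YearMonth"})
    )


result = most_frequent_per_month(df)
print(result)
```

Using `.iloc[0]` instead of `[0]` is only a stylistic choice here: it makes the positional lookup explicit.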

Answer 2

You can create a year-month column and then group by ID:

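import pandas as pd

df['date'] = pd.to_datetime(df['date'])         # parse the date strings
df['YearMonth'] = df['date'].dt.to_period('M')  # monthly period, e.g. 2017-01
# Select several columns with a list (double brackets); tuple-style selection fails in recent pandas.
df.groupby('ID')[['YearMonth', 'values']].apply(lambda x: x.mode().iloc[0]).reset_index()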


    ID  YearMonth   values
0   0   2017-01     55
1   1   2017-01     78

If you want the most frequent value per ID and YearMonth, change the last line to

df.groupby(['ID', 'YearMonth'])['values'].apply(lambda x: x.mode()[0]).reset_index()
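
One detail to keep in mind with `x.mode()[0]`: when several values are tied for most frequent, `Series.mode()` returns all of them in sorted order, so index 0 picks the smallest one. A small sketch with made-up numbers (not from the question) shows this:

```python
import pandas as pd

# Hypothetical series with a tie: 33 and 55 each appear twice.
tied = pd.Series([55, 33, 55, 33, 10])

modes = tied.mode()          # all tied modes, sorted ascending
first_mode = modes.iloc[0]   # the value that `x.mode()[0]` would pick

print(modes.tolist())
print(first_mode)
```

If a different tie-breaking rule is needed (for example, the most recent value), the lambda would have to be adapted accordingly.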

If you cannot convert the date column to a period, use

df.groupby(['ID', df.date.dt.year.rename('Year'), df.date.dt.month.rename('Month')])['values'].apply(lambda x: x.mode()[0]).reset_index()

    ID  Year    Month   values
0   0   2017    1       55
1   1   2017    1       78
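
The sample in the question only covers one month, so the month-level grouping is hard to see. Below is a rough sketch of the same year/month grouping on invented data spanning two months; `df2` and its contents are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data spanning two months (not from the question).
df2 = pd.DataFrame({
    "ID": [0, 0, 0, 0, 0],
    "date": pd.to_datetime(
        ["2017-01-05", "2017-01-08", "2017-01-09", "2017-02-01", "2017-02-02"]
    ),
    "values": [55, 55, 33, 33, 33],
})

by_year_month = (
    df2.groupby([
        "ID",
        df2["date"].dt.year.rename("Year"),    # grouping key: calendar year
        df2["date"].dt.month.rename("Month"),  # grouping key: month number
    ])["values"]
    .agg(lambda s: s.mode().iloc[0])           # first mode of each group
    .reset_index()
)
print(by_year_month)
```

Each (ID, Year, Month) combination gets its own row, so January and February are reported separately for the same ID.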

Pandas: how to get the most frequent value per ID within a month?

2 answers: