Groupby and conditional replacement

Time: 2019-07-18 11:47:02

Tags: python pandas

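I want to group values by a specific column (id) and replace all values in each group with the maximum datetime associated with that id.

Here is the code I wrote (it does not work):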

file.groupby('data__id')['data__answered_at'].apply(lambda x: x['data__answered_at'] == x['data__answered_at'].max())

Here is a sample of my DataFrame:

data__id     data__answered_at
1              2019-01-10
1                  Na 
2              2019-01-12
2                  Na
3                  Na
4                  Na
4                  Na
5                  Na
5              2019-01-15   

1 answer:

Answer 0 (score: 1)

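Use to_datetime with errors='coerce' to convert non-datetime values to NaT, then use GroupBy.transform to get the maximum of each group, aligned to the original rows, so the missing values can be replaced with Series.fillna: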

df['data__answered_at'] = pd.to_datetime(df['data__answered_at'], errors='coerce')

s = df.groupby('data__id')['data__answered_at'].transform('max')
df['data__answered_at'] = df['data__answered_at'].fillna(s)
print (df)
   data__id data__answered_at
0         1        2019-01-10
1         1        2019-01-10
2         2        2019-01-12
3         2        2019-01-12
4         3               NaT
5         4               NaT
6         4               NaT
7         5        2019-01-15
8         5        2019-01-15
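
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question, and the variable name group_max is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question; 'Na' marks a missing answer time.
df = pd.DataFrame({
    'data__id': [1, 1, 2, 2, 3, 4, 4, 5, 5],
    'data__answered_at': ['2019-01-10', 'Na', '2019-01-12', 'Na', 'Na',
                          'Na', 'Na', 'Na', '2019-01-15'],
})

# Strings that cannot be parsed as dates (such as 'Na') become NaT.
df['data__answered_at'] = pd.to_datetime(df['data__answered_at'], errors='coerce')

# One value per row: the maximum of that row's group, on the original index.
group_max = df.groupby('data__id')['data__answered_at'].transform('max')

# Fill only the missing values; groups without any valid date stay NaT.
df['data__answered_at'] = df['data__answered_at'].fillna(group_max)
print(df)
```

Ids 3 and 4 have no valid date at all, so their rows remain NaT after the fill.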

Your solution should be rewritten with a lambda function and fillna:

# transform keeps the original row index, so the result assigns straight back
f = lambda x: x.fillna(x.max())
df['data__answered_at'] = df.groupby('data__id')['data__answered_at'].transform(f)
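
One caveat: fillna only touches missing rows. In the sample each id has at most one date, so filling gaps and overwriting every row give the same result. If an id could carry several different dates and every row should really get the maximum, assigning the transform result directly may be closer to what the question describes. A small sketch with hypothetical data (not from the question) shows the difference:

```python
import pandas as pd

# Hypothetical data: id 1 has two different dates plus a missing value.
df = pd.DataFrame({
    'data__id': [1, 1, 1, 2],
    'data__answered_at': pd.to_datetime(['2019-01-10', '2019-01-14', None, None]),
})

by_id = df.groupby('data__id')['data__answered_at']

# fillna leaves existing dates alone, so 2019-01-10 survives in id 1.
filled = df['data__answered_at'].fillna(by_id.transform('max'))

# Using the transform result itself overwrites every row with the group maximum.
overwritten = by_id.transform('max')

print(pd.DataFrame({'filled': filled, 'overwritten': overwritten}))
```

As for why the attempt in the question fails: after selecting ['data__answered_at'] on the groupby, the lambda receives each group as a Series, so x['data__answered_at'] is a label lookup on that Series rather than a column selection.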