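I want to group the values by a particular column (the id) and replace all values with the latest datetime associated with each ID.
This is the code I wrote (it doesn't work):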
file.groupby('data__id')['data__answered_at'].apply(lambda x: x['data__answered_at'] == x['data__answered_at'].max())
Here is a sample of my dataframe:
data__id data__answered_at
1 2019-01-10
1 Na
2 2019-01-12
2 Na
3 Na
4 Na
4 Na
5 Na
5 2019-01-15
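A quick check of why the attempt above fails. This is a sketch: the frame is rebuilt by hand from the sample, and it assumes Na is a literal string in the column. Inside groupby(...)['data__answered_at'].apply, each x is already the data__answered_at Series of one group. Indexing it again by the column name therefore looks for an index label and fails. Even without that error, == returns a boolean mask instead of dates. On raw strings, the maximum is "Na", because "N" sorts after "2".

```python
import pandas as pd

# Sample frame from the question; "Na" is assumed to be a literal string
df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 3, 4, 4, 5, 5],
    "data__answered_at": ["2019-01-10", "Na", "2019-01-12", "Na", "Na",
                          "Na", "Na", "Na", "2019-01-15"],
})

# Take the first group (data__id == 1) to see what the lambda receives
key, x = next(iter(df.groupby("data__id")["data__answered_at"]))
print(key, type(x).__name__)  # x is a Series, not a DataFrame

# x is already the column, so looking up the column name as a label fails
try:
    x["data__answered_at"]
except KeyError:
    print("KeyError: x has no 'data__answered_at' label")

# Comparing with the max yields True/False, not dates, and on unparsed
# strings the max is "Na" because "N" sorts after "2"
print(x.max())
print((x == x.max()).tolist())
```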
Answer (score: 1)
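Use to_datetime with errors='coerce' to convert the values that are not valid datetimes to NaT. Then use GroupBy.transform to get the maximum of each group, aligned back to the original rows, so the missing values can be replaced with Series.fillna: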
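import pandas as pd
# values such as 'Na' that cannot be parsed as dates become NaT
df['data__answered_at'] = pd.to_datetime(df['data__answered_at'], errors='coerce')
# maximum of each group, broadcast back to every row of that group
s = df.groupby('data__id')['data__answered_at'].transform('max')
df['data__answered_at'] = df['data__answered_at'].fillna(s)
print(df)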
data__id data__answered_at
0 1 2019-01-10
1 1 2019-01-10
2 2 2019-01-12
3 2 2019-01-12
4 3 NaT
5 4 NaT
6 4 NaT
7 5 2019-01-15
8 5 2019-01-15
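The steps above can be run end to end as in the sketch below. It rebuilds the sample frame by hand and assumes Na is a literal string. Group 3 has no valid date at all, so its maximum is NaT and the row stays NaT after fillna, which matches the output shown.

```python
import pandas as pd

# Sample frame from the question; "Na" is assumed to be a literal string
df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 3, 4, 4, 5, 5],
    "data__answered_at": ["2019-01-10", "Na", "2019-01-12", "Na", "Na",
                          "Na", "Na", "Na", "2019-01-15"],
})

# "Na" cannot be parsed as a date, so errors="coerce" turns it into NaT
df["data__answered_at"] = pd.to_datetime(df["data__answered_at"], errors="coerce")

# Per-group maximum, returned with the same row index as df
s = df.groupby("data__id")["data__answered_at"].transform("max")

# Fill only the missing values; groups without any date stay NaT
df["data__answered_at"] = df["data__answered_at"].fillna(s)
print(df)
```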
Your own approach can be rewritten with a lambda function and fillna:
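# fill each group's missing values with that group's maximum;
# transform keeps the original row index (apply may prepend the group key in newer pandas)
f = lambda x: x.fillna(x.max())
df['data__answered_at'] = df.groupby('data__id')['data__answered_at'].transform(f)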
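Here is a self-contained sketch of the lambda variant, under the same assumption that Na is a literal string. It uses transform rather than apply. Since pandas 2.0, SeriesGroupBy.apply prepends the group keys to the result index by default (group_keys=True), so assigning an apply result back to df can fail or misalign. transform always returns a result indexed like the input. The question literally asks to replace all values, not only the missing ones. For that case, the transform('max') result can be assigned directly. On this sample both give the same dates, because each group has at most one valid date.

```python
import pandas as pd

# Sample frame from the question; "Na" is assumed to be a literal string
df = pd.DataFrame({
    "data__id": [1, 1, 2, 2, 3, 4, 4, 5, 5],
    "data__answered_at": ["2019-01-10", "Na", "2019-01-12", "Na", "Na",
                          "Na", "Na", "Na", "2019-01-15"],
})
df["data__answered_at"] = pd.to_datetime(df["data__answered_at"], errors="coerce")

grouped = df.groupby("data__id")["data__answered_at"]

# Fill each group's missing dates with that group's latest date;
# transform keeps df's row index, so the result aligns on assignment
f = lambda x: x.fillna(x.max())
filled = grouped.transform(f)

# Replace every value (not only the missing ones) with the group maximum
latest = grouped.transform("max")

df["data__answered_at"] = filled
print(df)
```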