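My goal is to take some data and interpolate the missing values in one column against another column, separately for each type. I managed that, but I'm struggling to get back to the shape the DataFrame had before the interpolation.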
import pandas as pd

data = [
    {"type": "Car", "avg_speed": 30, "max_speed": 200},
    {"type": "Car", "avg_speed": 20, "max_speed": 100},
    {"type": "Car", "avg_speed": 25, "max_speed": None},
    {"type": "Plane", "avg_speed": 300, "max_speed": 2000},
    {"type": "Plane", "avg_speed": 200, "max_speed": 1000},
    {"type": "Plane", "avg_speed": 250, "max_speed": None}
]
df = pd.DataFrame(data)
print(df)

# Per type: index by avg_speed, sort, then interpolate max_speed along that index
post_interp = df.groupby("type").apply(lambda x: x.set_index(
    'avg_speed').sort_index().interpolate(method='index'))
print(post_interp)
First print:
    type  avg_speed  max_speed
0    Car         30      200.0
1    Car         20      100.0
2    Car         25        NaN
3  Plane        300     2000.0
4  Plane        200     1000.0
5  Plane        250        NaN
Second print:
                  type  max_speed
type  avg_speed
Car   20           Car      100.0
      25           Car      150.0
      30           Car      200.0
Plane 200        Plane     1000.0
      250        Plane     1500.0
      300        Plane     2000.0
I want to get back to the shape of the DataFrame in the first print, but with the interpolated values filled in.
Answer 0 (score: 2)
Add group_keys=False to DataFrame.groupby so the group key is not added as an extra (duplicated) index level, then finish with DataFrame.reset_index:
post_interp = (df.groupby("type", group_keys=False)
                 .apply(lambda x: x.set_index('avg_speed')
                                   .sort_index()
                                   .interpolate(method='index'))
                 .reset_index())
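To see what group_keys actually changes, here is a minimal sketch that compares the index of the result with and without it. The helper name interp_group and the explicit column selection (cols) are my own additions for illustration, not part of the snippet above; selecting the columns explicitly means the function never receives the type column, which keeps the comparison independent of how a given pandas version handles the grouping column inside apply.

```python
import pandas as pd

data = [
    {"type": "Car", "avg_speed": 30, "max_speed": 200},
    {"type": "Car", "avg_speed": 20, "max_speed": 100},
    {"type": "Car", "avg_speed": 25, "max_speed": None},
    {"type": "Plane", "avg_speed": 300, "max_speed": 2000},
    {"type": "Plane", "avg_speed": 200, "max_speed": 1000},
    {"type": "Plane", "avg_speed": 250, "max_speed": None},
]
df = pd.DataFrame(data)


def interp_group(x):
    # Hypothetical helper: same steps as the lambda in the answer.
    return x.set_index("avg_speed").sort_index().interpolate(method="index")


cols = ["avg_speed", "max_speed"]  # assumption: only pass the numeric columns

# Default behaviour: the group key becomes an outer index level.
with_keys = df.groupby("type")[cols].apply(interp_group)

# group_keys=False: only the index built inside interp_group is kept.
without_keys = df.groupby("type", group_keys=False)[cols].apply(interp_group)

print(list(with_keys.index.names))     # ['type', 'avg_speed']
print(list(without_keys.index.names))  # ['avg_speed']
```

In the question's second print, type shows up twice because the key was added as an index level while the column was also still present in each group.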
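Another solution, with two reset_index calls (the first drops the type index level, the second turns avg_speed back into a column):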
post_interp = (df.groupby("type")
                 .apply(lambda x: x.set_index('avg_speed')
                                   .sort_index()
                                   .interpolate(method='index'))
                 .reset_index(level=0, drop=True)
                 .reset_index())
Or you can create the index before the groupby:
post_interp = (df.set_index('avg_speed')
                 .sort_index()
                 .groupby("type", group_keys=False)
                 .apply(lambda x: x.interpolate(method='index'))
                 .reset_index())
print(post_interp)
   avg_speed   type  max_speed
0         20    Car      100.0
1         25    Car      150.0
2         30    Car      200.0
3        200  Plane     1000.0
4        250  Plane     1500.0
5        300  Plane     2000.0
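A variant of the same idea, as a rough sketch rather than a drop-in replacement: interpolate only the max_speed column with GroupBy.transform instead of apply. The variable name out is mine. As far as I know, this sidesteps the question of whether apply hands the grouping column to the function, which I believe has changed between pandas versions, because transform here only ever sees the selected column.

```python
import pandas as pd

data = [
    {"type": "Car", "avg_speed": 30, "max_speed": 200},
    {"type": "Car", "avg_speed": 20, "max_speed": 100},
    {"type": "Car", "avg_speed": 25, "max_speed": None},
    {"type": "Plane", "avg_speed": 300, "max_speed": 2000},
    {"type": "Plane", "avg_speed": 200, "max_speed": 1000},
    {"type": "Plane", "avg_speed": 250, "max_speed": None},
]
df = pd.DataFrame(data)

# Index by avg_speed so method='index' interpolates against it.
# Note: the assignment below aligns on this index, so it assumes
# avg_speed values are unique (true for this sample data).
out = df.set_index("avg_speed").sort_index()

# Interpolate max_speed within each type; every other column is untouched.
out["max_speed"] = (out.groupby("type")["max_speed"]
                       .transform(lambda s: s.interpolate(method="index")))

# Back to a flat frame with the original column order.
out = out.reset_index()[list(df.columns)]
print(out)
```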
Finally, if necessary, add DataFrame.reindex to restore the original column order:
post_interp = post_interp.reindex(df.columns, axis=1)
print(post_interp)
    type  avg_speed  max_speed
0    Car         20      100.0
1    Car         25      150.0
2    Car         30      200.0
3  Plane        200     1000.0
4  Plane        250     1500.0
5  Plane        300     2000.0
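Note that the rows above come back sorted by avg_speed within each type, not in the order of the first print. If the original row order and index matter too, one possible sketch (my own variation, swapping in numpy.interp for DataFrame.interpolate) fills only the missing cells in a copy of the frame. The names result, known and missing_rows are illustrative. One caveat: numpy.interp clamps to the nearest known value outside the known range instead of extrapolating, and edge handling may not match interpolate exactly.

```python
import numpy as np
import pandas as pd

data = [
    {"type": "Car", "avg_speed": 30, "max_speed": 200},
    {"type": "Car", "avg_speed": 20, "max_speed": 100},
    {"type": "Car", "avg_speed": 25, "max_speed": None},
    {"type": "Plane", "avg_speed": 300, "max_speed": 2000},
    {"type": "Plane", "avg_speed": 200, "max_speed": 1000},
    {"type": "Plane", "avg_speed": 250, "max_speed": None},
]
df = pd.DataFrame(data)

result = df.copy()
for _, group in df.groupby("type"):
    # Known points for this type, sorted by x as numpy.interp requires.
    known = group.dropna(subset=["max_speed"]).sort_values("avg_speed")
    if known.empty:
        continue  # nothing to interpolate from in this group

    # Original index labels of the rows that need a value.
    missing_rows = group[group["max_speed"].isna()].index

    result.loc[missing_rows, "max_speed"] = np.interp(
        group.loc[missing_rows, "avg_speed"],
        known["avg_speed"],
        known["max_speed"],
    )

print(result)
```

Here result keeps the index 0..5 and the row order of df, with rows 2 and 5 filled in as 150.0 and 1500.0.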