Pandas: getting back to the original DataFrame shape after groupBy

Time: 2020-01-04 16:34:48

Tags: python pandas pandas-groupby

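My goal is to take the data and interpolate the missing values along a specific column, separately for each type. I have achieved that, but I am having trouble getting back to the shape the DataFrame had before the interpolation.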

import pandas as pd

data = [
    {"type": "Car", "avg_speed": 30, "max_speed": 200},
    {"type": "Car", "avg_speed": 20, "max_speed": 100},
    {"type": "Car", "avg_speed": 25, "max_speed": None},
    {"type": "Plane", "avg_speed": 300, "max_speed": 2000},
    {"type": "Plane", "avg_speed": 200, "max_speed": 1000},
    {"type": "Plane", "avg_speed": 250, "max_speed": None}
]


df = pd.DataFrame(data)
print(df)
post_interp = df.groupby("type").apply(lambda x: x.set_index(
    'avg_speed').sort_index().interpolate(method='index'))
print(post_interp)

First print:

    type  avg_speed  max_speed
0    Car         30      200.0
1    Car         20      100.0
2    Car         25        NaN
3  Plane        300     2000.0
4  Plane        200     1000.0
5  Plane        250        NaN

Second print:

                  type  max_speed
type  avg_speed
Car   20           Car      100.0
      25           Car      150.0
      30           Car      200.0
Plane 200        Plane     1000.0
      250        Plane     1500.0
      300        Plane     2000.0

I would like to get back to the shape of the DataFrame in the first print, but with the interpolated values.

1 answer:

Answer 0 (score: 2)

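Add group_keys=False to DataFrame.groupby so that the group key is not added as an extra index level, then finish with DataFrame.reset_index to turn avg_speed back into a column: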

post_interp = (df.groupby("type", group_keys=False)
                 .apply(lambda x: x.set_index('avg_speed')
                                   .sort_index()
                                   .interpolate(method='index'))
                 .reset_index())

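Another solution, with a double reset_index (the first call drops the duplicated type level, the second restores avg_speed as a column):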

post_interp = (df.groupby("type")
                 .apply(lambda x: x.set_index('avg_speed')
                                   .sort_index()
                                   .interpolate(method='index'))
                 .reset_index(level=0, drop=True)
                 .reset_index())
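
To see why the first reset_index needs drop=True, here is a minimal sketch on a hand-built frame that mimics the second print from the question. The names nested, flat and plain_reset_failed are only illustrative, and the frame is constructed directly rather than through groupby so that the result does not depend on the pandas version:

```python
import pandas as pd

# Hand-built stand-in for the second print: "type" is both an index level and a column.
index = pd.MultiIndex.from_tuples(
    [("Car", 20), ("Car", 25), ("Plane", 200)], names=["type", "avg_speed"]
)
nested = pd.DataFrame(
    {"type": ["Car", "Car", "Plane"], "max_speed": [100.0, 150.0, 1000.0]},
    index=index,
)

# A plain reset_index() tries to insert the "type" level as a column and fails,
# because a "type" column already exists.
try:
    nested.reset_index()
    plain_reset_failed = False
except ValueError:
    plain_reset_failed = True

# Drop the duplicated "type" level first, then turn "avg_speed" back into a column.
flat = nested.reset_index(level=0, drop=True).reset_index()
print(plain_reset_failed)
print(flat)
```

After the two calls, flat has a default integer index and the columns avg_speed, type and max_speed.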

Or you can create the index before the groupby:

post_interp = (df.set_index('avg_speed')
                 .sort_index()
                 .groupby("type", group_keys=False)
                 .apply(lambda x: x.interpolate(method='index'))
                 .reset_index())
print(post_interp)
   avg_speed   type  max_speed
0         20    Car      100.0
1         25    Car      150.0
2         30    Car      200.0
3        200  Plane     1000.0
4        250  Plane     1500.0
5        300  Plane     2000.0
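
A possible variant of the same idea, offered as a sketch rather than as part of the original answer: interpolate only the max_speed column with transform instead of passing the whole group through apply. Depending on the pandas version, apply over groups that still contain the grouping column, or interpolate on a frame with string columns, may emit deprecation warnings; selecting the single numeric column sidesteps both. The name indexed is only illustrative.

```python
import pandas as pd

data = [
    {"type": "Car", "avg_speed": 30, "max_speed": 200},
    {"type": "Car", "avg_speed": 20, "max_speed": 100},
    {"type": "Car", "avg_speed": 25, "max_speed": None},
    {"type": "Plane", "avg_speed": 300, "max_speed": 2000},
    {"type": "Plane", "avg_speed": 200, "max_speed": 1000},
    {"type": "Plane", "avg_speed": 250, "max_speed": None},
]
df = pd.DataFrame(data)

# Use avg_speed as a sorted index so that method="index" interpolates along it.
indexed = df.set_index("avg_speed").sort_index()

# Interpolate max_speed within each type; transform keeps the index of `indexed`.
indexed["max_speed"] = (
    indexed.groupby("type")["max_speed"]
           .transform(lambda s: s.interpolate(method="index"))
)

# Turn avg_speed back into an ordinary column.
post_interp = indexed.reset_index()
print(post_interp)
```

This assumes avg_speed is unique across the whole frame, as it is in the sample data, because the assignment back into indexed aligns on that index.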

Finally, if necessary, add DataFrame.reindex to restore the original column order:

post_interp = post_interp.reindex(df.columns, axis=1)
print(post_interp)
    type  avg_speed  max_speed
0    Car         20      100.0
1    Car         25      150.0
2    Car         30      200.0
3  Plane        200     1000.0
4  Plane        250     1500.0
5  Plane        300     2000.0
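
Note that the rows above are sorted by avg_speed within each type, while the first print in the question has them in their original order. If the original row order and index also matter, one option is to write the interpolated values back onto a copy of df by label. This is a sketch under that assumption, not something from the answer itself; ordered, interpolated and result are illustrative names.

```python
import pandas as pd

data = [
    {"type": "Car", "avg_speed": 30, "max_speed": 200},
    {"type": "Car", "avg_speed": 20, "max_speed": 100},
    {"type": "Car", "avg_speed": 25, "max_speed": None},
    {"type": "Plane", "avg_speed": 300, "max_speed": 2000},
    {"type": "Plane", "avg_speed": 200, "max_speed": 1000},
    {"type": "Plane", "avg_speed": 250, "max_speed": None},
]
df = pd.DataFrame(data)

# Sort a view of the data so each group is ordered by avg_speed;
# ordered.index still holds the original row labels.
ordered = df.sort_values(["type", "avg_speed"])

# Interpolate max_speed along avg_speed within each type.
# transform returns the values in the same row order as `ordered`.
interpolated = (
    ordered.set_index("avg_speed")
           .groupby("type")["max_speed"]
           .transform(lambda s: s.interpolate(method="index"))
)

# Write the values back by original row label, leaving row and column order untouched.
result = df.copy()
result.loc[ordered.index, "max_speed"] = interpolated.to_numpy()
print(result)
```

Here result keeps the index 0 to 5 and the column order type, avg_speed, max_speed from the first print, with only the missing max_speed values filled in.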