Pandas - Inserting rows for missing data

Date: 2017-04-28 21:11:12

Tags: python pandas

I have a dataset; here is an example:

import pandas as pd

df = pd.DataFrame({"Seconds_left": [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
                   "Team": ["ATL", "ATL", "ATL", "ATL", "ATL", "ATL", "SAS", "SAS", "SAS", "SAS"],
                   "Fouls": [1, 2, 3, 3, 4, 5, 5, 4, 1, 1]})


   Fouls  Seconds_left Team
0      1             5  ATL
1      2            10  ATL
2      3            15  ATL
3      3            25  ATL
4      4            30  ATL
5      5            35  ATL
6      5             5  SAS
7      4            10  SAS
8      1            15  SAS
9      1            30  SAS

Now I want to insert a row for every value that is missing from the Seconds_left column (in steps of 5, for each team):

Id Fouls Seconds_left   Team
0      1            5    ATL
1      2           10    ATL
2      3           15    ATL
3    NaN           20    ATL
4      3           25    ATL
5      4           30    ATL
6      5           35    ATL
7      5            5    SAS
8      4           10    SAS
9      1           15    SAS
10   NaN           20    SAS
11   NaN           25    SAS
12     1           30    SAS
13   NaN           35    SAS

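I have already tried reindexing and similar approaches, but that clearly does not work because the index contains duplicate values.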
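To make the failure concrete, here is a minimal sketch of what a direct reindex on Seconds_left does. It assumes a reasonably recent pandas version, and the names `by_seconds`, `full_range`, and `reindex_failed` are illustrative only; they are not taken from the original attempt.

```python
import pandas as pd

df = pd.DataFrame({
    "Seconds_left": [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    "Team": ["ATL"] * 6 + ["SAS"] * 4,
    "Fouls": [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

# Seconds_left repeats across teams, so it does not form a unique index.
by_seconds = df.set_index("Seconds_left")
full_range = range(5, 40, 5)  # 5, 10, ..., 35

try:
    by_seconds.reindex(full_range)
    reindex_failed = False
except ValueError as exc:
    # pandas refuses to reindex an axis that contains duplicate labels.
    reindex_failed = True
    print("reindex failed:", exc)
```

The labels would only be unique if Team were part of the index as well.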

Does anyone know how to solve this?

Thanks!

3 Answers:

Answer 0: (score: 4)

Create a MultiIndex, reindex against it, then call reset_index:

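import numpy as np
import pandas as pd

# Every (Team, Seconds_left) combination, in steps of 5
idx = pd.MultiIndex.from_product([df['Team'].unique(),
                                  np.arange(5, df['Seconds_left'].max()+1, 5)],
                                 names=['Team', 'Seconds_left'])

# Combinations absent from df become rows with NaN in Fouls
df.set_index(['Team', 'Seconds_left']).reindex(idx).reset_index()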
Out: 
   Team  Seconds_left  Fouls
0   ATL             5    1.0
1   ATL            10    2.0
2   ATL            15    3.0
3   ATL            20    NaN
4   ATL            25    3.0
5   ATL            30    4.0
6   ATL            35    5.0
7   SAS             5    5.0
8   SAS            10    4.0
9   SAS            15    1.0
10  SAS            20    NaN
11  SAS            25    NaN
12  SAS            30    1.0
13  SAS            35    NaN
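The same idea can be wrapped in a small reusable function. This is only a sketch: the function name `fill_missing_steps` and its parameters are hypothetical, and it assumes that the grid should start at the smallest observed Seconds_left value (5 in this data) and that each (Team, Seconds_left) pair occurs at most once.

```python
import numpy as np
import pandas as pd


def fill_missing_steps(df, group_col="Team", step_col="Seconds_left", step=5):
    """Return df with one row per (group, step) pair; absent pairs hold NaN."""
    # Assumption: the grid runs from the observed minimum to the observed maximum.
    steps = np.arange(df[step_col].min(), df[step_col].max() + 1, step)
    idx = pd.MultiIndex.from_product(
        [df[group_col].unique(), steps], names=[group_col, step_col]
    )
    # reindex needs unique (group, step) pairs, otherwise it raises ValueError.
    return df.set_index([group_col, step_col]).reindex(idx).reset_index()


df = pd.DataFrame({
    "Seconds_left": [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    "Team": ["ATL"] * 6 + ["SAS"] * 4,
    "Fouls": [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

filled = fill_missing_steps(df)
print(filled)
```

Fouls becomes a float column because NaN cannot be stored in a plain integer column.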

Answer 1: (score: 1)

An approach using groupby and merge:

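# The full set of Seconds_left values every team should have
df_left = pd.DataFrame({'Seconds_left':[5,10,15,20,25,30,35]})

# Right-merge each team's rows against the full set; missing values get NaN
df_out = df.groupby('Team', as_index=False).apply(lambda x: x.merge(df_left, how='right', on='Seconds_left'))

# Rows added by the merge have no Team yet; forward-fill it within the group's block
df_out['Team'] = df_out['Team'].ffill()

df_out = df_out.reset_index(drop=True).sort_values(by=['Team','Seconds_left'])

print(df_out)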

Output:

    Fouls  Seconds_left Team
0     1.0             5  ATL
1     2.0            10  ATL
2     3.0            15  ATL
6     NaN            20  ATL
3     3.0            25  ATL
4     4.0            30  ATL
5     5.0            35  ATL
7     5.0             5  SAS
8     4.0            10  SAS
9     1.0            15  SAS
11    NaN            20  SAS
12    NaN            25  SAS
10    1.0            30  SAS
13    NaN            35  SAS
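As a possible variation (a different technique, not the one used in this answer), the full grid of teams and seconds can be built with a cross join and then left-merged with the data, which avoids forward-filling Team. The sketch below assumes pandas 1.2 or later for `how="cross"`. The names `teams`, `seconds`, `grid`, and `df_full` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Seconds_left": [5, 10, 15, 25, 30, 35, 5, 10, 15, 30],
    "Team": ["ATL"] * 6 + ["SAS"] * 4,
    "Fouls": [1, 2, 3, 3, 4, 5, 5, 4, 1, 1],
})

teams = pd.DataFrame({"Team": df["Team"].unique()})
seconds = pd.DataFrame({"Seconds_left": range(5, 40, 5)})  # 5, 10, ..., 35

# Every (Team, Seconds_left) combination; how="cross" requires pandas >= 1.2.
grid = teams.merge(seconds, how="cross")

# A left merge keeps the grid's row order; combinations absent from df get NaN.
df_full = grid.merge(df, on=["Team", "Seconds_left"], how="left")
print(df_full)
```

Because Team comes from the grid itself, it is never missing, even if a team's first Seconds_left value is absent from the data.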

Answer 2: (score: -1)

import pandas as pd
import numpy as np


df = pd.DataFrame(columns = ['a', 'b'])

# Append one row at the next integer position; np.nan marks the missing value
df.loc[len(df)] = [1, np.nan]