我有一个数据集,这是一个例子:
df = DataFrame({"Seconds_left":[5,10,15,25,30,35,5,10,15,30], "Team":["ATL","ATL","ATL","ATL","ATL","ATL","SAS","SAS","SAS","SAS"], "Fouls": [1,2,3,3,4,5,5,4,1,1]})
Fouls Seconds_left Team
0 1 5 ATL
1 2 10 ATL
2 3 15 ATL
3 3 25 ATL
4 4 30 ATL
5 5 35 ATL
6 5 5 SAS
7 4 10 SAS
8 1 15 SAS
9 1 30 SAS
现在我想插入缺少Seconds_left列中数据的行:
Id Fouls Seconds_left Team
0 1 5 ATL
1 2 10 ATL
2 3 15 ATL
3 NaN 20 ATL
4 3 25 ATL
5 4 30 ATL
6 5 35 ATL
7 5 5 SAS
8 4 10 SAS
9 1 15 SAS
10 NaN 20 SAS
11 NaN 25 SAS
12 1 30 SAS
13 NaN 35 SAS
我已经尝试过重建索引等,但很明显它不起作用,因为有重复索引。
有人知道如何解决这个问题吗?
谢谢!
答案 0 :(得分:4)
创建MultiIndex并重新索引+ reset_index:
idx = pd.MultiIndex.from_product([df['Team'].unique(),
np.arange(5, df['Seconds_left'].max()+1, 5)],
names=['Team', 'Seconds_left'])
df.set_index(['Team', 'Seconds_left']).reindex(idx).reset_index()
Out:
Team Seconds_left Fouls
0 ATL 5 1.0
1 ATL 10 2.0
2 ATL 15 3.0
3 ATL 20 NaN
4 ATL 25 3.0
5 ATL 30 4.0
6 ATL 35 5.0
7 SAS 5 5.0
8 SAS 10 4.0
9 SAS 15 1.0
10 SAS 20 NaN
11 SAS 25 NaN
12 SAS 30 1.0
13 SAS 35 NaN
答案 1 :(得分:1)
使用groupby
和merge
的方法:
df_left = pd.DataFrame({'Seconds_left':[5,10,15,20,25,30,35]})
df_out = df.groupby('Team', as_index=False).apply(lambda x: x.merge(df_left, how='right', on='Seconds_left'))
df_out['Team'] = df_out['Team'].fillna(method='ffill')
df_out = df_out.reset_index(drop=True).sort_values(by=['Team','Seconds_left'])
print(df_out)
输出:
Fouls Seconds_left Team
0 1.0 5 ATL
1 2.0 10 ATL
2 3.0 15 ATL
6 NaN 20 ATL
3 3.0 25 ATL
4 4.0 30 ATL
5 5.0 35 ATL
7 5.0 5 SAS
8 4.0 10 SAS
9 1.0 15 SAS
11 NaN 20 SAS
12 NaN 25 SAS
10 1.0 30 SAS
13 NaN 35 SAS
答案 2 :(得分:-1)
import pandas as pd
import numpy as np
df = pd.DataFrame(columns = ['a', 'b'])
df.loc[len(df)] = [1,np.NaN]