I have a big problem.
I have a MultiIndex DataFrame like this:
time_total_x_x time_total_y perc_time time_total_x_y perc_sec_time
sector radiotap.mcs.index
2 1.0000000000 0.1079160312 0.1505082861 0.7170105646 18.0297726012 0.0083477640
2.0000000000 0.0101262961 0.1505082861 0.0672806552 18.0297726012 0.0083477640
3.0000000000 0.0074302504 0.1505082861 0.0493677164 18.0297726012 0.0083477640
4.0000000000 0.0057511342 0.1505082861 0.0382114125 18.0297726012 0.0083477640
6.0000000000 0.0053130805 0.1505082861 0.0353009170 18.0297726012 0.0083477640
7.0000000000 0.0056565361 0.1505082861 0.0375828883 18.0297726012 0.0083477640
8.0000000000 0.0083149576 0.1505082861 0.0552458459 18.0297726012 0.0083477640
3 1.0000000000 0.0326363429 0.0553721351 0.5894001165 18.0297726012 0.0030711499
3.0000000000 0.0037409247 0.0553721351 0.0675596971 18.0297726012 0.0030711499
6.0000000000 0.0013867221 0.0553721351 0.0250436808 18.0297726012 0.0030711499
8.0000000000 0.0097070545 0.0553721351 0.1753057659 18.0297726012 0.0030711499
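If a value of radiotap.mcs.index does not exist for a sector, I need to add it to the index, fill the first three columns with 0, and fill the last two columns with the same values as the other rows of that sector (though that part is not important).
It should look like this: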
time_total_x_x time_total_y perc_time time_total_x_y perc_sec_time
sector radiotap.mcs.index
2 0.0000000000 0 0 0 18.0297726012 0.0083477640
1.0000000000 0.1079160312 0.1505082861 0.7170105646 18.0297726012 0.0083477640
2.0000000000 0.0101262961 0.1505082861 0.0672806552 18.0297726012 0.0083477640
3.0000000000 0.0074302504 0.1505082861 0.0493677164 18.0297726012 0.0083477640
4.0000000000 0.0057511342 0.1505082861 0.0382114125 18.0297726012 0.0083477640
5.0000000000 0 0 0 18.0297726012 0.0083477640
6.0000000000 0.0053130805 0.1505082861 0.0353009170 18.0297726012 0.0083477640
7.0000000000 0.0056565361 0.1505082861 0.0375828883 18.0297726012 0.0083477640
8.0000000000 0.0083149576 0.1505082861 0.0552458459 18.0297726012 0.0083477640
3 0.0000000000 0 0 0 18.0297726012 0.0030711499
1.0000000000 0.0326363429 0.0553721351 0.5894001165 18.0297726012 0.0030711499
2.0000000000 0 0 0 18.0297726012 0.0030711499
3.0000000000 0.0037409247 0.0553721351 0.0675596971 18.0297726012 0.0030711499
4.0000000000 0 0 0 18.0297726012 0.0030711499
5.0000000000 0 0 0 18.0297726012 0.0030711499
6.0000000000 0.0013867221 0.0553721351 0.0250436808 18.0297726012 0.0030711499
7.0000000000 0 0 0 18.0297726012 0.0030711499
8.0000000000 0.0097070545 0.0553721351 0.1753057659 18.0297726012 0.0030711499
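Can anyone help me? I would really appreciate it.
Answer 0 (score: 0)
Use pd.MultiIndex.from_product to build the Cartesian product of the index levels, then use reindex to add the missing rows: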
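import pandas as pd

# Sample data: sector 3 is missing several idx values
df = pd.DataFrame({"sector":[2,2,2,2,2,2,2,2,3,3,3,3],
                   "idx":[1,2,3,4,5,6,7,8,1,3,6,8],
                   "values":range(12)})
# Every (sector, idx) combination that should exist
m = pd.MultiIndex.from_product([df["sector"].unique(),df["idx"].unique()],names=["sector","idx"])
# Reindex onto the full grid; rows that were missing get 0
print(df.set_index(["sector","idx"]).reindex(m, fill_value=0))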
values
sector idx
2 1 0
2 1
3 2
4 3
5 4
6 5
7 6
8 7
3 1 8
2 0
3 9
4 0
5 0
6 10
7 0
8 11
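Applied to a frame shaped like the one in the question, the same idea might look like the sketch below. It makes several assumptions: the valid range of radiotap.mcs.index is taken to be 0 through 8 (read off the desired output), the sample is a shortened, hypothetical version of the table with only two value columns, and the per-sector column is filled by copying the first known value within each sector.

```python
import pandas as pd

# Hypothetical, shortened sample: one per-row column and one per-sector column.
df = pd.DataFrame(
    {
        "sector": [2, 2, 3],
        "radiotap.mcs.index": [1.0, 2.0, 1.0],
        "perc_time": [0.7170105646, 0.0672806552, 0.5894001165],
        "perc_sec_time": [0.0083477640, 0.0083477640, 0.0030711499],
    }
).set_index(["sector", "radiotap.mcs.index"])

# Assumption: valid MCS indices run from 0 to 8, as in the desired output.
full = pd.MultiIndex.from_product(
    [df.index.get_level_values("sector").unique(), [float(i) for i in range(9)]],
    names=df.index.names,
)

# Missing (sector, index) pairs become all-NaN rows.
out = df.reindex(full)

# Per-sector column: copy the first non-null value of each sector to every row.
out["perc_sec_time"] = out.groupby(level="sector")["perc_sec_time"].transform("first")

# Per-row column: missing combinations count as zero.
out["perc_time"] = out["perc_time"].fillna(0)

print(out)
```

Reindexing without fill_value leaves NaN in the new rows, which makes it possible to treat the per-sector columns and the per-row columns differently before anything is zeroed.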
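Answer 1 (score: 0)
Use itertools to create the missing rows and join them with the original DataFrame. Then backfill (bfill) the holes in the two per-sector columns and replace the remaining NaN values with 0.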
import pandas as pd
import numpy as np
import io
data = '''
sector radiotap.mcs.index time_total_x_x time_total_y perc_time time_total_x_y perc_sec_time
2 1.0000000000 0.1079160312 0.1505082861 0.7170105646 18.0297726012 0.0083477640
2 2.0000000000 0.0101262961 0.1505082861 0.0672806552 18.0297726012 0.0083477640
2 3.0000000000 0.0074302504 0.1505082861 0.0493677164 18.0297726012 0.0083477640
2 4.0000000000 0.0057511342 0.1505082861 0.0382114125 18.0297726012 0.0083477640
2 6.0000000000 0.0053130805 0.1505082861 0.0353009170 18.0297726012 0.0083477640
2 7.0000000000 0.0056565361 0.1505082861 0.0375828883 18.0297726012 0.0083477640
2 8.0000000000 0.0083149576 0.1505082861 0.0552458459 18.0297726012 0.0083477640
3 1.0000000000 0.0326363429 0.0553721351 0.5894001165 18.0297726012 0.0030711499
3 3.0000000000 0.0037409247 0.0553721351 0.0675596971 18.0297726012 0.0030711499
3 6.0000000000 0.0013867221 0.0553721351 0.0250436808 18.0297726012 0.0030711499
3 8.0000000000 0.0097070545 0.0553721351 0.1753057659 18.0297726012 0.0030711499
'''
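# Parse the whitespace-separated table; the raw string avoids an invalid-escape warning
df = pd.read_csv(io.StringIO(data), sep=r'\s+')
# Per-sector max and min of the MCS index (string names instead of the builtins)
tmp = df[['sector', 'radiotap.mcs.index']].groupby('sector').agg(['max', 'min'])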
from itertools import product
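# Every MCS index from 0 up to the largest one observed in any sector
rng = np.arange(0, tmp.loc[:,('radiotap.mcs.index', 'max')].max()+1)
# All (sector, index) combinations, joined with the original rows
df1 = pd.DataFrame(list(product(tmp.index, rng)), columns=['sector','radiotap.mcs.index'])
df1 = df1.merge(df, on=['sector','radiotap.mcs.index'], how='outer')
# Backfill the two per-sector columns (DataFrame.bfill replaces the deprecated fillna(method='bfill'))
cols = ['time_total_x_y', 'perc_sec_time']
df1[cols] = df1[cols].bfill()
# Remaining gaps are the missing measurements; set them to 0
df1 = df1.fillna(0)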
df1
sector radiotap.mcs.index time_total_x_x time_total_y perc_time time_total_x_y perc_sec_time
0 2 0.0 0.000000 0.000000 0.000000 18.029773 0.008348
1 2 1.0 0.107916 0.150508 0.717011 18.029773 0.008348
2 2 2.0 0.010126 0.150508 0.067281 18.029773 0.008348
3 2 3.0 0.007430 0.150508 0.049368 18.029773 0.008348
4 2 4.0 0.005751 0.150508 0.038211 18.029773 0.008348
5 2 5.0 0.000000 0.000000 0.000000 18.029773 0.008348
6 2 6.0 0.005313 0.150508 0.035301 18.029773 0.008348
7 2 7.0 0.005657 0.150508 0.037583 18.029773 0.008348
8 2 8.0 0.008315 0.150508 0.055246 18.029773 0.008348
9 3 0.0 0.000000 0.000000 0.000000 18.029773 0.003071
10 3 1.0 0.032636 0.055372 0.589400 18.029773 0.003071
11 3 2.0 0.000000 0.000000 0.000000 18.029773 0.003071
12 3 3.0 0.003741 0.055372 0.067560 18.029773 0.003071
13 3 4.0 0.000000 0.000000 0.000000 18.029773 0.003071
14 3 5.0 0.000000 0.000000 0.000000 18.029773 0.003071
15 3 6.0 0.001387 0.055372 0.025044 18.029773 0.003071
16 3 7.0 0.000000 0.000000 0.000000 18.029773 0.003071
17 3 8.0 0.009707 0.055372 0.175306 18.029773 0.003071
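One caveat with a plain backfill is that it ignores sector boundaries: if the last rows of a sector were among the missing ones, they would pick up values from the next sector. That does not happen with the data above, because index 8 is present in both sectors. A minimal sketch of a group-aware variant, using a small hypothetical frame shaped like df1 right after the merge:

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like df1 after the merge: the trailing row of
# sector 2 is a gap, so a plain bfill would borrow sector 3's value.
df1 = pd.DataFrame(
    {
        "sector": [2, 2, 2, 3, 3],
        "radiotap.mcs.index": [0, 1, 2, 0, 1],
        "perc_time": [np.nan, 0.7, np.nan, np.nan, 0.5],
        "perc_sec_time": [np.nan, 0.008, np.nan, np.nan, 0.003],
    }
)

per_sector = ["perc_sec_time"]  # columns that are constant within a sector

# Fill backward and then forward, but only inside each sector.
df1[per_sector] = df1.groupby("sector")[per_sector].transform(
    lambda s: s.bfill().ffill()
)

# Everything still missing is a genuinely absent measurement.
df1 = df1.fillna(0)
print(df1)
```

Filling in both directions inside each group covers gaps at either end of a sector, at the cost of a slightly slower transform than a single frame-wide bfill.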