Pandas: add values to a MultiIndex if they do not exist

Time: 2020-09-15 01:05:47

Tags: python pandas data-science

I have a big problem.

I have a MultiIndex DataFrame like this:

                                time_total_x_x  time_total_y    perc_time   time_total_x_y  perc_sec_time
sector  radiotap.mcs.index                  
2       1.0000000000            0.1079160312    0.1505082861    0.7170105646    18.0297726012   0.0083477640
        2.0000000000            0.0101262961    0.1505082861    0.0672806552    18.0297726012   0.0083477640
        3.0000000000            0.0074302504    0.1505082861    0.0493677164    18.0297726012   0.0083477640
        4.0000000000            0.0057511342    0.1505082861    0.0382114125    18.0297726012   0.0083477640
        6.0000000000            0.0053130805    0.1505082861    0.0353009170    18.0297726012   0.0083477640
        7.0000000000            0.0056565361    0.1505082861    0.0375828883    18.0297726012   0.0083477640
        8.0000000000            0.0083149576    0.1505082861    0.0552458459    18.0297726012   0.0083477640
3       1.0000000000            0.0326363429    0.0553721351    0.5894001165    18.0297726012   0.0030711499
        3.0000000000            0.0037409247    0.0553721351    0.0675596971    18.0297726012   0.0030711499
        6.0000000000            0.0013867221    0.0553721351    0.0250436808    18.0297726012   0.0030711499
        8.0000000000            0.0097070545    0.0553721351    0.1753057659    18.0297726012   0.0030711499

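If a value is missing from the radiotap.mcs.index level, I need to add it with zeros in the first three columns, and fill the last two columns with the values shared by the other rows of the same sector (though this part is not essential).

It should look like this: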

                                time_total_x_x  time_total_y    perc_time       time_total_x_y  perc_sec_time
sector  radiotap.mcs.index                  
2       0.0000000000            0               0               0               18.0297726012   0.0083477640
        1.0000000000            0.1079160312    0.1505082861    0.7170105646    18.0297726012   0.0083477640
        2.0000000000            0.0101262961    0.1505082861    0.0672806552    18.0297726012   0.0083477640
        3.0000000000            0.0074302504    0.1505082861    0.0493677164    18.0297726012   0.0083477640
        4.0000000000            0.0057511342    0.1505082861    0.0382114125    18.0297726012   0.0083477640
        5.0000000000            0               0               0               18.0297726012   0.0083477640
        6.0000000000            0.0053130805    0.1505082861    0.0353009170    18.0297726012   0.0083477640
        7.0000000000            0.0056565361    0.1505082861    0.0375828883    18.0297726012   0.0083477640
        8.0000000000            0.0083149576    0.1505082861    0.0552458459    18.0297726012   0.0083477640
3       0.0000000000            0               0               0               18.0297726012   0.0030711499
        1.0000000000            0.0326363429    0.0553721351    0.5894001165    18.0297726012   0.0030711499
        2.0000000000            0               0               0               18.0297726012   0.0030711499
        3.0000000000            0.0037409247    0.0553721351    0.0675596971    18.0297726012   0.0030711499
        4.0000000000            0               0               0               18.0297726012   0.0030711499
        5.0000000000            0               0               0               18.0297726012   0.0030711499
        6.0000000000            0.0013867221    0.0553721351    0.0250436808    18.0297726012   0.0030711499
        7.0000000000            0               0               0               18.0297726012   0.0030711499
        8.0000000000            0.0097070545    0.0553721351    0.1753057659    18.0297726012   0.0030711499

Can anyone help me? I really need this.

2 answers:

Answer 0: (score: 0)

Use pd.MultiIndex.from_product to build the Cartesian product of the index levels, then pass it to reindex:

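import pandas as pd

# Small example: sector 3 is missing idx 2, 4, 5 and 7
df = pd.DataFrame({"sector":[2,2,2,2,2,2,2,2,3,3,3,3],
                   "idx":[1,2,3,4,5,6,7,8,1,3,6,8],
                   "values":range(12)})

# Every (sector, idx) combination
m = pd.MultiIndex.from_product([df["sector"].unique(),df["idx"].unique()],names=["sector","idx"])

# Rows that were absent from the original are filled with 0
print(df.set_index(["sector","idx"]).reindex(m, fill_value=0))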

            values
sector idx        
2      1         0
       2         1
       3         2
       4         3
       5         4
       6         5
       7         6
       8         7
3      1         8
       2         0
       3         9
       4         0
       5         0
       6        10
       7         0
       8        11
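
The same idea can be carried over to a frame shaped like the one in the question. The following is only a sketch under assumptions: the tiny frame and its values are made up for illustration, the index range is assumed to start at 0 and end at the largest value present, and perc_sec_time is assumed to be constant within each sector. Reindexing without fill_value leaves NaN in the new rows, so the per-sector column can be copied from the sector's known value before the rest is set to 0.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame shaped like the question's data (values shortened).
df = pd.DataFrame(
    {
        "sector": [2, 2, 3],
        "radiotap.mcs.index": [1.0, 2.0, 1.0],
        "perc_time": [0.7, 0.3, 1.0],
        "perc_sec_time": [0.008, 0.008, 0.003],
    }
).set_index(["sector", "radiotap.mcs.index"])

# Full grid: every sector crossed with every index value from 0 to the maximum seen.
sectors = df.index.get_level_values("sector").unique()
max_idx = int(df.index.get_level_values("radiotap.mcs.index").max())
full = pd.MultiIndex.from_product(
    [sectors, np.arange(0, max_idx + 1, dtype=float)],
    names=df.index.names,
)

# Reindexing without fill_value leaves NaN in the newly created rows.
out = df.reindex(full)

# Per-sector constant column: copy the sector's first non-missing value into every row.
out["perc_sec_time"] = out.groupby(level="sector")["perc_sec_time"].transform("first")

# Everything that is still missing becomes 0.
out = out.fillna(0)
print(out)
```

Filling the per-sector column before the final fillna(0) matters here, because afterwards the missing entries can no longer be told apart from real zeros.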

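Answer 1: (score: 0)

Use itertools to create the missing rows and merge them with the original DataFrame. Then backfill (bfill) the gaps in the last two columns and replace the remaining NA values with 0.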

import pandas as pd
import numpy as np
import io

data = '''
sector radiotap.mcs.index time_total_x_x time_total_y perc_time time_total_x_y perc_sec_time
2 1.0000000000 0.1079160312 0.1505082861 0.7170105646 18.0297726012 0.0083477640
2 2.0000000000 0.0101262961 0.1505082861 0.0672806552 18.0297726012 0.0083477640
2 3.0000000000 0.0074302504 0.1505082861 0.0493677164 18.0297726012 0.0083477640
2 4.0000000000 0.0057511342 0.1505082861 0.0382114125 18.0297726012 0.0083477640
2 6.0000000000 0.0053130805 0.1505082861 0.0353009170 18.0297726012 0.0083477640
2 7.0000000000 0.0056565361 0.1505082861 0.0375828883 18.0297726012 0.0083477640
2 8.0000000000 0.0083149576 0.1505082861 0.0552458459 18.0297726012 0.0083477640
3 1.0000000000 0.0326363429 0.0553721351 0.5894001165 18.0297726012 0.0030711499
3 3.0000000000 0.0037409247 0.0553721351 0.0675596971 18.0297726012 0.0030711499
3 6.0000000000 0.0013867221 0.0553721351 0.0250436808 18.0297726012 0.0030711499
3 8.0000000000 0.0097070545 0.0553721351 0.1753057659 18.0297726012 0.0030711499
'''

df = pd.read_csv(io.StringIO(data), sep=r'\s+')
# Per-sector max and min of radiotap.mcs.index
tmp = df[['sector', 'radiotap.mcs.index']].groupby('sector').agg(['max', 'min'])
from itertools import product
# Candidate index values: 0 up to the largest index seen in any sector
rng = np.arange(0, tmp.loc[:,('radiotap.mcs.index', 'max')].max()+1)
# One row per (sector, index) pair, merged with the known values
df1 = pd.DataFrame(list(product(tmp.index, rng)), columns=['sector','radiotap.mcs.index'])
df1 = df1.merge(df, on=['sector','radiotap.mcs.index'], how='outer')
# Backfill the two per-sector constant columns, then zero the remaining gaps
cols = ['time_total_x_y', 'perc_sec_time']
df1[cols] = df1[cols].bfill()
df1.fillna(0, inplace=True)
df1
    sector  radiotap.mcs.index  time_total_x_x  time_total_y  perc_time  time_total_x_y  perc_sec_time
 0       2                 0.0        0.000000      0.000000   0.000000       18.029773       0.008348
 1       2                 1.0        0.107916      0.150508   0.717011       18.029773       0.008348
 2       2                 2.0        0.010126      0.150508   0.067281       18.029773       0.008348
 3       2                 3.0        0.007430      0.150508   0.049368       18.029773       0.008348
 4       2                 4.0        0.005751      0.150508   0.038211       18.029773       0.008348
 5       2                 5.0        0.000000      0.000000   0.000000       18.029773       0.008348
 6       2                 6.0        0.005313      0.150508   0.035301       18.029773       0.008348
 7       2                 7.0        0.005657      0.150508   0.037583       18.029773       0.008348
 8       2                 8.0        0.008315      0.150508   0.055246       18.029773       0.008348
 9       3                 0.0        0.000000      0.000000   0.000000       18.029773       0.003071
10       3                 1.0        0.032636      0.055372   0.589400       18.029773       0.003071
11       3                 2.0        0.000000      0.000000   0.000000       18.029773       0.003071
12       3                 3.0        0.003741      0.055372   0.067560       18.029773       0.003071
13       3                 4.0        0.000000      0.000000   0.000000       18.029773       0.003071
14       3                 5.0        0.000000      0.000000   0.000000       18.029773       0.003071
15       3                 6.0        0.001387      0.055372   0.025044       18.029773       0.003071
16       3                 7.0        0.000000      0.000000   0.000000       18.029773       0.003071
17       3                 8.0        0.009707      0.055372   0.175306       18.029773       0.003071
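
One caveat with a plain backward fill: it does not stop at sector boundaries. It works for the data above because both sectors contain the largest index (8), but if a sector's last row were one of the added ones, it would pick up the next sector's value. A possible alternative, sketched here on a made-up frame (the values and the choice to fill in both directions are assumptions, not part of the answer), is to fill within each sector via groupby:

```python
import pandas as pd

# Hypothetical merged frame: sector 2 has no original row for index 2,
# so the last row of that sector is missing its per-sector value.
df1 = pd.DataFrame(
    {
        "sector": [2, 2, 2, 3, 3, 3],
        "radiotap.mcs.index": [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
        "perc_sec_time": [None, 0.008, None, None, 0.003, 0.003],
    }
)

# Plain backward fill: the last row of sector 2 takes sector 3's value.
leaky = df1["perc_sec_time"].bfill()

# Grouped fill: each sector is filled only from its own rows, in both directions.
safe = df1.groupby("sector")["perc_sec_time"].transform(lambda s: s.bfill().ffill())

print(leaky.tolist())
print(safe.tolist())
```

The grouped version costs one extra groupby but does not depend on which index values each sector happens to contain.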