How to use the pandas mask method on part of a DataFrame

Time: 2020-01-02 10:17:31

Tags: python pandas

The DataFrame is constructed as follows, but with more objects (each structured like the object1 columns), and I want to remove (replace with np.nan) the values of var1, var2 and var3 that are smaller than 0 or larger than 100.

import pandas as pd

example_df = {
    ('meta_info', 'time'): {0: 2100, 1: 2200, 2: 2300, 3: 2400, 4: 100},
    ('meta_info', 'counter'): {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0},
    ('meta_info', 'measurement_id'): {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
    ('object1', 'grp'): {0: '0', 1: '0', 2: '0', 3: '0', 4: '0'},
    ('object1', 'id'): {0: '376690', 1: '376690', 2: '376690', 3: '376690', 4: '376690'},
    ('object1', 'var1'): {0: 34.405149821218195, 1: 25.047388024508773, 2: 94.12283547956514, 3: -38.34383022173205, 4: 60.15259222044418},
    ('object1', 'var2'): {0: 40.470001220703125, 1: 40.369998931884766, 2: 40.277000427246094, 3: 40.18899917602539, 4: 40.10200119018555},
    ('object', 'var1'): {0: -4.453429468309658, 1: 82.84217089703611, 2: 145.2084949734712, 3: 79.83440766416545, 4: 87.39526160763526},
    ('object', 'var2'): {0: 34.0, 1: 33.70000076293945, 2: 33.900001525878906, 3: 34.0, 4: 34.0},
    ('object', 'var3'): {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0},
}
example_df = pd.DataFrame(example_df)

I tried to use pd.DataFrame.mask here, but the following code fails:

idx = pd.IndexSlice
z = example_df.loc[:, idx[:, ['var1', 'var2', 'var3']]]
example_df.mask((z < 0) | (z > 100))

because it also removes every value in the columns outside z.

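This is confusing, because the documentation says:

cond : boolean Series/DataFrame, array-like, or callable. Where cond is False, keep the original value. Where True, replace with corresponding value from other.

So I assumed that the first five columns, which are not part of the z view, would not be selected; otherwise only (object1, id) would be replaced by NaN, because it is the only column with values out of range. Why are these columns filled with NaN? Is there another way to check only the selected part of the DataFrame in one step?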

1 Answer:

Answer 0: (score: 2)

I think you need to apply the mask to the filtered DataFrame z, because the boolean mask was built from the filtered DataFrame:

print (z.mask((z < 0) | (z > 100)))
     object1                object                
        var1       var2       var1       var2 var3
0  34.405150  40.470001        NaN  34.000000  0.0
1  25.047388  40.369999  82.842171  33.700001  0.0
2  94.122835  40.277000        NaN  33.900002  0.0
3        NaN  40.188999  79.834408  34.000000  0.0
4  60.152592  40.102001  87.395262  34.000000  0.0
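As for why the other columns turn into NaN: as far as I understand, mask aligns the condition with the calling DataFrame, and any column that has no entry in the condition ends up being replaced rather than kept. The sketch below reproduces that on a hypothetical two-column toy frame (the names df, sub, cond, too_narrow, full_cond and fixed are made up for illustration) and shows one possible workaround: widen the condition to all columns with reindex, filling the missing columns with False so they are left alone.

```python
import pandas as pd

# Hypothetical toy frame: "a" is outside the selection, "b" is inside it.
df = pd.DataFrame({"a": [1, 2], "b": [-1, 50]})

sub = df[["b"]]                 # the filtered part, like z in the question
cond = (sub < 0) | (sub > 100)  # boolean frame that only has column "b"

# Condition narrower than the frame: column "a" has no entry in cond,
# and it comes back entirely as NaN, as described in the question.
too_narrow = df.mask(cond)

# Widen the condition to every column of df; False means "keep the value".
full_cond = cond.reindex(columns=df.columns, fill_value=False)
fixed = df.mask(full_cond)

print(too_narrow)
print(fixed)
```

With the widened condition a single mask call on the whole frame is enough, at the cost of building a boolean frame as large as the data.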

If you want to assign the output, assign it back to the same filtered selection of the DataFrame:

idx = pd.IndexSlice
z = example_df.loc[:, idx[:, ['var1', 'var2', 'var3']]]

example_df.loc[:, idx[:, ['var1', 'var2', 'var3']]] = z.mask((z < 0) | (z > 100))
print (example_df)
  meta_info                        object1                                \
       time counter measurement_id     grp      id       var1       var2   
0      2100     0.0              1       0  376690  34.405150  40.470001   
1      2200     1.0              1       0  376690  25.047388  40.369999   
2      2300     2.0              1       0  376690  94.122835  40.277000   
3      2400     3.0              1       0  376690        NaN  40.188999   
4       100     4.0              1       0  376690  60.152592  40.102001   

      object                  
        var1       var2 var3  
0        NaN  34.000000  0.0  
1  82.842171  33.700001  0.0  
2        NaN  33.900002  0.0  
3  79.834408  34.000000  0.0  
4  87.395262  34.000000  0.0
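Since the real data has more objects, the same idea could be wrapped in a small helper. This is only a sketch under assumptions: the function name mask_out_of_range, its parameters, and the toy frame are hypothetical, and it assumes the variable names live in the second level of the column MultiIndex, as in example_df. It works column by column with Series.mask, so no alignment between differently shaped frames is involved.

```python
import pandas as pd


def mask_out_of_range(df, names, low=0, high=100):
    """Return a copy of df where values below `low` or above `high`
    in the second-level columns listed in `names` are set to NaN."""
    in_scope = df.columns.get_level_values(1).isin(names)
    out = df.copy()
    for col in df.columns[in_scope]:      # col is a tuple such as ('obj1', 'var1')
        values = out[col]
        out[col] = values.mask((values < low) | (values > high))
    return out


# Hypothetical toy frame with the same two-level column layout as example_df.
toy = pd.DataFrame({
    ("meta_info", "time"): {0: 2100, 1: 100},
    ("obj1", "id"): {0: "376690", 1: "376690"},
    ("obj1", "var1"): {0: 34.4, 1: -38.3},
    ("obj2", "var1"): {0: 145.2, 1: 79.8},
})

cleaned = mask_out_of_range(toy, ["var1", "var2", "var3"])
print(cleaned)
```

Names in the list that do not occur in the frame (var2 and var3 in the toy example) are simply ignored, and the input frame is left unchanged because the helper works on a copy.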