Removing out-of-range data from a pandas DataFrame

Time: 2016-01-18 12:40:37

Tags: python numpy pandas

I have a pandas DataFrame like this:

      x          y          z
  35.013930 048.775597     0.22
  42.015619 368.803652     0.00
  03.017302 349.831709     1.20
  05.018978 378.859767     2.20
  07.020646 300.887827     0.05
  23.022307 044.915887     0.23
      .          .          .
      .          .          .
      .          .          .

It has about 40,000 rows.

I need to remove the rows whose (x, y) values fall outside the ranges y: (44, 350.5) and x: (4.5, 35.8).

So the output would look like this:

      x          y          z
  35.013930 048.775597     0.22
  07.020646 300.887827     0.05
  23.022307 044.915887     0.23
      .          .          .
      .          .          .

I think applying np.where(np.logical_and()) to the x and y columns together might be a solution, but I don't know how to do it. Does anyone know a solution?

1 answer:

Answer 0 (score: 1):

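You can use loc or query. I first tried the conditions that produce the desired output shown in the question, and then the conditions as worded in the question text (the negated mask, which returns the out-of-range rows):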

print(df)

#           x           y     z
#0  35.013930   48.775597  0.22
#1  42.015619  368.803652  0.00
#2   3.017302  349.831709  1.20
#3   5.018978  378.859767  2.20
#4   7.020646  300.887827  0.05
#5  23.022307   44.915887  0.23
# keep only the rows strictly inside both ranges
print(df.loc[(df.y > 44) & (df.y < 350.5) & (df.x > 4.5) & (df.x < 35.8)])

#           x           y     z
#0  35.013930   48.775597  0.22
#4   7.020646  300.887827  0.05
#5  23.022307   44.915887  0.23

print(df.query('y > 44 and y < 350.5 and x > 4.5 and x < 35.8'))

#           x           y     z
#0  35.013930   48.775597  0.22
#4   7.020646  300.887827  0.05
#5  23.022307   44.915887  0.23
    
# ~ inverts the mask: these are the rows outside the ranges
print(df.loc[~((df.y > 44) & (df.y < 350.5) & (df.x > 4.5) & (df.x < 35.8))])

#           x           y    z
#1  42.015619  368.803652  0.0
#2   3.017302  349.831709  1.2
#3   5.018978  378.859767  2.2

print(df.query('not (y > 44 and y < 350.5 and x > 4.5 and x < 35.8)'))

#           x           y    z
#1  42.015619  368.803652  0.0
#2   3.017302  349.831709  1.2
#3   5.018978  378.859767  2.2
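
The np.logical_and idea from the question can also be made to work. The following is a minimal sketch, assuming the six sample rows from the question; the names in_x, in_y, mask and kept are illustrative only. np.where is not needed, because the boolean mask can index the DataFrame directly.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "x": [35.013930, 42.015619, 3.017302, 5.018978, 7.020646, 23.022307],
    "y": [48.775597, 368.803652, 349.831709, 378.859767, 300.887827, 44.915887],
    "z": [0.22, 0.00, 1.20, 2.20, 0.05, 0.23],
})

# np.logical_and combines two boolean arrays element-wise,
# so it is applied once per column and once more to join the results
in_x = np.logical_and(df["x"] > 4.5, df["x"] < 35.8)
in_y = np.logical_and(df["y"] > 44, df["y"] < 350.5)
mask = np.logical_and(in_x, in_y)

# Boolean indexing keeps the rows where mask is True
kept = df[mask]
print(kept)
```

This is equivalent to chaining the comparisons with &, which is usually shorter to read.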

If you also want the index renumbered from 0, add reset_index:

print(df)

#           x           y     z
#0  35.013930   48.775597  0.22
#1  42.015619  368.803652  0.00
#2   3.017302  349.831709  1.20
#3   5.018978  378.859767  2.20
#4   7.020646  300.887827  0.05
#5  23.022307   44.915887  0.23
# the outer parentheses of print() allow the call chain to continue on the next line
print(df.loc[(df.y > 44) & (df.y < 350.5) & (df.x > 4.5) & (df.x < 35.8)]
        .reset_index(drop=True))

#           x           y     z
#0  35.013930   48.775597  0.22
#1   7.020646  300.887827  0.05
#2  23.022307   44.915887  0.23

print(df.query('y > 44 and y < 350.5 and x > 4.5 and x < 35.8')
        .reset_index(drop=True))

#           x           y     z
#0  35.013930   48.775597  0.22
#1   7.020646  300.887827  0.05
#2  23.022307   44.915887  0.23
    
print(df.loc[~((df.y > 44) & (df.y < 350.5) & (df.x > 4.5) & (df.x < 35.8))]
        .reset_index(drop=True))

#           x           y    z
#0  42.015619  368.803652  0.0
#1   3.017302  349.831709  1.2
#2   5.018978  378.859767  2.2


print(df.query('not (y > 44 and y < 350.5 and x > 4.5 and x < 35.8)')
        .reset_index(drop=True))

#           x           y    z
#0  42.015619  368.803652  0.0
#1   3.017302  349.831709  1.2
#2   5.018978  378.859767  2.2
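
If the same filter is needed for several columns or in several places, the steps above could be wrapped in a small helper. This is only a sketch: keep_in_ranges and its bounds argument are hypothetical names, not part of pandas, and the bounds are treated as exclusive to match the comparisons used above.

```python
import pandas as pd

def keep_in_ranges(frame, bounds):
    """Hypothetical helper: keep rows where every listed column
    lies strictly inside its (low, high) interval."""
    mask = pd.Series(True, index=frame.index)
    for column, (low, high) in bounds.items():
        mask &= (frame[column] > low) & (frame[column] < high)
    # drop=True discards the old index instead of adding it as a column
    return frame.loc[mask].reset_index(drop=True)

# Sample rows copied from the question
df = pd.DataFrame({
    "x": [35.013930, 42.015619, 3.017302, 5.018978, 7.020646, 23.022307],
    "y": [48.775597, 368.803652, 349.831709, 378.859767, 300.887827, 44.915887],
    "z": [0.22, 0.00, 1.20, 2.20, 0.05, 0.23],
})

result = keep_in_ranges(df, {"x": (4.5, 35.8), "y": (44, 350.5)})
print(result)
```

The helper returns a new DataFrame and leaves df unchanged; assign the result back to df if the filtered rows should replace the original.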