df.ix returns NaN when subsetting

Date: 2017-07-18 21:33:28

Tags: python-2.7 pandas subset

I have a DataFrame [72 rows x 25 columns] that looks like this:

     Pin      CPULabel   Freq(MHz) DCycle     Skew(1-3)min Skew(1-3)mean
0    Dif0    BP100_Fast   99.9843  0.492            0             0   
1    Dif0    BP100_Slow   100.011  0.493            0             0   
2    Dif0  100HiBW_Fast   100.006  0.503            0             0   
3    Dif0  100HiBW_Slow   100.007  0.504            0             0   
4    Dif0  100LoBW_Fast   100.005  0.503            0             0   
5    Dif0  100LoBW_Slow   99.9951  0.504            0             0   
8    Dif1    BP100_Fast   99.9928  0.492            7            10   
9    Dif1    BP100_Slow   99.9962  0.492           11            12   
10   Dif1  100HiBW_Fast   100.014  0.502           10            11   
11   Dif1  100HiBW_Slow   100.006  0.503            6            13   
12   Dif1  100LoBW_Fast   99.9965  0.502            5            10   
13   Dif1  100LoBW_Slow   99.9946  0.503           12            14   
16   Dif2    BP100_Fast   99.9929  0.493            2             6   
17   Dif2    BP100_Slow    99.997  0.493            8            13   
18   Dif2  100HiBW_Fast   100.002  0.504            4             9   
19   Dif2  100HiBW_Slow   99.9964  0.504           13            17   
20   Dif2  100LoBW_Fast   100.021  0.504            8             9   

I am only interested in the rows whose CPULabel is BP100_Fast, 100HiBW_Fast, or 100LoBW_Fast, so I used the following commands:

excel = pd.read_excel('25C_3.3V.xlsx', skiprows=1)
excel.fillna(value=0, inplace=True)
general = excel[excel['Pin'] != 'Clkin']
general.drop_duplicates(keep=False, inplace=True)
slew = general[(general['CPULabel']=='BP100_Fast') | (general['CPULabel']=='100LoBW_Fast') | (general['CPULabel']=='100HiBW_Fast')]

This gives me what I want [36 rows x 25 columns]:

      Pin     CPULabel   Freq(MHz) DCycle      Skew(1-3)min Skew(1-3)mean  
0    Dif0    BP100_Fast   99.9843  0.492            0             0   
2    Dif0  100HiBW_Fast   100.006  0.503            0             0   
4    Dif0  100LoBW_Fast   100.005  0.503            0             0   
8    Dif1    BP100_Fast   99.9928  0.492            7            10   
10   Dif1  100HiBW_Fast   100.014  0.502           10            11   
12   Dif1  100LoBW_Fast   99.9965  0.502            5            10   
16   Dif2    BP100_Fast   99.9929  0.493            2             6   
18   Dif2  100HiBW_Fast   100.002  0.504            4             9   
20   Dif2  100LoBW_Fast   100.021  0.504            8             9   

However, if I change the last command to:

slew = general.ix[['BP100_Fast', '100LoBW_Fast', '100HiBW_Fast'], :]

the result is all NaN. [3 rows x 25 columns]

              Pin    CPULabel  Freq(MHz) DCycle Skew(1-3)min Skew(1-3)mean
BP100_Fast    NaN      NaN       NaN      NaN        NaN          NaN   
100LoBW_Fast  NaN      NaN       NaN      NaN        NaN          NaN   
100HiBW_Fast  NaN      NaN       NaN      NaN        NaN          NaN   

Is there a way to do this with df.ix? Thank you very much.

2 Answers:

Answer 0: (score: 2)

Try this approach:

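# Full label strings, matching the values stored in CPULabel.
labels = ['BP100_Fast', '100LoBW_Fast', '100HiBW_Fast']

# 'Clkin' is quoted so that query() compares against the string value
# instead of looking for a column or variable named Clkin.
slew = \
pd.read_excel('25C_3.3V.xlsx', skiprows=1) \
  .fillna(value=0) \
  .query("Pin != 'Clkin' and CPULabel in @labels") \
  .drop_duplicates(keep=False)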

Or you can change:

slew = general.ix[['BP100_Fast', '100LoBW_Fast', '100HiBW_Fast'], :]

to:

slew = general.loc[general['CPULabel'].isin(['BP100_Fast','100LoBW_Fast','100HiBW_Fast'])]
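To show what the boolean mask in the line above does, here is a minimal sketch on a toy frame. The four rows are made up for illustration and stand in for `general`; only the `Pin` and `CPULabel` column names come from the question.

```python
import pandas as pd

# Toy stand-in for `general`; the values are invented for illustration.
general = pd.DataFrame({
    "Pin": ["Dif0", "Dif0", "Dif1", "Dif1"],
    "CPULabel": ["BP100_Fast", "BP100_Slow", "100HiBW_Fast", "100LoBW_Slow"],
})

wanted = ["BP100_Fast", "100LoBW_Fast", "100HiBW_Fast"]

# isin() returns a boolean Series aligned with the rows of `general`.
mask = general["CPULabel"].isin(wanted)

# .loc keeps the rows where the mask is True and preserves their index labels.
slew = general.loc[mask]
print(slew)
```

The key point is that `.loc` receives a row-aligned True/False mask built from the column values, rather than a list of strings to look up in the index.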

Answer 1: (score: 2)

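Per the docs:

The .ix indexer is deprecated, in favor of the more strict .iloc and .loc indexers. .ix offers a lot of magic on the inference of what the user wants to do. To wit, .ix can decide to index positionally OR via labels, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation is here. (GH14218)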
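This also suggests why the attempt in the question came back as NaN: a list of strings passed to a label-based indexer is looked up in the row index, not in the CPULabel column, and the row index there is made of integers. The sketch below is an illustration under that assumption, on invented data. It uses `reindex` to reproduce the all-NaN rows, since `.ix` is not available in current pandas releases, and then shows that the same labels are found once CPULabel is made the index.

```python
import pandas as pd

# Toy data; values are invented for illustration.
df = pd.DataFrame({
    "Pin": ["Dif0", "Dif0", "Dif1"],
    "CPULabel": ["BP100_Fast", "BP100_Slow", "100HiBW_Fast"],
})
labels = ["BP100_Fast", "100HiBW_Fast"]

# The row index is 0, 1, 2, so none of the strings is a row label.
# reindex() on labels that do not exist yields rows filled with NaN,
# which matches the shape of the output shown in the question.
missing = df.reindex(labels)
print(missing)

# Making CPULabel the index turns the strings into row labels,
# so a label-based lookup now finds the rows.
by_label = df.set_index("CPULabel").loc[labels]
print(by_label)
```

If label-based selection is preferred over a boolean mask, `set_index("CPULabel")` followed by `.loc` is one way to get it, at the cost of moving CPULabel out of the columns.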

Option 1
isin

general[general.CPULabel.isin(['BP100_Fast', '100LoBW_Fast', '100HiBW_Fast'])]

     Pin      CPULabel  Freq(MHz)  DCycle  Skew(1-3)min  Skew(1-3)mean
0   Dif0    BP100_Fast    99.9843   0.492             0              0
2   Dif0  100HiBW_Fast   100.0060   0.503             0              0
4   Dif0  100LoBW_Fast   100.0050   0.503             0              0
8   Dif1    BP100_Fast    99.9928   0.492             7             10
10  Dif1  100HiBW_Fast   100.0140   0.502            10             11
12  Dif1  100LoBW_Fast    99.9965   0.502             5             10
16  Dif2    BP100_Fast    99.9929   0.493             2              6
18  Dif2  100HiBW_Fast   100.0020   0.504             4              9
20  Dif2  100LoBW_Fast   100.0210   0.504             8              9

Option 2
query

general.query('CPULabel in ["BP100_Fast", "100LoBW_Fast", "100HiBW_Fast"]')

     Pin      CPULabel  Freq(MHz)  DCycle  Skew(1-3)min  Skew(1-3)mean
0   Dif0    BP100_Fast    99.9843   0.492             0              0
2   Dif0  100HiBW_Fast   100.0060   0.503             0              0
4   Dif0  100LoBW_Fast   100.0050   0.503             0              0
8   Dif1    BP100_Fast    99.9928   0.492             7             10
10  Dif1  100HiBW_Fast   100.0140   0.502            10             11
12  Dif1  100LoBW_Fast    99.9965   0.502             5             10
16  Dif2    BP100_Fast    99.9929   0.493             2              6
18  Dif2  100HiBW_Fast   100.0020   0.504             4              9
20  Dif2  100LoBW_Fast   100.0210   0.504             8              9
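As a small sketch of how `query` treats names, the example below filters a toy frame with both a quoted string literal and a Python list referenced through `@`. The rows are invented for illustration; the `Clkin` value and the column names are taken from the question.

```python
import pandas as pd

# Toy stand-in for the Excel data; values are invented for illustration.
general = pd.DataFrame({
    "Pin": ["Clkin", "Dif0", "Dif0", "Dif1"],
    "CPULabel": ["BP100_Fast", "BP100_Fast", "BP100_Slow", "100HiBW_Fast"],
})
labels = ["BP100_Fast", "100LoBW_Fast", "100HiBW_Fast"]

# "Clkin" is quoted, so it is compared as a string value.
# @labels refers to the Python list defined above.
slew = general.query('Pin != "Clkin" and CPULabel in @labels')
print(slew)
```

Inside the query string, a bare name is resolved as a column (or index) name, so string values need quotes and local variables need the `@` prefix.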

Option 3
pd.Series.str.endswith

general[general.CPULabel.str.endswith('Fast')]

     Pin      CPULabel  Freq(MHz)  DCycle  Skew(1-3)min  Skew(1-3)mean
0   Dif0    BP100_Fast    99.9843   0.492             0              0
2   Dif0  100HiBW_Fast   100.0060   0.503             0              0
4   Dif0  100LoBW_Fast   100.0050   0.503             0              0
8   Dif1    BP100_Fast    99.9928   0.492             7             10
10  Dif1  100HiBW_Fast   100.0140   0.502            10             11
12  Dif1  100LoBW_Fast    99.9965   0.502             5             10
16  Dif2    BP100_Fast    99.9929   0.493             2              6
18  Dif2  100HiBW_Fast   100.0020   0.504             4              9
20  Dif2  100LoBW_Fast   100.0210   0.504             8              9
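Option 3 gives the same rows as Options 1 and 2 here only because every label ending in "Fast" happens to be one of the three wanted labels. The sketch below illustrates the difference on invented data; `Other_Fast` is a hypothetical label that does not appear in the question.

```python
import pandas as pd

# Toy data; "Other_Fast" is a hypothetical label added for illustration.
general = pd.DataFrame({
    "CPULabel": ["BP100_Fast", "BP100_Slow", "100HiBW_Fast", "Other_Fast"],
})
wanted = ["BP100_Fast", "100LoBW_Fast", "100HiBW_Fast"]

# Suffix match: keeps every label that ends in "Fast".
# na=False treats missing values as non-matching.
by_suffix = general[general["CPULabel"].str.endswith("Fast", na=False)]

# Exact match against the list: keeps only the listed labels.
by_list = general[general["CPULabel"].isin(wanted)]

print(by_suffix)
print(by_list)
```

If the sheet can contain other labels with the same suffix, `isin` is the safer choice; `endswith` is shorter when the suffix really is the selection rule.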