Question

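There are at least 4 ways to retrieve elements in a pandas Series: .iloc, .loc, .ix, and using the [] operator directly.

What are the differences between them? How do they handle missing labels or out-of-range positions?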

Answer 1

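The general idea is that .iloc and .loc are guaranteed to perform the lookup by position and by index (label) respectively, but they are a bit slower than using .ix or the [] operator directly. The latter two, .ix and [], perform the lookup by label or by position depending on the type of index in the Series and on the data being looked up.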
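As a minimal sketch of that guarantee, the snippet below builds a Series whose integer labels differ from its positions, then reads the same element once by label and once by position. Only .loc and .iloc are used, since .ix was deprecated in pandas 0.20 and removed in 1.0; the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Labels run from 10 to 14, positions from 0 to 4.
s = pd.Series(np.arange(100, 105), index=np.arange(10, 15))

first_by_label = s.loc[10]     # .loc always interprets 10 as a label
first_by_position = s.iloc[0]  # .iloc always interprets 0 as a position

last_by_label = s.loc[14]
last_by_position = s.iloc[4]

print(first_by_label, first_by_position, last_by_label, last_by_position)
```

Because the two accessors never guess, the same code keeps its meaning if the index is later changed from integers to strings or the other way round.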

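However, there are also some inconsistencies when using .iloc and .loc, as described in this page.

The tables below summarize the behavior of these 4 lookup methods, depending on (a) whether the Series has an integer or a string index (I am leaving date indexes aside for now), (b) whether the requested data is a single element, a slice, or a list (yes, the behavior changes!), and (c) whether the index is found in the data. In the tables, LAB means the lookup is done by label and POS means it is done by position.

The following examples work with pandas 0.17.1, NumPy 1.10.4, Python 3.4.3.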

Case 1: Series with an integer index

import numpy as np
import pandas as pd

s = pd.Series(np.arange(100,105), index=np.arange(10,15))
s
10    100
11    101
12    102
13    103
14    104

** Single element **             ** Slice **                                       ** List **
s[0]       -> LAB -> KeyError    s[0:2]        -> POS -> {10:100, 11:101}          s[[1,3]]        -> LAB -> {1:NaN, 3:NaN}
s[13]      -> LAB -> 103         s[10:12]      -> POS -> empty Series              s[[12,14]]      -> LAB -> {12:102, 14:104}
---                              ---                                               ---
s.ix[0]    -> LAB -> KeyError    s.ix[0:2]     -> LAB -> empty Series              s.ix[[1,3]]     -> LAB -> {1:NaN, 3:NaN}
s.ix[13]   -> LAB -> 103         s.ix[10:12]   -> LAB -> {10:100, 11:101, 12:102}  s.ix[[12,14]]   -> LAB -> {12:102, 14:104}
---                              ---                                               ---
s.iloc[0]  -> POS -> 100         s.iloc[0:2]   -> POS -> {10:100, 11:101}          s.iloc[[1,3]]   -> POS -> {11:101, 13:103}
s.iloc[13] -> POS -> IndexError  s.iloc[10:12] -> POS -> empty Series              s.iloc[[12,14]] -> POS -> IndexError
---                              ---                                               ---
s.loc[0]   -> LAB -> KeyError    s.loc[0:2]    -> LAB -> empty Series              s.loc[[1,3]]    -> LAB -> KeyError
s.loc[13]  -> LAB -> 103         s.loc[10:12]  -> LAB -> {10:100, 11:101, 12:102}  s.loc[[12,14]]  -> LAB -> {12:102, 14:104}
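The .loc and .iloc rows of the table can be reproduced with a short sketch like the one below. The helper outcome is a hypothetical convenience, not part of pandas; it returns either the lookup result or the name of the exception that was raised. The .ix and [] rows are left out because their behavior has changed across pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=np.arange(10, 15))

def outcome(lookup):
    """Run a zero-argument lookup; return its result or the exception name."""
    try:
        return lookup()
    except (KeyError, IndexError) as exc:
        return type(exc).__name__

label_hit = outcome(lambda: s.loc[13])   # label 13 exists
label_miss = outcome(lambda: s.loc[0])   # there is no label 0
pos_hit = outcome(lambda: s.iloc[0])     # position 0 exists
pos_miss = outcome(lambda: s.iloc[13])   # only positions 0..4 exist

label_slice = s.loc[10:12]   # labels 10, 11 and 12
pos_slice = s.iloc[10:12]    # positions past the end give an empty Series

print(label_hit, label_miss, pos_hit, pos_miss)
print(list(label_slice.index), len(pos_slice))
```

Note how the same integers (13, or the slice 10:12) give different results depending only on which accessor is used.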

Case 2: Series with a string index

s = pd.Series(np.arange(100,105), index=['a','b','c','d','e'])
s
a    100
b    101
c    102
d    103
e    104

** Single element **                             ** Slice **                                           ** List **
s[0]        -> POS -> 100                        s[0:2]          -> POS -> {'a':100,'b':101}           s[[0,2]]          -> POS -> {'a':100,'c':102} 
s[10]       -> LAB, POS -> KeyError, IndexError  s[10:12]        -> POS -> Empty Series                s[[10,12]]        -> POS -> IndexError 
s['a']      -> LAB -> 100                        s['a':'c']      -> LAB -> {'a':100,'b':101, 'c':102}  s[['a','c']]      -> LAB -> {'a':100,'c':102}
s['g']      -> POS,LAB -> TypeError, KeyError    s['f':'h']      -> LAB -> Empty Series                s[['f','h']]      -> LAB -> {'f':NaN, 'h':NaN}
---                                              ---                                                   ---
s.ix[0]     -> POS -> 100                        s.ix[0:2]       -> POS -> {'a':100,'b':101}           s.ix[[0,2]]       -> POS -> {'a':100,'c':102} 
s.ix[10]    -> POS -> IndexError                 s.ix[10:12]     -> POS -> Empty Series                s.ix[[10,12]]     -> POS -> IndexError 
s.ix['a']   -> LAB -> 100                        s.ix['a':'c']   -> LAB -> {'a':100,'b':101, 'c':102}  s.ix[['a','c']]   -> LAB -> {'a':100,'c':102}
s.ix['g']   -> POS, LAB -> TypeError, KeyError   s.ix['f':'h']   -> LAB -> Empty Series                s.ix[['f','h']]   -> LAB -> {'f':NaN, 'h':NaN}
---                                              ---                                                   ---
s.iloc[0]   -> POS -> 100                        s.iloc[0:2]     -> POS -> {'a':100,'b':101}           s.iloc[[0,2]]     -> POS -> {'a':100,'c':102} 
s.iloc[10]  -> POS -> IndexError                 s.iloc[10:12]   -> POS -> Empty Series                s.iloc[[10,12]]   -> POS -> IndexError 
s.iloc['a'] -> LAB -> TypeError                  s.iloc['a':'c'] -> POS -> ValueError                  s.iloc[['a','c']] -> POS -> TypeError    
s.iloc['g'] -> LAB -> TypeError                  s.iloc['f':'h'] -> POS -> ValueError                  s.iloc[['f','h']] -> POS -> TypeError
---                                              ---                                                   ---
s.loc[0]    -> LAB -> KeyError                   s.loc[0:2]      -> LAB -> TypeError                   s.loc[[0,2]]      -> LAB -> KeyError
s.loc[10]   -> LAB -> KeyError                   s.loc[10:12]    -> LAB -> TypeError                   s.loc[[10,12]]    -> LAB -> KeyError
s.loc['a']  -> LAB -> 100                        s.loc['a':'c']  -> LAB -> {'a':100,'b':101, 'c':102}  s.loc[['a','c']]  -> LAB -> {'a':100,'c':102}
s.loc['g']  -> LAB -> KeyError                   s.loc['f':'h']  -> LAB -> Empty Series                s.loc[['f','h']]  -> LAB -> KeyError
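With a string index, the most visible effect of .loc and .iloc is that a key of the wrong kind is rejected instead of being reinterpreted. The sketch below checks three such cells from the table; error_name is a hypothetical helper, and the exception types are what I would expect on both the version above and recent pandas releases, so treat them as an assumption on other versions.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=['a', 'b', 'c', 'd', 'e'])

def error_name(lookup):
    """Return the name of the exception raised by lookup, or None if it succeeds."""
    try:
        lookup()
    except (KeyError, IndexError, TypeError) as exc:
        return type(exc).__name__
    return None

string_to_iloc = error_name(lambda: s.iloc['a'])  # a label given to a positional accessor
int_to_loc = error_name(lambda: s.loc[0])         # 0 is not one of the labels
int_slice_to_loc = error_name(lambda: s.loc[0:2]) # integer slice bounds on a string index

print(string_to_iloc, int_to_loc, int_slice_to_loc)
```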

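Note that there are three ways a label that is not found or a position that is out of range gets handled: an exception is raised, an empty Series is returned, or a Series is returned in which the requested keys are associated with NaN values.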
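The three outcomes can be seen side by side in the sketch below. One assumption to flag: in recent pandas versions a list lookup with missing labels raises KeyError rather than returning NaN, so Series.reindex is used here as a stand-in to obtain the NaN-filled result; it is not the lookup shown in the tables.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=['a', 'b', 'c', 'd', 'e'])

# Outcome 1: an exception is raised for a single missing label.
try:
    s.loc['g']
    raised = False
except KeyError:
    raised = True

# Outcome 2: a label slice that matches nothing returns an empty Series.
empty = s.loc['f':'h']

# Outcome 3: a Series with the requested keys and NaN values
# (assumption: reindex stands in for the old list-lookup behavior).
nan_filled = s.reindex(['f', 'h'])

print(raised, len(empty), nan_filled.isna().all())
```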

Also note that when slicing by position the end element is excluded, but when slicing by label the end element is included.
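A small sketch of the endpoint rule, using the string-indexed Series from Case 2 (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=['a', 'b', 'c', 'd', 'e'])

by_position = s.iloc[0:2]   # positions 0 and 1; position 2 is excluded
by_label = s.loc['a':'c']   # labels 'a', 'b' and 'c'; the end label is included

print(list(by_position.index))
print(list(by_label.index))
```

So a positional slice 0:2 yields two elements, while the label slice 'a':'c' yields three.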
