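There are at least four ways to retrieve elements from a pandas Series: .iloc, .loc, .ix, and using the [] operator directly.
What are the differences between them? How do they handle missing labels and out-of-range positions?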
Answer 0 (score: 0)
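The general idea is that while .iloc and .loc are guaranteed to perform the lookup by position and by index (label) respectively, they are also a bit slower than using .ix or the [] operator directly. The latter two (.ix and []) perform the lookup by label or by position, depending on the type of index in the Series being queried and on the kind of data being requested.
However, there are also some inconsistencies when using .iloc and .loc, as described in this page.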
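The position-versus-label guarantee of .iloc and .loc can be illustrated with a minimal sketch. The variable names are illustrative and not from the original answer; the Series uses integer labels that deliberately differ from the positions so the two lookups cannot be confused.

```python
import numpy as np
import pandas as pd

# Integer labels 10..14 that differ from the positions 0..4.
s = pd.Series(np.arange(100, 105), index=np.arange(10, 15))

first_by_position = s.iloc[0]  # position 0, i.e. the first stored value
value_by_label = s.loc[13]     # the value stored under label 13

# Position 13 does not exist (there are only 5 elements).
try:
    s.iloc[13]
    iloc_error = None
except IndexError as exc:
    iloc_error = type(exc).__name__

# Label 0 does not exist in the index.
try:
    s.loc[0]
    loc_error = None
except KeyError as exc:
    loc_error = type(exc).__name__

print(first_by_position, value_by_label, iloc_error, loc_error)
```

The same key (0 or 13) succeeds with one accessor and fails with the other, which is why the accessor choice matters most for integer-labelled data.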
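The tables below summarize the behavior of these four lookup methods, depending on (a) whether the Series being queried has an integer or a string index (I'm not considering date indexes for now), (b) whether the requested data is a single element, a slice, or a list (yes, the behavior changes!), and (c) whether the requested index is found in the data. In each entry, LAB means the lookup was done by label and POS means it was done by position.
The following examples apply to pandas 0.17.1, NumPy 1.10.4, and Python 3.4.3.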
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=np.arange(10, 15))
s
10 100
11 101
12 102
13 103
14 104
** Single element **            | ** Slice **                                     | ** List **
s[0] -> LAB -> KeyError         | s[0:2] -> POS -> {10:100, 11:101}               | s[[1,3]] -> LAB -> {1:NaN, 3:NaN}
s[13] -> LAB -> 103             | s[10:12] -> POS -> empty Series                 | s[[12,14]] -> LAB -> {12:102, 14:104}
--------------------------------+-------------------------------------------------+------------------------------------------
s.ix[0] -> LAB -> KeyError      | s.ix[0:2] -> LAB -> empty Series                | s.ix[[1,3]] -> LAB -> {1:NaN, 3:NaN}
s.ix[13] -> LAB -> 103          | s.ix[10:12] -> LAB -> {10:100, 11:101, 12:102}  | s.ix[[12,14]] -> LAB -> {12:102, 14:104}
--------------------------------+-------------------------------------------------+------------------------------------------
s.iloc[0] -> POS -> 100         | s.iloc[0:2] -> POS -> {10:100, 11:101}          | s.iloc[[1,3]] -> POS -> {11:101, 13:103}
s.iloc[13] -> POS -> IndexError | s.iloc[10:12] -> POS -> empty Series            | s.iloc[[12,14]] -> POS -> IndexError
--------------------------------+-------------------------------------------------+------------------------------------------
s.loc[0] -> LAB -> KeyError     | s.loc[0:2] -> LAB -> empty Series               | s.loc[[1,3]] -> LAB -> KeyError
s.loc[13] -> LAB -> 103         | s.loc[10:12] -> LAB -> {10:100, 11:101, 12:102} | s.loc[[12,14]] -> LAB -> {12:102, 14:104}
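The .iloc and .loc slice and list rows for the integer-indexed Series can be reproduced with a small sketch. The `outcome` helper and the `rows` dictionary are hypothetical names introduced only for this illustration. The [] and .ix rows are left out on purpose: their behavior depends on the pandas version, and .ix was removed in later pandas releases.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=np.arange(10, 15))

def outcome(lookup):
    """Run a zero-argument lookup and return a plain dict, a scalar,
    or the name of the exception that was raised."""
    try:
        result = lookup()
    except (KeyError, IndexError, TypeError, ValueError) as exc:
        return type(exc).__name__
    if isinstance(result, pd.Series):
        return {int(k): int(v) for k, v in result.items()}
    return int(result)

rows = {
    "s.iloc[0:2]": outcome(lambda: s.iloc[0:2]),
    "s.iloc[10:12]": outcome(lambda: s.iloc[10:12]),
    "s.iloc[[1,3]]": outcome(lambda: s.iloc[[1, 3]]),
    "s.iloc[[12,14]]": outcome(lambda: s.iloc[[12, 14]]),
    "s.loc[0:2]": outcome(lambda: s.loc[0:2]),
    "s.loc[10:12]": outcome(lambda: s.loc[10:12]),
    "s.loc[[1,3]]": outcome(lambda: s.loc[[1, 3]]),
    "s.loc[[12,14]]": outcome(lambda: s.loc[[12, 14]]),
}

for expr, res in rows.items():
    print(f"{expr:<16} -> {res}")
```

An empty dict in the output corresponds to an "empty Series" entry in the table.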
s = pd.Series(np.arange(100,105), index=['a','b','c','d','e'])
s
a 100
b 101
c 102
d 103
e 104
** Single element **                         | ** Slice **                                          | ** List **
s[0] -> POS -> 100                           | s[0:2] -> POS -> {'a':100, 'b':101}                  | s[[0,2]] -> POS -> {'a':100, 'c':102}
s[10] -> LAB, POS -> KeyError, IndexError    | s[10:12] -> POS -> empty Series                      | s[[10,12]] -> POS -> IndexError
s['a'] -> LAB -> 100                         | s['a':'c'] -> LAB -> {'a':100, 'b':101, 'c':102}     | s[['a','c']] -> LAB -> {'a':100, 'c':102}
s['g'] -> POS, LAB -> TypeError, KeyError    | s['f':'h'] -> LAB -> empty Series                    | s[['f','h']] -> LAB -> {'f':NaN, 'h':NaN}
---------------------------------------------+------------------------------------------------------+------------------------------------------
s.ix[0] -> POS -> 100                        | s.ix[0:2] -> POS -> {'a':100, 'b':101}               | s.ix[[0,2]] -> POS -> {'a':100, 'c':102}
s.ix[10] -> POS -> IndexError                | s.ix[10:12] -> POS -> empty Series                   | s.ix[[10,12]] -> POS -> IndexError
s.ix['a'] -> LAB -> 100                      | s.ix['a':'c'] -> LAB -> {'a':100, 'b':101, 'c':102}  | s.ix[['a','c']] -> LAB -> {'a':100, 'c':102}
s.ix['g'] -> POS, LAB -> TypeError, KeyError | s.ix['f':'h'] -> LAB -> empty Series                 | s.ix[['f','h']] -> LAB -> {'f':NaN, 'h':NaN}
---------------------------------------------+------------------------------------------------------+------------------------------------------
s.iloc[0] -> POS -> 100                      | s.iloc[0:2] -> POS -> {'a':100, 'b':101}             | s.iloc[[0,2]] -> POS -> {'a':100, 'c':102}
s.iloc[10] -> POS -> IndexError              | s.iloc[10:12] -> POS -> empty Series                 | s.iloc[[10,12]] -> POS -> IndexError
s.iloc['a'] -> POS -> TypeError              | s.iloc['a':'c'] -> POS -> ValueError                 | s.iloc[['a','c']] -> POS -> TypeError
s.iloc['g'] -> POS -> TypeError              | s.iloc['f':'h'] -> POS -> ValueError                 | s.iloc[['f','h']] -> POS -> TypeError
---------------------------------------------+------------------------------------------------------+------------------------------------------
s.loc[0] -> LAB -> KeyError                  | s.loc[0:2] -> LAB -> TypeError                       | s.loc[[0,2]] -> LAB -> KeyError
s.loc[10] -> LAB -> KeyError                 | s.loc[10:12] -> LAB -> TypeError                     | s.loc[[10,12]] -> LAB -> KeyError
s.loc['a'] -> LAB -> 100                     | s.loc['a':'c'] -> LAB -> {'a':100, 'b':101, 'c':102} | s.loc[['a','c']] -> LAB -> {'a':100, 'c':102}
s.loc['g'] -> LAB -> KeyError                | s.loc['f':'h'] -> LAB -> empty Series                | s.loc[['f','h']] -> LAB -> KeyError
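For the string-indexed Series, the most visible effect is that passing the wrong kind of key to .iloc or .loc fails loudly instead of falling back to the other lookup mode. The sketch below checks a few of those rows; `error_name` is a hypothetical helper written for this illustration. Exact exception types have not been identical across pandas versions (for example, the table reports ValueError for a label slice passed to .iloc), so that case is printed here without asserting one specific type.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=["a", "b", "c", "d", "e"])

def error_name(lookup):
    """Return the name of the exception raised by a zero-argument lookup,
    or None if the lookup succeeds."""
    try:
        lookup()
    except Exception as exc:
        return type(exc).__name__
    return None

# A label handed to the positional accessor.
iloc_label_error = error_name(lambda: s.iloc["a"])
# A label slice handed to the positional accessor (type varies by version).
iloc_label_slice_error = error_name(lambda: s.iloc["a":"c"])

# Integers handed to the label accessor of a string index.
loc_int_error = error_name(lambda: s.loc[0])
loc_int_list_error = error_name(lambda: s.loc[[0, 2]])
loc_int_slice_error = error_name(lambda: s.loc[0:2])

# A list of existing labels returns exactly the requested labels.
picked = s.loc[["a", "c"]]

print(iloc_label_error, iloc_label_slice_error)
print(loc_int_error, loc_int_list_error, loc_int_slice_error)
print(picked.to_dict())
```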
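Note that there are three ways of handling labels that are not found or positions that are out of range: an exception is raised, an empty Series is returned, or a Series is returned in which the requested keys are associated with NaN values.
Also note that when querying with a slice by position, the end element is excluded, but when querying by label, the end element is included.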
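The three outcomes and the slice endpoint rule can be seen side by side in a short sketch. One assumption to flag: the NaN-filled result is produced here with `Series.reindex`, swapped in as an explicit way to get the same shape of result as the {'f':NaN, 'h':NaN} table entries, because list lookups with missing labels through [] are version-dependent and .ix was removed in later pandas releases.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(100, 105), index=["a", "b", "c", "d", "e"])

# 1) Exception: a single missing label with .loc.
try:
    s.loc["g"]
    raised = None
except KeyError as exc:
    raised = type(exc).__name__

# 2) Empty Series: a label slice lying entirely outside the sorted index.
empty = s.loc["f":"h"]

# 3) NaN-filled Series: reindex keeps the requested keys and fills the
#    values with NaN (assumption: stands in for the table's s[['f','h']]).
filled = s.reindex(["f", "h"])

# Slice endpoints: positional slices exclude the end, label slices include it.
by_position = s.iloc[0:2]    # positions 0 and 1 only
by_label = s.loc["a":"c"]    # labels 'a', 'b' and 'c'

print(raised, len(empty), filled.isna().all())
print(list(by_position.index), list(by_label.index))
```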