Boolean subsetting in pandas

Time: 2019-06-20 11:59:05

Tags: python-3.x pandas

While learning pandas by following a pandas book, I came across this example:

#+BEGIN_SRC  python :results output  :session
print(scientists)
#+END_SRC

#+RESULTS:
:                    Name        Born        Died  Age          Occupation
: 0     Rosaline Franklin  1920-07-25  1958-04-16   37             Chemist
: 1        William Gosset  1876-06-13  1937-10-16   61        Statistician
: 2  Florence Nightingale  1820-05-12  1910-08-13   90               Nurse
: 3           Marie Curie  1867-11-07  1934-07-04   66             Chemist
: 4         Rachel Carson  1907-05-27  1964-04-14   56           Biologist
: 5             John Snow  1813-03-15  1858-06-16   45           Physician
: 6           Alan Turing  1912-06-23  1954-06-07   41  Computer Scientist
: 7          Johann Gauss  1777-04-30  1855-02-23   77       Mathematician

Boolean subsetting:

#+BEGIN_SRC  python :results output  :session
# boolean vectors will subset rows
print(scientists[scientists['Age'] > scientists['Age'].mean()])
#+END_SRC

#+RESULTS:
:                    Name        Born        Died  Age     Occupation
: 1        William Gosset  1876-06-13  1937-10-16   61   Statistician
: 2  Florence Nightingale  1820-05-12  1910-08-13   90          Nurse
: 3           Marie Curie  1867-11-07  1934-07-04   66        Chemist
: 7          Johann Gauss  1777-04-30  1855-02-23   77  Mathematician

It is followed by a confusing operation, about which the book states:

  

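Because of how broadcasting works, if we supply a bool vector that is not the same as the number of rows in the dataframe, the maximum number of rows returned would be the length of the bool vector.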

#+BEGIN_SRC  python :results output  :session
# 4 values passed as a bool vector
# 3 rows returned
print(scientists.loc[[True, True, False, True]])
#+END_SRC

#+RESULTS:
:                 Name        Born        Died  Age    Occupation
: 0  Rosaline Franklin  1920-07-25  1958-04-16   37       Chemist
: 1     William Gosset  1876-06-13  1937-10-16   61  Statistician
: 3        Marie Curie  1867-11-07  1934-07-04   66       Chemist

The result confuses me: what does [True, True, False, True] map to?

1 answer:

Answer 0 (score: 1)

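It means you are passing a boolean mask, i.e. using boolean indexing: rows are filtered by a boolean Series, list, or array, and only the rows whose mask value is True are returned. The mask is matched to the rows by position, so here you get the rows at positions 0, 1, and 3.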
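To make the "Series, list, or array" point concrete, here is a minimal sketch with a hypothetical four-row frame (`df` and the `mask_*` names are made up for illustration, not taken from the question). It assumes the mask has exactly one value per row, in which case all three forms should select the same rows:

```python
import numpy as np
import pandas as pd

# Hypothetical 4-row frame, used only for illustration
df = pd.DataFrame({'a': range(4)})

mask_list = [True, True, False, True]
mask_array = np.array(mask_list)                    # same mask as a NumPy array
mask_series = pd.Series(mask_list, index=df.index)  # same mask as a Series

# Each form keeps only the rows whose mask value is True
by_list = df.loc[mask_list]
by_array = df.loc[mask_array]
by_series = df.loc[mask_series]

print(by_list)
```

One difference: a boolean Series is aligned to the frame by its index labels, while a plain list or array is matched by position only.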

Tested in pandas 0.24+: this also works when the DataFrame has more rows than there are values in the boolean mask:

import pandas as pd

# 6-row frame; the mask used further down has only 4 values
df1 = pd.DataFrame({'a': range(6)})
print(df1)
   a
0  0
1  1
2  2
3  3
4  4
5  5

print(df1.loc[[True, True, False, True]])
   a
0  0
1  1
3  3
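
Relying on a mask that is shorter than the frame is fragile. As far as I know, more recent pandas releases raise an error for a boolean list whose length does not match the number of rows, instead of returning a result. A version-independent sketch is to pad the mask to full length yourself (`short_mask` and `full_mask` are illustrative names, not part of pandas):

```python
import pandas as pd

df1 = pd.DataFrame({'a': range(6)})
short_mask = [True, True, False, True]

# Pad with False so the mask has exactly one value per row;
# rows beyond the original mask are then explicitly excluded
full_mask = short_mask + [False] * (len(df1) - len(short_mask))

result = df1.loc[full_mask]
print(result)
```

This selects the same rows 0, 1, and 3 as above, and it makes explicit that the rows not covered by the original mask are dropped.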