Pandas: filtering on multiple columns when selecting data

Date: 2015-07-31 13:23:00

Tags: python pandas

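I can filter the data with pd_data = pd_data[pd_data['db_rating']>0] to select the records where db_rating > 0.

Now I want to involve other columns as well, for example selecting only the records where db_rating > 0 and imdb_ratings_count > 1000 at the same time.

But pd_data = pd_data[pd_data['db_rating']>0 and pd_data['imdb_ratings_count']>1000] gives me an error: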

ValueError                                Traceback (most recent call last)
<ipython-input-120-f83883d4bac8> in <module>()
      3 pd_data['imdb_rating'] = pd_data['imdb_rating'].astype(float)
      4 pd_data['imdb_ratings_count'] = pd_data['imdb_ratings_count'].astype(float)
----> 5 pd_data = pd_data[pd_data['db_rating']>0 and pd_data['imdb_ratings_count']>1000]
      6 pd_data.describe()

D:\Anaconda2\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
    696         raise ValueError("The truth value of a {0} is ambiguous. "
    697                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 698                          .format(self.__class__.__name__))
    699 
    700     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

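How can I do this?

2 answers:

Answer 0 (score: 2)

Pandas overloads the & operator to perform an element-wise logical AND on boolean Series. This should work: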

pd_data = pd_data[(pd_data['db_rating']>0) & (pd_data['imdb_ratings_count']>1000)]
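To make this self-contained, here is a minimal sketch on a small DataFrame. The column names come from the question, but the rows and the title column are hypothetical, invented for illustration. It also shows why the original "and" fails: Python's and calls bool() on the first Series, and pandas raises ValueError because a Series of several booleans has no single truth value.

```python
import pandas as pd

# Hypothetical sample data: column names from the question, values made up.
pd_data = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "db_rating": [0.0, 7.5, 8.1, 6.0],
    "imdb_ratings_count": [5000.0, 200.0, 15000.0, 2500.0],
})

# Each comparison produces a boolean Series aligned on the row index.
has_rating = pd_data["db_rating"] > 0
popular = pd_data["imdb_ratings_count"] > 1000

# `and` needs a single True/False, so it calls bool() on the Series,
# which pandas rejects with the ValueError seen in the question.
try:
    bool(has_rating)
except ValueError as exc:
    print("bool(Series) fails:", exc)

# `&` instead combines the two masks element by element.
mask = has_rating & popular
filtered = pd_data[mask]
print(filtered)  # keeps only the rows titled C and D
```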

See http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing

Answer 1 (score: 2)

Use the bitwise operators when working with boolean vectors in pandas:

pd_data = pd_data[(pd_data['db_rating']>0) & (pd_data['imdb_ratings_count']>1000)]
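The same rule covers the other bitwise operators: | for OR and ~ for NOT. The parentheses around each comparison are required because & and | bind more tightly than > in Python. Without them, the expression is parsed as a chained comparison around (0 & Series) and fails. A minimal sketch, again with hypothetical sample values:

```python
import pandas as pd

# Hypothetical sample data (values invented for illustration).
pd_data = pd.DataFrame({
    "db_rating": [0.0, 7.5, 8.1, 6.0],
    "imdb_ratings_count": [5000.0, 200.0, 15000.0, 2500.0],
})

# Parenthesize each comparison: & and | have higher precedence than >.
has_rating = pd_data["db_rating"] > 0               # [F, T, T, T]
popular = pd_data["imdb_ratings_count"] > 1000      # [T, F, T, T]

both = pd_data[has_rating & popular]    # AND: both conditions hold
either = pd_data[has_rating | popular]  # OR: at least one condition holds
unrated = pd_data[~has_rating]          # NOT: invert a single mask
```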