Filtering a dataset by a condition in Python

Time: 2017-01-11 15:39:26

Tags: python python-3.x pandas dataset

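I'm struggling with a filtering operation on a Stata file in Python 3: I've been asked to keep only the households without kids in the dataset/table.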

I used a filtering condition to filter those rows out of the table:

filtering_condition = df["kids"] > 0

df_nokids = df.loc[filtering_condition,"kids"]
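For reference, here is a minimal sketch of what those two lines do once the column actually exists. It uses a small made-up DataFrame; the columns hhid, kids and income are assumptions, not the real Stata data. A boolean mask such as df["kids"] > 0 marks the rows to keep; df.loc[mask, "kids"] then returns only the kids column as a Series, df.loc[mask] keeps every column, and keeping households without children calls for df["kids"] == 0 rather than > 0.

```python
import pandas as pd

# Hypothetical household data; column names are assumptions for illustration.
df = pd.DataFrame({
    "hhid": [1, 2, 3, 4],
    "kids": [0, 2, 0, 1],
    "income": [30000, 52000, 41000, 38000],
})

# Boolean Series aligned with df's index: True where the household has children.
has_kids = df["kids"] > 0

# Row mask plus a single column label -> a Series holding only "kids".
kids_only = df.loc[has_kids, "kids"]

# Row mask alone -> a DataFrame that keeps every column.
with_kids = df.loc[has_kids]

# Households *without* children need the opposite condition.
no_kids = df.loc[df["kids"] == 0]

print(type(kids_only).__name__)  # Series
print(list(with_kids.columns))   # ['hhid', 'kids', 'income']
print(no_kids["hhid"].tolist())  # [1, 3]
```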

However, this gives me an error I can't make sense of:

KeyError                                  Traceback (most recent call last)
/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   1944             try:
-> 1945                 return self._engine.get_loc(key)
   1946             except KeyError:

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()

KeyError: 'kids'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-321-e72cd8a67065> in <module>()
      1 #keep only the households without kids and use this dataset for the rest of the assignment
----> 2 filtering_condition = df["kids"] > 0
      3 df_nokids = df.loc[filtering_condition,"kids"]

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in __getitem__(self, key)
   1995             return self._getitem_multilevel(key)
   1996         else:
-> 1997             return self._getitem_column(key)
   1998 
   1999     def _getitem_column(self, key):

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2002         # get column
   2003         if self.columns.is_unique:
-> 2004             return self._get_item_cache(key)
   2005 
   2006         # duplicate columns & possible reduce dimensionality

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1348         res = cache.get(item)
   1349         if res is None:
-> 1350             values = self._data.get(item)
   1351             res = self._box_item_values(item, values)
   1352             cache[item] = res

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3288 
   3289             if not isnull(item):
-> 3290                 loc = self.items.get_loc(item)
   3291             else:
   3292                 indexer = np.arange(len(self.items))[isnull(self.items)]

/opt/anaconda/anaconda3/lib/python3.5/site-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
   1945                 return self._engine.get_loc(key)
   1946             except KeyError:
-> 1947                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1948 
   1949         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)()

KeyError: 'kids'

Can anyone explain what I'm doing wrong?

Thanks!

Data file: [screenshot not reproduced]
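The KeyError: 'kids' is raised by the very first line, df["kids"], before any filtering happens: it means the DataFrame has no column labelled exactly 'kids'. The traceback alone cannot say why. One plausible cause is a label that differs in case or whitespace, so the sketch below invents such a label ('Kids ', with a capital K and a trailing space) purely for illustration and shows how to inspect df.columns and normalise the names. The real variable name in the Stata file may be something else entirely.

```python
import pandas as pd

# Hypothetical frame whose label differs from "kids" only by case and a
# trailing space; this is an assumed example, not the asker's data.
df = pd.DataFrame({"hhid": [1, 2, 3], "Kids ": [0, 1, 0]})

# Look at the exact labels before indexing; the list repr shows stray spaces.
print(list(df.columns))      # ['hhid', 'Kids ']
print("kids" in df.columns)  # False, so df["kids"] raises KeyError

try:
    df["kids"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'kids'

# One way to normalise labels: strip surrounding whitespace and lower-case them.
df.columns = df.columns.str.strip().str.lower()
print("kids" in df.columns)  # True

# Now the filter for households without children works.
no_kids = df[df["kids"] == 0]
```

If printing df.columns shows a completely different name, it is simpler to index with that name than to rename anything.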

1 Answer:

Answer 0 (score: 1)

Do you mean something like this:

df_kids = df[df['kids']>0]

This selects the rows where the 'kids' column is greater than zero.
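A slightly fuller sketch, again on made-up data where kids is assumed to be a non-negative count: the answer's expression keeps the households with children, and inverting the same mask with ~ keeps the ones without, which is what the assignment asks for.

```python
import pandas as pd

# Made-up household data; "kids" is assumed to hold a non-negative count.
df = pd.DataFrame({"hhid": [1, 2, 3, 4], "kids": [0, 2, 0, 1]})

mask = df["kids"] > 0

# The answer's expression: rows where "kids" is greater than zero.
df_kids = df[mask]

# The complement keeps households without children. .copy() returns an
# independent frame, so modifying it later does not raise SettingWithCopyWarning.
df_nokids = df[~mask].copy()

print(df_kids["hhid"].tolist())    # [2, 4]
print(df_nokids["hhid"].tolist())  # [1, 3]
```

One caveat on the design: if kids can be missing (NaN), NaN > 0 evaluates to False, so ~mask also keeps those rows, whereas df[df["kids"] == 0] excludes them.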