I have the following DataFrame, my_df:
col_A col_B
---------------
John []
Mary ['A','B','C']
Ann ['B','C']
I want to drop the rows where col_B holds an empty list. That is, I want the new DataFrame to be:
col_A col_B
---------------
Mary ['A','B','C']
Ann ['B','C']
Here is what I tried:
my_df[ len(my_df['col_B']) >0 ]
But I got the following error:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
2133 try:
-> 2134 return self._engine.get_loc(key)
2135 except KeyError:
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()
KeyError: True
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-27-75da0b0af6a1> in <module>()
----> 1 records_df_pair_count[ len(records_df_pair_count['stable_seq']) >0 ]
/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in __getitem__(self, key)
2057 return self._getitem_multilevel(key)
2058 else:
-> 2059 return self._getitem_column(key)
2060
2061 def _getitem_column(self, key):
/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in _getitem_column(self, key)
2064 # get column
2065 if self.columns.is_unique:
-> 2066 return self._get_item_cache(key)
2067
2068 # duplicate columns & possible reduce dimensionality
/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py in _get_item_cache(self, item)
1384 res = cache.get(item)
1385 if res is None:
-> 1386 values = self._data.get(item)
1387 res = self._box_item_values(item, values)
1388 cache[item] = res
/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in get(self, item, fastpath)
3539
3540 if not isnull(item):
-> 3541 loc = self.items.get_loc(item)
3542 else:
3543 indexer = np.arange(len(self.items))[isnull(self.items)]
/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
2134 return self._engine.get_loc(key)
2135 except KeyError:
-> 2136 return self._engine.get_loc(self._maybe_cast_indexer(key))
2137
2138 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()
KeyError: True
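Any idea what I am doing wrong here? Thanks!
Answer 0 (score: 3)
You can use the Series.str.len() method, which returns the length of each element in the column: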
my_df[my_df['col_B'].str.len() > 0]
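For anyone who wants to try this, here is a minimal sketch. It assumes the frame from the question is rebuilt by hand; the constructor call below is my assumption, since the question does not show how my_df was created. Series.str.len() works on list elements as well as strings, so it yields one length per row.

```python
import pandas as pd

# Rebuild the example frame from the question (this construction is assumed).
my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

# Element-wise lengths: one integer per row, not the length of the column.
lengths = my_df["col_B"].str.len()

# A boolean Series used as a mask keeps only the rows where it is True.
filtered = my_df[lengths > 0]
print(filtered)
```

The filtered frame keeps the original index labels; call reset_index(drop=True) on it if you need them renumbered.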
Answer 1 (score: 2)
Another approach is to apply len to each element of the column:
my_df[my_df['col_B'].apply(len) > 0]
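A small runnable sketch of the apply variant, again assuming the question's frame is rebuilt by hand (the constructor call is not from the original post). Note that the column label is case-sensitive, so it has to be 'col_B' and not 'col_b'.

```python
import pandas as pd

# Rebuild the example frame from the question (this construction is assumed).
my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

# apply(len) calls the built-in len on every element of the column.
mask = my_df["col_B"].apply(len) > 0

# Index the frame with the boolean Series to drop the empty-list rows.
non_empty = my_df[mask]
print(non_empty)
```

One caveat: if col_B could contain missing values, len would raise a TypeError on them. The question does not cover that case, so it is not handled here.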
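Answer 2 (score: 1)
You already have several answers that solve the problem, but I thought I would explain why your attempt did not work.
This gives a pandas Series: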
my_df['col_B']
So this gives the length of the Series itself, that is, the number of rows as a single integer:
len(my_df['col_B'])
Since the Series is not empty, this evaluates to the single value True:
len(my_df['col_B']) >0
And so this:
my_df[ len(my_df['col_B']) >0 ]
evaluates to:
my_df[True]
Clearly my_df has no column labelled True, so pandas looks it up as a column key and raises the KeyError.
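The steps above can be reproduced with a short sketch. The frame is rebuilt by hand here, which is my assumption since the question does not show its construction, and the variable names n_rows, condition, raised and mask are only for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question (this construction is assumed).
my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

# len() on a Series counts rows; it does not look inside each element.
n_rows = len(my_df["col_B"])  # 3

# Comparing that integer gives one plain Python bool, not a mask.
condition = n_rows > 0  # True

# Indexing with a scalar is treated as a column lookup, so it fails.
try:
    my_df[condition]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# What the filter needs instead: one boolean per row.
mask = my_df["col_B"].str.len() > 0
print(mask.tolist())
```

The difference is that a boolean Series with one entry per row selects rows, while a single True or False is just another key to look up among the column labels.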