查询其中包含列表的列

时间:2018-11-14 12:54:02

标签: pandas nested

我有一个数据框,其中包含带有列表的列。我该如何查询呢?

>>> df2[df2.ALT.apply(lambda x: x == ['TC'])]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2726, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/indexing.py", line 1314, in _convert_to_indexer
    indexer = check = labels.get_indexer(objarr)
  File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3259, in get_indexer
    indexer = self._engine.get_indexer(target._ndarray_values)
  File "pandas/_libs/index.pyx", line 301, in pandas._libs.index.IndexEngine.get_indexer
  File "pandas/_libs/hashtable_class_helper.pxi", line 1544, in pandas._libs.hashtable.PyObjectHashTable.lookup
TypeError: unhashable type: 'numpy.ndarray'

我找到了previous question,但是这些解决方案对我而言并不奏效。接受的解决方案不起作用:

>>> df2.ALT.apply(lambda x: x == ['TC']).head()
0    [False]
1    [False]
2    [False]
3     [True]
4    [False]
Name: ALT, dtype: object

原因是,布尔值被嵌套:

>>> c = np.empty(1, object)
>>> c[0] = ['TC']
>>> df2[df2.ALT.values == c]
  CHROM    POS           ID REF   ALT  QUAL  FILTER
3    20  60522  rs150241001   T  [TC]   100  [PASS]

所以我尝试了第二个答案,这似乎很有效:

>>> df3[df3.ALT.values == c]
Traceback (most recent call last):
  File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: False

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/home/user/miniconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: False

但是奇怪的是,当我在较大的数据框上尝试时它不起作用:

>>> df3.ALT.values == c
False
>>> df2.ALT.values == c
array([False, False, False,  True, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False])

这可能是因为布尔比较的结果不同!

then

这完全让我感到困惑。

1 个答案:

答案 0 :(得分:1)

我发现了一种元组为我工作时强制转换列表的解决方案

arr

该方法之所以有效,是因为元组的不变性使得大熊猫可以检查整个元素与布尔序列中列表内元素逐个比较是否正确,因为列表不可散列。