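Searching for a string in a pandas column

Time: 2021-04-09 16:38:58

Tags: python pandas string dataframe data-wrangling

I am trying to find a substring in the hard_skills_name column. For example, I want all rows that have "Apple Products" as a hard skill.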

I tried the following code:

df.loc[df['hard_skills_name'].str.contains("Apple Products", case=False)]

But I got this error:

KeyError                                  Traceback (most recent call last)
<ipython-input-49-acdcdfbdfd3d> in <module>
----> 1 df.loc[df['hard_skills_name'].str.contains("Apple Products", case=False)]

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    877 
    878             maybe_callable = com.apply_if_callable(key, self.obj)
--> 879             return self._getitem_axis(maybe_callable, axis=axis)
    880 
    881     def _is_scalar_access(self, key: Tuple):

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1097                     raise ValueError("Cannot index with multidimensional key")
   1098 
-> 1099                 return self._getitem_iterable(key, axis=axis)
   1100 
   1101             # nested tuple slicing

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1035 
   1036         # A collection of keys
-> 1037         keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
   1038         return self.obj._reindex_with_indexers(
   1039             {axis: [keyarr, indexer]}, copy=True, allow_dups=True

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1252             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1253 
-> 1254         self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
   1255         return keyarr, indexer
   1256 

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1296             if missing == len(indexer):
   1297                 axis_name = self.obj._get_axis_name(axis)
-> 1298                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1299 
   1300             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan],\n             dtype='float64')] are in the [index]"

2 answers:

Answer 0 (score: 2)

Try (temporarily) converting each list of strings into a comma-separated string with str.join() before running the string search:

df[df['hard_skills_name'].str.join(', ').str.contains("Apple Products", case=False)]

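The problem is that the strings you are searching for are contained in lists. You cannot use .str.contains() directly to search for a string inside a list: for a list-valued cell it returns NaN instead of True or False, which is why the key shown in the KeyError is all NaN. To work around this, first convert each list of strings into one long string with .str.join() (for example, with the substrings separated by commas), and then run the string search.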
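The behavior described above can be reproduced on a small frame. This is a minimal sketch, not the asker's actual data: the three rows are made up, on the assumption that each cell of hard_skills_name holds a Python list of strings.

```python
import pandas as pd

# Hypothetical data: each cell holds a list of skill names (an assumption
# about the asker's frame, which is not shown in the question).
df = pd.DataFrame({
    "hard_skills_name": [
        ["Apple Products", "Python"],
        ["SQL"],
        ["apple products support", "Excel"],
    ]
})

# Applied to list-valued cells, .str.contains() cannot match anything,
# so this mask is not a usable True/False mask.
raw_mask = df["hard_skills_name"].str.contains("Apple Products", case=False)

# Join each list into a single string first, then search that string.
joined = df["hard_skills_name"].str.join(", ")
mask = joined.str.contains("Apple Products", case=False)

# Boolean indexing with a proper mask keeps only the matching rows.
result = df[mask]
print(result)
```

The original column is left untouched; only the temporary joined Series is searched, so the rows in result still contain lists.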

Answer 1 (score: 1)

Your index has null values. You will have to build a boolean mask for that. To answer your question directly:

df.loc[(df.index.notnull()) & (df['hard_skills_name'].str.contains("Apple Products", case=False))] 

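This should exclude any row with a null index value and keep the rows that contain the given string in hard_skills_name.

However, I suspect this will also exclude some of the data you are looking for. In that case, the solution is to change the index so that it has no NaN values. Whether that means replacing them with a placeholder value or creating an entirely new index is up to you.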
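Both index options mentioned above can be sketched as follows. The frame, its NaN index labels, and the placeholder value -1 are all hypothetical, and the column is assumed to hold plain strings here, as this answer presumes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index contains NaN labels.
df = pd.DataFrame(
    {"hard_skills_name": ["Apple Products", "SQL", "apple products repair"]},
    index=[np.nan, 1.0, np.nan],
)

# Option 1: discard the old index and number the rows from 0.
fresh = df.reset_index(drop=True)

# Option 2: keep the existing labels but replace NaN with a placeholder
# (-1 is an arbitrary choice for illustration).
placeholder = df.copy()
placeholder.index = placeholder.index.fillna(-1)

# With a NaN-free index, the search keeps every matching row.
matches = fresh[fresh["hard_skills_name"].str.contains("Apple Products", case=False)]
print(matches)
```

Option 1 is the simpler choice when the old labels carry no meaning; option 2 preserves the labels that were valid.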