Question

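I am looking for a way to check whether one string can be found inside another. str.contains only accepts a fixed string pattern as its argument, whereas I want an element-wise comparison between two string columns.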

import pandas as pd

df = pd.DataFrame({'long': ['sometext', 'someothertext', 'evenmoretext'],
               'short': ['some', 'other', 'stuff']})


# This fails:
df['short_in_long'] = df['long'].str.contains(df['short'])

Expected output:

[True, True, False]

Answer 1

Use a list comprehension with zip:

df['short_in_long'] = [b in a for a, b in zip(df['long'], df['short'])]

print (df)
            long  short  short_in_long
0       sometext   some           True
1  someothertext  other           True
2   evenmoretext  stuff          False

Answer 2

This is a prime use case for a list comprehension:

# df['short_in_long'] = [y in x for x, y in df[['long', 'short']].values.tolist()]
df['short_in_long'] = [y in x for x, y in df[['long', 'short']].values]
df

            long  short  short_in_long
0       sometext   some           True
1  someothertext  other           True
2   evenmoretext  stuff          False

Because they carry less overhead, list comprehensions are usually faster than the string methods. See For loops with pandas - When should I care?.
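To get a feel for the overhead, here is a rough timing sketch. pandas has no vectorized string method for this exact task, so it compares the list comprehension against a row-wise apply instead. The frame size, the helper names with_list_comp and with_apply, and the repeat count are all made up for illustration, and the numbers will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical benchmark frame: the question's three rows repeated many times.
df = pd.DataFrame({
    'long': ['sometext', 'someothertext', 'evenmoretext'] * 2000,
    'short': ['some', 'other', 'stuff'] * 2000,
})

def with_list_comp(frame):
    # Plain Python substring test, one pair of values per row.
    return [b in a for a, b in zip(frame['long'], frame['short'])]

def with_apply(frame):
    # Row-wise apply builds a Series object for every row, which adds overhead.
    return frame.apply(lambda row: row['short'] in row['long'], axis=1).tolist()

# Both approaches must agree before timing them means anything.
same_result = with_list_comp(df) == with_apply(df)

t_list_comp = timeit.timeit(lambda: with_list_comp(df), number=5)
t_apply = timeit.timeit(lambda: with_apply(df), number=5)
print(f'list comprehension: {t_list_comp:.4f}s, apply(axis=1): {t_apply:.4f}s')
```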

If the data contains NaN, you can call a function with error handling instead:

def try_check(haystack, needle):
    try:
        return needle in haystack
    except TypeError:
        return False

df['short_in_long'] = [try_check(x, y) for x, y in df[['long', 'short']].values]
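As a quick illustration of what the error handling buys you, here is a small sketch on a frame with one missing value per column. The frame df_nan and its contents are invented for this example. The in operator raises TypeError when either side is not a string, so those rows come out as False.

```python
import numpy as np
import pandas as pd

def try_check(haystack, needle):
    try:
        return needle in haystack
    except TypeError:
        # Raised when either value is not a string (e.g. a float NaN or None).
        return False

# Hypothetical data with a missing value in each column.
df_nan = pd.DataFrame({
    'long': ['sometext', np.nan, 'evenmoretext'],
    'short': ['some', 'other', None],
})

df_nan['short_in_long'] = [
    try_check(x, y) for x, y in zip(df_nan['long'], df_nan['short'])
]
print(df_nan['short_in_long'].tolist())
```

Whether a missing value should count as False or stay missing depends on your data, so treat the False fallback as one possible choice.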

Answer 3

Check with numpy; it works row by row :-).

import numpy as np
np.char.find(df['long'].values.astype(str), df['short'].values.astype(str)) != -1
Out[302]: array([ True,  True, False])
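One caveat worth knowing with the astype(str) route: a missing value is converted to the literal string 'nan', which can then match as a substring. The sketch below uses an invented frame df_nan to show this, and it calls the find function through the public np.char namespace.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row has a missing 'short' value.
df_nan = pd.DataFrame({
    'long': ['banana', 'sometext'],
    'short': [np.nan, 'some'],
})

long_arr = df_nan['long'].to_numpy().astype(str)
short_arr = df_nan['short'].to_numpy().astype(str)  # NaN becomes the text 'nan'

# find returns the lowest index of the match, or -1 if there is none.
found = np.char.find(long_arr, short_arr) != -1
print(short_arr[0], found)
```

Here the first row comes out as a match because 'banana' happens to contain 'nan', so it may be safer to filter or fill missing values before converting.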

Answer 4

Also:

df['short_in_long'] = df['long'].str.contains('|'.join(df['short'].values))

Update: I misunderstood the question. Here is the corrected version:

# apply with axis=1 must be called on the DataFrame so each row exposes both columns
df['short_in_long'] = df.apply(lambda row: row['short'] in row['long'], axis=1)
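To see why the joined pattern in the first attempt does not answer the question, here is a small sketch on invented data (df_demo) where the two approaches disagree. The joined pattern is a regex alternation, so it matches whenever any value from the short column appears in a row, not just the value from the same row.

```python
import pandas as pd

# Hypothetical data chosen so that the two approaches give different answers.
df_demo = pd.DataFrame({
    'long': ['sometext', 'plain', 'othertext'],
    'short': ['other', 'plain', 'some'],
})

# Regex alternation: True if ANY value from 'short' occurs in the row's 'long'.
any_short = df_demo['long'].str.contains('|'.join(df_demo['short']))

# Row-wise check: only the 'short' value from the same row counts.
same_row = df_demo.apply(lambda row: row['short'] in row['long'], axis=1)

print(any_short.tolist())
print(same_row.tolist())
```

On the question's sample data both happen to give the same result, which is probably why the difference is easy to miss.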
