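I am looking for a way to check whether one string can be found inside another string. str.contains
only accepts a single fixed string pattern as its argument, whereas I want an element-wise comparison between two string columns.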
import pandas as pd
df = pd.DataFrame({'long': ['sometext', 'someothertext', 'evenmoretext'],
'short': ['some', 'other', 'stuff']})
# This fails:
df['short_in_long'] = df['long'].str.contains(df['short'])
Expected output:
[True, True, False]
Answer 0 (score: 3)
Use a list comprehension with zip:
df['short_in_long'] = [b in a for a, b in zip(df['long'], df['short'])]
print(df)
long short short_in_long
0 sometext some True
1 someothertext other True
2 evenmoretext stuff False
Answer 1 (score: 3)
This is a prime use case for a list comprehension:
# df['short_in_long'] = [y in x for x, y in df[['long', 'short']].values.tolist()]
df['short_in_long'] = [y in x for x, y in df[['long', 'short']].values]
df
long short short_in_long
0 sometext some True
1 someothertext other True
2 evenmoretext stuff False
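Because they carry less overhead, list comprehensions are usually faster than the pandas string methods. See "For loops with pandas - When should I care?".
If the data contains NaNs, you can call a function with error handling instead: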
def try_check(haystack, needle):
    try:
        return needle in haystack
    except TypeError:
        # Either value is NaN (a float) rather than a string
        return False
df['short_in_long'] = [try_check(x, y) for x, y in df[['long', 'short']].values]
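To illustrate the NaN case, here is a minimal sketch that runs try_check on data with a missing value in each column. The frame df_nan and its contents are made up for this example and are not part of the original question.

```python
import numpy as np
import pandas as pd

def try_check(haystack, needle):
    """Return True if needle is a substring of haystack, False on non-string input."""
    try:
        return needle in haystack
    except TypeError:
        # Raised when either value is missing (not a string)
        return False

# Hypothetical data with one missing value per column
df_nan = pd.DataFrame({'long': ['sometext', np.nan, 'evenmoretext'],
                       'short': ['some', 'other', np.nan]})

df_nan['short_in_long'] = [try_check(x, y)
                           for x, y in zip(df_nan['long'], df_nan['short'])]
print(df_nan['short_in_long'].tolist())  # [True, False, False]
```

Rows with a missing value on either side come out as False rather than raising, which is usually what you want for a boolean flag column.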
Answer 2 (score: 3)
Check with numpy; it works row by row :-).
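import numpy as np
# np.char is the public namespace; np.core is private and deprecated in recent NumPy
np.char.find(df['long'].values.astype(str), df['short'].values.astype(str)) != -1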
Out[302]: array([ True, True, False])
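For reference, a self-contained sketch of the NumPy approach that assigns the result back to the frame. It assumes the public np.char.find function and uses to_numpy(dtype=str) for the conversion; the variable names haystacks and needles are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'long': ['sometext', 'someothertext', 'evenmoretext'],
                   'short': ['some', 'other', 'stuff']})

# Convert both columns to NumPy string arrays of equal length
haystacks = df['long'].to_numpy(dtype=str)
needles = df['short'].to_numpy(dtype=str)

# np.char.find returns the lowest match index per element, or -1 if absent
df['short_in_long'] = np.char.find(haystacks, needles) != -1
print(df['short_in_long'].tolist())  # [True, True, False]
```

One caveat: casting to str typically turns a missing value into the literal text 'nan', so columns with NaNs may need to be filtered or filled first.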
Answer 3 (score: 1)
Also:
df['short_in_long'] = df['long'].str.contains('|'.join(df['short'].values))
Update: I misunderstood the question. Here is the corrected version:
df['short_in_long'] = df[['long', 'short']].apply(lambda row: row['short'] in row['long'], axis=1)
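To see why the joined-pattern version answers a different question than a row-wise check, consider this small sketch. The frame df_demo is hypothetical data chosen so that each short string occurs only in the other row's long string.

```python
import pandas as pd

# Hypothetical data: each short string occurs only in the *other* row
df_demo = pd.DataFrame({'long': ['abcdef', 'uvwxyz'],
                        'short': ['xyz', 'abc']})

# Joined pattern 'xyz|abc': True if ANY short string occurs in the long string
any_match = df_demo['long'].str.contains('|'.join(df_demo['short']))

# Row-wise check: True only if the short string of the SAME row occurs
row_match = df_demo.apply(lambda row: row['short'] in row['long'], axis=1)

print(any_match.tolist())  # [True, True]
print(row_match.tolist())  # [False, False]
```

The joined pattern is also interpreted as a regular expression, so short strings containing characters such as '.' or '(' would likely need escaping (for example with re.escape) before joining.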