Question

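I have a list of substrings (the variable 'searchfor') that I want to search for in a DF column (the cell values).

I want to find every occurrence.

Example: suppose my DF cell value is: "Great touchscreen and nice to hold"

My 'searchfor' list is = ["Battery life is great", "nice to hold"]

So the output I want is: every DF row that contains any of these strings (OR logic) should appear in the result.

I can do this in Python with a row-by-row iterator such as itertuples: inside each row instance, get the cell value and test it with if/else and a regex.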
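
For reference, the row-iterator version described above might look roughly like this. It is only a sketch: the name `matched_ids` is made up for illustration, and the search strings are written lowercased and stripped so that the loop logic is the only thing being shown.

```python
import re
import pandas as pd

DF = pd.DataFrame({
    'ID': [0, 1, 2],
    'ReviewText': ['  Battery life is great and ',
                   ' works well for the need i have for it! ',
                   ' Great touchscreen and  nice to hold'],
})
# Assumption: lowercased, stripped search strings (not the padded ones used later)
searchfor = ['battery life is great', 'nice to hold']

matched_ids = []  # hypothetical helper list
for row in DF.itertuples(index=False):
    text = row.ReviewText.lower()
    # Keep the row if ANY search string occurs in the cell (OR logic)
    if any(re.search(re.escape(s), text) for s in searchfor):
        matched_ids.append(row.ID)

looped = DF[DF['ID'].isin(matched_ids)]
print(looped)
```

This works, but it visits the rows one at a time in Python, which is what I would like to avoid.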

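But can we do this without a row iterator? (For example, directly with a list comprehension and a regex?)

I converted the 'searchfor' list to lowercase so that the matching works properly: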

searchfor=[x.lower() for x in searchfor]

I found this '|'.join(searchfor) idea in the question linked below: Python match the string of one column to the substring of another column. I think it gets me closer to a solution (so it would be great if we could keep it!)
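
As far as I understand it, the join simply builds one regex in which '|' means "or". A small sketch of what it produces; `plain_pattern` and `safe_pattern` are names invented here, and the `re.escape` step is an optional extra that only matters if a search string contains regex metacharacters.

```python
import re

searchfor = ['battery life is great', 'nice to hold']

# '|' is regex alternation: the pattern matches if any one alternative matches
plain_pattern = '|'.join(searchfor)
print(plain_pattern)  # battery life is great|nice to hold

# Optional: escape each term so characters like '.' or '(' are taken literally
safe_pattern = '|'.join(re.escape(s) for s in searchfor)

for text in ['great touchscreen and nice to hold', 'works well for the need']:
    print(text, '->', bool(re.search(safe_pattern, text)))
```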

import pandas as pd

# The DF which I am using for this is as below
DF = pd.DataFrame(
    {'ID': [0, 1, 2],
     'ReviewText': [
         '  Battery life is great and ',
         ' works well for the need i have for it! ',
         ' Great touchscreen and  nice to hold']
     },
    index=[0, 1, 2])
#   ID                                ReviewText
#0   0                Battery life is great and 
#1   1   works well for the need i have for it! 
#2   2       Great touchscreen and  nice to hold

# Notice the ending spaces in reviewtext column elements.

# the searchfor list is as below
searchfor = [' Battery life is great ' ,' nice to hold ' ]

# Finally, I tried something like this:

DF.loc[DF.ReviewText.str.lower().str.contains('|'.join(searchfor)), :]

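Expected answer: the first and third rows should match and appear in the result.

Attempt 1: when the 'hold' entry in the 'searchfor' list has a trailing space, the output is an empty DF (nothing matches).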

searchfor = ['Battery life is great ' ,' nice to hold ' ]
DF.loc[DF.ReviewText.str.lower().str.contains('|'.join(searchfor)), :]
#Empty DataFrame
#Columns: [ID, ReviewText]
#Index: []

Attempt 2: when the 'hold' entry in 'searchfor' has no trailing space, the 'hold' row appears in the output, but the 'battery life ..' row still does not match.
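
A possible explanation for both attempts, offered as a guess rather than a confirmed diagnosis: the column is lowercased before matching, while the 'searchfor' entries in the snippets still start with a capital 'B', and the third review ends right after 'hold' with no trailing space. The check below tests each candidate string literally (`regex=False`) against the lowercased column; `candidates` and `lowered` are names made up for this sketch.

```python
import pandas as pd

DF = pd.DataFrame({
    'ID': [0, 1, 2],
    'ReviewText': ['  Battery life is great and ',
                   ' works well for the need i have for it! ',
                   ' Great touchscreen and  nice to hold'],
})
lowered = DF['ReviewText'].str.lower()

candidates = ['Battery life is great ',   # capital B, as in the attempts
              'battery life is great ',   # lowercased version
              ' nice to hold ',           # trailing space
              ' nice to hold']            # no trailing space
for term in candidates:
    # regex=False performs a plain substring test, one result per row
    hits = lowered.str.contains(term, regex=False).tolist()
    print(repr(term), hits)
```

Only the lowercased 'battery' string and the 'hold' string without the trailing space find their rows, which would be consistent with what the two attempts show.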

I think a regex somehow needs to be added here to deal with these leading/trailing spaces and the spaces in between words.
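
Something along these lines is what I have in mind, as an untested-in-production sketch: split each search string into words, join the words with `\s+` so any run of whitespace is accepted, and still combine the terms with '|'.join. The helper `to_pattern` is hypothetical, and `case=False` is used here in place of lowercasing both sides.

```python
import re
import pandas as pd

DF = pd.DataFrame({
    'ID': [0, 1, 2],
    'ReviewText': ['  Battery life is great and ',
                   ' works well for the need i have for it! ',
                   ' Great touchscreen and  nice to hold'],
})
searchfor = [' Battery life is great ', ' nice to hold ']

def to_pattern(term):
    # Hypothetical helper: split() drops leading/trailing/repeated spaces,
    # then the words are rejoined so that any whitespace run matches
    return r'\s+'.join(re.escape(word) for word in term.split())

pattern = '|'.join(to_pattern(term) for term in searchfor)

# case=False makes the match case-insensitive, so no .str.lower() is needed
mask = DF['ReviewText'].str.contains(pattern, case=False, regex=True)
result = DF.loc[mask, :]
print(result)
```

With the sample DF this selects the first and third rows, without a row iterator.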

Please help!

Searching for multiple substrings with OR in a DF column
