Question

I am trying to iterate over a list of strings in dataframe1 and check whether any of them occur in another dataframe, dataframe2, so that I can replace them there.

for index, row in nlp_df.iterrows():
    print( row['x1'] )
    string1 = row['x1'].replace("(","\(")
    string1 = string1.replace(")","\)")
    string1 = string1.replace("[","\[")
    string1 = string1.replace("]","\]")
    nlp2_df['title'] = nlp2_df['title'].replace(string1,"")

To do this, I iterate with the code shown above, checking for and replacing any string found in df1.

The output below shows the strings in df1:

wait_timeout
interactive_timeout
pool_recycle
....
__all__
folder_name
re.compile('he(lo')

The output below shows df2 after the replacement:

0   have you tried watching the traffic between th...
1   /dev/cu.xxxxx is the "callout" device, it's wh...
2               You'll want the struct package.\r\r\n

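In the df2 output, a string such as /dev/cu.xxxxx should have been replaced during the iteration, but as shown, it is not removed. However, when I tried nlp2_df['title'] = nlp2_df['title'].replace("/dev/cu.xxxxx","") it was removed successfully. Is there a reason why writing the string directly works, but using a variable in the loop does not?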

Thanks in advance!

Answer 1

IIUC, you can simply use a regular expression:

nlp2_df['title'] = nlp2_df['title'].str.replace(r'([\(\)\[\]])', r'\\\1', regex=True)

PS: you don't need a for loop at all.

Demo:

In [15]: df
Out[15]:
           title
0  aaa (bbb) ccc
1   A [word] ...

In [16]: df['new'] = df['title'].str.replace(r'([\(\)\[\]])', r'\\\1', regex=True)

In [17]: df
Out[17]:
           title              new
0  aaa (bbb) ccc  aaa \(bbb\) ccc
1   A [word] ...   A \[word\] ...
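
A possible explanation for the behaviour in the question, which the thread itself does not confirm: without regex=True, Series.replace compares the search string against each whole cell value, whereas Series.str.replace operates on substrings. The sketch below contrasts the two and then shows one loop-free way to strip every string from df1 out of df2, assuming the goal is plain substring removal. The sample rows are hypothetical stand-ins for nlp_df and nlp2_df, not the asker's data.

```python
import re
import pandas as pd

# Hypothetical stand-ins for the question's nlp_df and nlp2_df.
nlp_df = pd.DataFrame({"x1": ["/dev/cu.xxxxx", "re.compile('he(lo')"]})
nlp2_df = pd.DataFrame({"title": [
    '/dev/cu.xxxxx is the "callout" device',
    "try re.compile('he(lo') instead",
    "/dev/cu.xxxxx",
]})

# Series.replace matches the whole cell value by default,
# so only the third row (an exact match) is changed.
whole_value = nlp2_df["title"].replace("/dev/cu.xxxxx", "")

# Series.str.replace works on substrings; regex=False treats the
# search string literally, so no manual escaping is needed.
substring = nlp2_df["title"].str.replace("/dev/cu.xxxxx", "", regex=False)

# Loop-free variant: escape each search string with re.escape and
# join them into a single alternation pattern.
pattern = "|".join(re.escape(s) for s in nlp_df["x1"])
cleaned = nlp2_df["title"].str.replace(pattern, "", regex=True)

print(whole_value.tolist())
print(substring.tolist())
print(cleaned.tolist())
```

If some search strings are prefixes of others, sorting them longest first before joining would make the alternation prefer the longer match; that refinement is left out here to keep the sketch short.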

pandas: replacing a string does not replace the target substring
