I am trying to iterate over a list of strings in dataframe1 and check whether any of them occur in another dataframe, dataframe2, so that I can replace them there.
for index, row in nlp_df.iterrows():
    print(row['x1'])
    string1 = row['x1'].replace("(", r"\(")
    string1 = string1.replace(")", r"\)")
    string1 = string1.replace("[", r"\[")
    string1 = string1.replace("]", r"\]")
    nlp2_df['title'] = nlp2_df['title'].replace(string1, "")
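To do this, I use the code shown above to loop over df1, checking for and replacing any of its strings that are found in df2.
The output below shows the strings in df1:
wait_timeout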
interactive_timeout
pool_recycle
....
__all__
folder_name
re.compile('he(lo')
The output below shows df2 after the strings have been replaced:
0 have you tried watching the traffic between th...
1 /dev/cu.xxxxx is the "callout" device, it's wh...
2 You'll want the struct package.\r\r\n
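In the df2 output, strings such as /dev/cu.xxxxx should have been replaced during the iteration, but as shown, they are not removed. However, when I tried nlp2_df['title'] = nlp2_df['title'].replace("/dev/cu.xxxxx","") directly, the string was removed successfully.
Is there a reason why writing the string literally works, but looping with a variable as the replacement target doesn't?
Thanks in advance!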
Answer 0 (score: 0)
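IIUC, you can simply use a regular expression (pass regex=True explicitly, since recent pandas versions treat the pattern as a literal string by default):
nlp2_df['title'] = nlp2_df['title'].str.replace(r'([\(\)\[\]])', r'\\\1', regex=True)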
PS: you don't need a for loop at all.
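One way to drop the loop, as a rough sketch: escape every string from df1 with re.escape, join them into a single alternation pattern, and remove all matches from the title column in one call. The two small frames below are hypothetical stand-ins for nlp_df and nlp2_df, and sorting the terms longest-first is my own assumption so that a longer string is not shadowed by a shorter one that is its prefix.

```python
import re

import pandas as pd

# Hypothetical stand-ins for the question's nlp_df and nlp2_df.
nlp_df = pd.DataFrame(
    {"x1": ["wait_timeout", "/dev/cu.xxxxx", "re.compile('he(lo')"]}
)
nlp2_df = pd.DataFrame(
    {
        "title": [
            '/dev/cu.xxxxx is the "callout" device',
            "set wait_timeout first",
            "You'll want the struct package.",
        ]
    }
)

# re.escape handles every regex metacharacter (".", "(", "[", ...),
# so no manual chain of .replace() calls is needed.
terms = sorted(nlp_df["x1"].dropna().astype(str).unique(), key=len, reverse=True)
pattern = "|".join(re.escape(term) for term in terms)

# Remove every occurrence of any term, as a substring, in a single pass.
cleaned = nlp2_df["title"].str.replace(pattern, "", regex=True)
print(cleaned.tolist())
```

If the list of strings can contain empty values, filter them out first, because an empty alternative would match at every position.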
......
Demo:
In [15]: df
Out[15]:
           title
0  aaa (bbb) ccc
1   A [word] ...
In [16]: df['new'] = df['title'].str.replace(r'([\(\)\[\]])', r'\\\1', regex=True)
In [17]: df
Out[17]:
           title              new
0  aaa (bbb) ccc  aaa \(bbb\) ccc
1   A [word] ...   A \[word\] ...
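A possible reason the loop in the question leaves the titles untouched, as far as I can tell: Series.replace without regex=True compares each string against the whole cell value, whereas Series.str.replace works on substrings. The tiny Series below is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Made-up data: one cell equal to the target, one cell merely containing it.
s = pd.Series(["/dev/cu.xxxxx", "/dev/cu.xxxxx is the callout device"])

# Series.replace (no regex=True) only replaces cells that equal the target.
whole = s.replace("/dev/cu.xxxxx", "")

# Series.str.replace removes the target wherever it occurs inside a cell;
# regex=False keeps "." from being treated as a wildcard.
part = s.str.replace("/dev/cu.xxxxx", "", regex=False)

print(whole.tolist())
print(part.tolist())
```

Under that reading, escaping the parentheses and brackets only matters once the call actually uses regular expressions, for example .str.replace(..., regex=True).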