Searching with a string that is partly regex and partly plain (non-regex) text taken from another string

Date: 2016-05-21 06:58:51

Tags: python regex python-3.x search

I have two files:

efile = r"c:\myexternal.txt"
cfile = r"c:\mycurrent.txt"

myexternal.txt:

Paris
London
Amsterdam
New York

mycurrent.txt (but it can be any text):

Paris is a city in France
A city in the UK is London
In the USA there is no city named Manchester
Amsterdam is in the Netherlands

What I want to do is search the current file for every line of the external file (raw text), but with regex boundaries added around each line.

For example:
I want to find all cities from externalfile in currentfile, but not the ones preceded by "is ", and every city must be followed by a space or be at the end of the line:

boundO = r"(?<!is\s)"
boundC = r"(?=\s|$)"
#boundO + line in externalfile + boundC
#(regex rawtext regex)

#put every line of external file (c:\myexternal.txt) in list:
externalfile=[]
with open(efile, 'r+', encoding="utf8") as file:
  for line in file:
      if line.strip():                 #if line != empty
          line=line.rstrip("\n")       #remove linebreaks
          line=boundO + line + boundC  #add regex boundaries
          externalfile.append(line)

results = []
#check every line in c:\mycurrent.txt
with open(cfile, 'r+', encoding="utf8") as file:
  for line in file:
      if any(ext in line for ext in externalfile):
          results.append(line)

This doesn't work: the boundaries are not treated as regex.

What am I doing wrong?

3 answers:

Answer 0 (score: 1)

You need re.search. Use:

import re
with open("check.pl", 'r+') as file:
    for line in file:
        if any(re.search(ext, line) for ext in externalfile): # <---here
            print(line)
            results.append(line)

Output:

Paris is a city in France

Amsterdam is in the Netherlands
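A self-contained sketch of the same approach, assuming the two sample files are replaced by in-memory strings (external_text and current_text are hypothetical names) so it runs without c:\myexternal.txt or c:\mycurrent.txt. It builds one bounded pattern per city, as in the question, and filters the lines with re.search:

```python
import re

# Hypothetical in-memory stand-ins for c:\myexternal.txt and c:\mycurrent.txt
external_text = "Paris\nLondon\nAmsterdam\nNew York\n"
current_text = (
    "Paris is a city in France\n"
    "A city in the UK is London\n"
    "In the USA there is no city named Manchester\n"
    "Amsterdam is in the Netherlands\n"
)

bound_open = r"(?<!is\s)"   # not preceded by "is "
bound_close = r"(?=\s|$)"   # followed by whitespace or the end of the line

# One pattern per non-empty line of the external text
patterns = [
    bound_open + city.strip() + bound_close
    for city in external_text.splitlines()
    if city.strip()
]

# Keep every line of the current text that matches at least one pattern
results = [
    line for line in current_text.splitlines()
    if any(re.search(p, line) for p in patterns)
]
print(results)  # ['Paris is a city in France', 'Amsterdam is in the Netherlands']
```

The London line is dropped because "London" is preceded by "is ", which the negative lookbehind (?<!is\s) rejects.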

Edit

I'm not sure, but take a look at this:

import re
boundO = r"(?<!is\s)\b"
boundC = r"(?=\s|$)"
#boundO + line in externalfile + boundC
#(regex rawtext regex)

#put every line of external file (c:\myexternal.txt) in list:
externalfile=[]
with open("check", 'r+') as file:
  for line in file:
      if line.strip():                 #if line != empty
          line=line.rstrip("\n")       #remove linebreaks
          #line=boundO + line + boundC  #add regex boundaries
          externalfile.append(line)

results = []
print(externalfile)
#check every line in c:\mycurrent.txt
with open("check.pl", 'r+') as file:
    for line in file:
        if any(re.search(boundO + ext + boundC, line) for ext in externalfile):
            print(line)
            results.append(line)
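A minimal sketch of what the extra \b in boundO changes, using a hypothetical line that is not in the question's data: without \b, a city name can match at the end of a longer word, as long as that word is followed by whitespace.

```python
import re

city = "Paris"
line = "NotParis is not a city"  # hypothetical line, not from mycurrent.txt

without_b = r"(?<!is\s)" + city + r"(?=\s|$)"
with_b = r"(?<!is\s)\b" + city + r"(?=\s|$)"

# Without \b, "Paris" is found inside the longer word "NotParis".
print(re.search(without_b, line) is not None)  # True
# With \b, the match must start at a word boundary, so "NotParis" is rejected.
print(re.search(with_b, line) is not None)     # False
```

Both (?<!is\s) and \b are zero-width, so they are checked at the same position and their order inside boundO does not matter.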

Answer 1 (score: 1)

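The pattern has to go through the re module before it acts as a regular expression, for example by compiling it first.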

ext in line

only tests whether the string ext occurs literally in line.
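A minimal check of that difference, using the question's boundaries and the first line of mycurrent.txt held in a variable: the in operator looks for the pattern's characters literally, while re.search interprets them as a regex.

```python
import re

bound_open = r"(?<!is\s)"   # not preceded by "is "
bound_close = r"(?=\s|$)"   # followed by whitespace or the end of the line
pattern = bound_open + "Paris" + bound_close
line = "Paris is a city in France"

# `in` is a plain substring test: it looks for the text "(?<!is\s)Paris(?=\s|$)".
print(pattern in line)                        # False
# re.search treats the same string as a regular expression.
print(re.search(pattern, line) is not None)   # True
```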

You should use something like the following:

import re
regc=re.compile(ext)
regc.search(line)
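A sketch of the compile-once approach, assuming the question's boundaries and one hypothetical extra line ("London is in the UK"): the pattern object is created once and its search() method is reused for each line.

```python
import re

# Compile the bounded pattern for one city once...
regc = re.compile(r"(?<!is\s)London(?=\s|$)")

lines = [
    "London is in the UK",         # hypothetical extra line
    "A city in the UK is London",  # line from mycurrent.txt
]

# ...then reuse the compiled object's search() method for every line.
found = [regc.search(line) is not None for line in lines]
print(found)  # [True, False]
```

re.search(pattern, line) also accepts a plain pattern string, and the re module caches recently used patterns, so precompiling is mostly a matter of reuse and readability rather than a strict requirement.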

Answer 2 (score: 1)

You have to use re.search instead of the in operator.

Also, to prevent the text from the file from being interpreted as a regex, use re.escape.

if any(re.search(ext, line) for ext in externalfile):
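A sketch of where re.escape would go, assuming a hypothetical city entry that contains a regex metacharacter. Because the question's externalfile entries already carry the boundaries, the escaping has to be applied to the city text while each pattern is built (roughly line=boundO + re.escape(line) + boundC), not to the finished pattern, or the lookarounds would be escaped too.

```python
import re

bound_open = r"(?<!is\s)"
bound_close = r"(?=\s|$)"
city = "St. Louis"  # hypothetical entry containing the metacharacter "."

unescaped = bound_open + city + bound_close
escaped = bound_open + re.escape(city) + bound_close  # escape only the city text

# Unescaped, "." matches any character, so "StX Louis" is a false hit.
print(re.search(unescaped, "StX Louis is misspelled") is not None)  # True
# Escaped, "." only matches a literal dot.
print(re.search(escaped, "StX Louis is misspelled") is not None)    # False
print(re.search(escaped, "I live in St. Louis") is not None)        # True
```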