Question

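I am trying to read the content between 2 tags stored in a file. The content may span multiple lines. The tags can occur 0 or 1 times in the file.

For example, the file content could be: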

title:Corruption Today: Corruption today in
content:Corruption Today: 
Corruption today in 
score:0.91750675

So when reading "content:", my query should return "Corruption Today: Corruption today in". After some googling, I was able to write the following code:

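import re

# files holds the path to the input file.
myfile = open(files, 'r')
filecontent = myfile.read()
myfile.close()

# Offset just past each 'content:' tag (the tag is 8 characters long).
startPtrs = [m.start() + 8 for m in re.finditer('content:', filecontent)]
startPtr = startPtrs[0]
# Offset of the newline just before each 'score:' tag.
endPtrs = [m.start() - 1 for m in re.finditer('score:', filecontent)]
endPtr = endPtrs[0]

content = filecontent[startPtr:endPtr]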

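I'm not sure how efficient the code above is, since we iterate over the file content twice to retrieve the content. Can something more efficient be done?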

Answer 1

If you want to find a string between 2 substrings, you can use the re module:

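import re

# files holds the path to the input file.
with open(files, 'r') as myfile:
    filecontent = myfile.read()

# Capture everything between 'content:' and 'score:', across newlines.
results = re.compile('content:(.*?)score:', re.DOTALL | re.IGNORECASE).findall(filecontent)
print(results)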

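Some explanation:

From the docs:

IGNORECASE:

Perform case-insensitive matching; expressions like [A-Z] will match lowercase letters, too. This is not affected by the current locale.

From the docs:

DOTALL: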

(Dot.) In the default mode, this matches any character except a newline. If the DOTALL flag has been specified, this matches any character including a newline.
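
To see what DOTALL changes in practice, here is a minimal sketch. The sample string is made up for illustration and is not the asker's file.

```python
import re

# Illustrative input: the text between the tags spans two lines.
text = "content:line one\nline two\nscore:0.5"

# Without DOTALL, '.' stops at the newline, so the multi-line span is not matched.
without_dotall = re.findall(r"content:(.*?)score:", text)

# With DOTALL, '.' also matches '\n', so the span across lines is captured.
with_dotall = re.findall(r"content:(.*?)score:", text, re.DOTALL)

print(without_dotall)
print(with_dotall)
```

The first call finds nothing because a newline sits between the two tags; the second returns the text between them, newlines included.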

re.compile is described in the documentation of the Python re module.
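
Because the question says the tags occur 0 or 1 times, a compiled pattern combined with search() may be enough. The sketch below assumes that; the names TAG_PATTERN and read_content are hypothetical and only for illustration.

```python
import re

# Compiled once so it can be reused for many files (name is an assumption).
TAG_PATTERN = re.compile(r"content:(.*?)score:", re.DOTALL)

def read_content(text):
    """Return the text between 'content:' and 'score:', or None if not found."""
    match = TAG_PATTERN.search(text)  # stops at the first match
    if match is None:
        return None
    return match.group(1).strip()  # drop surrounding whitespace and newlines

sample = (
    "title:Corruption Today: Corruption today in\n"
    "content:Corruption Today: \n"
    "Corruption today in \n"
    "score:0.91750675\n"
)
print(read_content(sample))
```

Returning None for a missing tag avoids the IndexError that indexing an empty list of matches would raise.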

You can also see some other solutions here.
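
One such alternative, sketched here as an assumption rather than taken from any linked answer, skips regular expressions and uses str.find. The helper name between and its default tags are hypothetical.

```python
def between(text, start_tag="content:", end_tag="score:"):
    """Return the text between the two tags, or None if either tag is missing."""
    start = text.find(start_tag)
    if start == -1:
        return None
    start += len(start_tag)  # skip past the opening tag itself
    end = text.find(end_tag, start)  # look for the closing tag after the opening one
    if end == -1:
        return None
    return text[start:end].strip()

sample = (
    "title:Corruption Today: Corruption today in\n"
    "content:Corruption Today: \n"
    "Corruption today in \n"
    "score:0.91750675\n"
)
print(between(sample))
```

The second find starts where the first one ended, so the text is scanned roughly once instead of twice from the beginning.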
