Question

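My problem comes up when I search a PDF file with Python. I search it line by line, so suppose one line contains:

"this is this %"

So if we set x = "this is this %", I want to count "this" and ignore whatever comes after "%", because that is a comment. The code is: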

import re

counter = 0
if re.search("%", x):
    new_line = x.split()
    for g in new_line:
        if re.search("%", g):
            # Everything from the first "%" onward is a comment.
            break
        elif g == "this":
            counter = counter + 1
    print(counter)

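But what if I have the following?

x = "this this %this% this this"

Here the second percent sign ends the comment. I want to skip the "this" between the two "%" signs and still count the ones after it.

Any ideas?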

Answer 1

You can try

x = re.sub("%[^%]*%?", "", x)

Demo: http://regex101.com/r/tE6rL7
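
To tie this back to the counting in the question, here is a minimal sketch that strips the comments with the same pattern and then counts exact word matches. The helper name count_word is made up for illustration. It also replaces each comment with a space rather than an empty string, which is a small departure from the line above, so that words on either side of a comment are not glued together.

```python
import re


def count_word(line, word="this"):
    """Count `word` outside %...% comments (hypothetical helper)."""
    # Replace each "%...%" span with a space. An unclosed "%" comments out
    # the rest of the line, because the closing "%" is optional in the pattern.
    stripped = re.sub(r"%[^%]*%?", " ", line)
    # Compare whitespace-separated tokens exactly, as the question's loop does.
    return sum(1 for token in stripped.split() if token == word)


print(count_word("this this %this% this this"))  # 4
print(count_word("this is this %this"))          # 2
```

The first call skips the "this" between the percent signs and counts the four outside it. The second treats everything after the lone "%" as a comment.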

Answer 2

data = "this this this %this %this"

data = ' '.join(data.split('%')[::2])

data # => "this this this  this"
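
The same split can feed the count directly. This is only a sketch, and the function name count_outside_comments is an assumption for illustration. It relies on the fact that after splitting on "%", the pieces at even indexes lie outside comments and the pieces at odd indexes lie inside them.

```python
def count_outside_comments(line, word="this"):
    """Count `word` in the parts of `line` that are not inside %...% (hypothetical helper)."""
    # Even-indexed pieces are outside comments; odd-indexed pieces are inside.
    outside = line.split("%")[::2]
    # Count exact whitespace-separated matches in each outside piece.
    return sum(piece.split().count(word) for piece in outside)


print(count_outside_comments("this this %this% this this"))  # 4
print(count_outside_comments("this this this %this %this"))  # 4
```

In the second call the second "%" closes the comment, so the final "this" is counted along with the first three.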

Searching a file with Python
