Extra space in my regular expression output

Time: 2015-01-18 19:17:23

Tags: python regex

My input data looks like this:

talk.politics.guns a:11 about:2 absurd:1 again:1 an:1 and:5 any:2 approaching:1 are:5 argument:1 etc...

I want to binarize it, so that every count becomes 1, and get output like this:

talk.politics.guns a:1 about:1 absurd:1 again:1 an:1 and:1 any:1 approaching:1 are:1 argument:1 etc...

However, when I run my code, a space somehow gets inserted after each colon:

talk.politics.guns a: 1 about: 1 absurd: 1 again: 1 an: 1 and: 1 any: 1 approaching: 1 are: 1 argument: 1 etc...

How do I get rid of this space?

Here's my code:

import sys
import re

input_file = sys.argv[1]
input_file = open(input_file, 'r')

binary = re.compile(r"([:])([0-9]+)")
line = input_file.readline()

while(line):
    line = binary.sub(r"\1 1", line);
    print line
    line = input_file.readline()

2 Answers:

Answer 0 (score: 0)

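You don't need to capture the colon; match only the digits that follow it. (?<=:) is a positive lookbehind that asserts the match must be immediately preceded by a colon, without making the colon part of the match. \d+ matches one or more digits.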

binary = re.compile(r"(?<=:)\d+")
line = input_file.readline()
while(line):
    line = binary.sub(r"1", line);
    print line
    line = input_file.readline()
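
The space in the question's output comes from the replacement string r"\1 1", which has a literal space between the backreference and the 1. The following is a minimal sketch that checks this on a shortened version of the sample line. The variable names are made up for illustration, and the \g<1>1 form is an alternative not mentioned in this answer, shown here for anyone who prefers to keep the capture group.

```python
import re

# Shortened sample line from the question.
sample = "talk.politics.guns a:11 about:2 absurd:1 and:5"

# Original replacement: the space after \1 is copied into the output.
with_space = re.sub(r"([:])([0-9]+)", r"\1 1", sample)

# Lookbehind: only the digits are matched, so only they are replaced.
with_lookbehind = re.sub(r"(?<=:)\d+", "1", sample)

# Keeping the capture group: \g<1> is needed because \11 would be
# read as a reference to group 11, not group 1 followed by "1".
with_group = re.sub(r"(:)\d+", r"\g<1>1", sample)

print(with_space)       # talk.politics.guns a: 1 about: 1 absurd: 1 and: 1
print(with_lookbehind)  # talk.politics.guns a:1 about:1 absurd:1 and:1
print(with_group)       # talk.politics.guns a:1 about:1 absurd:1 and:1
```

The ambiguity of \11 is probably why the space ended up in the replacement string in the first place.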

Answer 1 (score: 0)

Use a lookbehind assertion, and iterate over the file object directly instead of reading it with a while loop:

import sys
import re

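input_path = sys.argv[1]
binary = re.compile(r"(?<=:)\d+")  # digits immediately preceded by a colon

with open(input_path, 'r') as input_file:
    for line in input_file:
        # Each line keeps its trailing newline, so write it as-is
        # rather than letting print add a second one.
        sys.stdout.write(binary.sub("1", line))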
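
To try the same idea without a file on disk, the substitution can be wrapped in a small generator that accepts any iterable of lines. This is only a sketch under a few assumptions: it targets Python 3, the names COUNT, binarize_lines, and fake_file are hypothetical, and the two input lines are built from tokens in the question's sample.

```python
import io
import re

COUNT = re.compile(r"(?<=:)\d+")  # digits directly after a colon

def binarize_lines(lines):
    """Yield each line with every count after a colon replaced by 1."""
    for line in lines:
        yield COUNT.sub("1", line)

# An in-memory file stands in for the real input file here.
fake_file = io.StringIO(
    "talk.politics.guns a:11 about:2\n"
    "talk.politics.guns and:5 any:2\n"
)

result = list(binarize_lines(fake_file))
for line in result:
    # Lines already end with a newline, so suppress print's own.
    print(line, end="")
```

Because the function only needs an iterable, passing an open file object from a with block works the same way as the in-memory file.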