OK, so my input data looks like this:
talk.politics.guns a:11 about:2 absurd:1 again:1 an:1 and:5 any:2 approaching:1 are:5 argument:1 etc...
I want to binarize it and get output like this:
talk.politics.guns a:1 about:1 absurd:1 again:1 an:1 and:1 any:1 approaching:1 are:1 argument:1 etc...
However, when I run my code, it somehow inserts a space after each colon:
talk.politics.guns a: 1 about: 1 absurd: 1 again: 1 an: 1 and: 1 any: 1 approaching: 1 are: 1 argument: 1 etc...
How do I get rid of this space?
Here's my code:
import sys
import re
input_file = sys.argv[1]
input_file = open(input_file, 'r')
binary = re.compile(r"([:])([0-9]+)")
line = input_file.readline()
while(line):
    line = binary.sub(r"\1 1", line)
    print line
    line = input_file.readline()
Answer 0 (score: 0)
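You don't need to capture the colon that comes just before the digits. (?<=:) is a positive lookbehind: it asserts that the match must be immediately preceded by a colon, without making the colon part of the match. \d+ matches one or more digits. Because only the digits are matched, the replacement can be a plain 1, with no backreference and no extra space.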
binary = re.compile(r"(?<=:)\d+")
line = input_file.readline()
while(line):
    line = binary.sub(r"1", line)
    print line
    line = input_file.readline()
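As a quick check, the lookbehind pattern can be tried on part of the sample line from the question. This is only a minimal sketch: the helper name binarize and the variable sample are made up for illustration and do not appear in the original code.

```python
import re

# Match a run of digits only when it is immediately preceded by a colon.
# The colon itself is not part of the match, so it is left untouched.
BINARY = re.compile(r"(?<=:)\d+")


def binarize(line):
    """Replace every count after a colon with 1 (hypothetical helper)."""
    return BINARY.sub("1", line)


sample = "talk.politics.guns a:11 about:2 absurd:1 again:1 an:1 and:5"
print(binarize(sample))
# prints: talk.politics.guns a:1 about:1 absurd:1 again:1 an:1 and:1
```

The label at the start of the line is unchanged because it contains no colon followed by digits.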
Answer 1 (score: 0)
Use a lookbehind assertion, and iterate over the file directly instead of reading it with a while loop:
import sys
import re
input_path = sys.argv[1]
binary = re.compile(r"(?<=:)\d+")
with open(input_path, 'r') as input_file:
    for line in input_file:
        # Strip the trailing newline so print() does not emit a blank line.
        print(binary.sub("1", line.rstrip("\n")))
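As an aside on the original code: the space was presumably added because a replacement of \11 is read as a reference to group 11, which this pattern does not have. If you would rather keep the capturing groups from the question, the \g<1> form of a backreference avoids that ambiguity. The sketch below assumes the same pattern as in the question; the function name binarize_with_group is hypothetical.

```python
import re

# Same pattern as in the question: group 1 is the colon, group 2 the digits.
PATTERN = re.compile(r"([:])([0-9]+)")


def binarize_with_group(line):
    """Keep the captured colon and replace the digits with 1 (hypothetical helper)."""
    # \g<1> names group 1 explicitly, so the literal 1 that follows
    # is not read as part of the group number.
    return PATTERN.sub(r"\g<1>1", line)


print(binarize_with_group("a:11 about:2 absurd:1"))
# prints: a:1 about:1 absurd:1
```

Both approaches give the same output here; the lookbehind version simply avoids the backreference altogether.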