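I'm trying to insert a tab (\t) into a string just before a regex match: before "x days ago", where x is a number between 0 and 999.
The text I'm working with looks like this: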
Great product, fast shipping! 22 days ago anon
Fast shipping. Got an extra free! Thanks! 42 days ago anon
Desired output:
Great product, fast shipping! \t 22 days ago anon
Fast shipping. Got an extra free! Thanks! \t 42 days ago anon
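I'm still new to this and struggling. I've searched for answers and found some that come close, but none that do quite the same thing.
Here's what I have so far: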
import re
text = 'Great product, fast shipping! 22 days ago anon'
new_text = re.sub(r"\d+ days ago", "\t \d+", text)
print(new_text)
Output:
Great product, fast shipping! \d+ anon
Again, what I need is (note the \t):
Great product, fast shipping! \t 22 days ago anon
Answer 0 (score: 3)
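You can use a backreference in the replacement string. Put parentheses around \d+ days ago to make it a capturing group, then use \\1 in the replacement to refer to that group's text: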
>>> import re
>>> text = 'Great product, fast shipping! 22 days ago anon'
>>> new_text = re.sub(r"(\d+ days ago)", "\t\\1", text)
>>> print(new_text)
Great product, fast shipping! 22 days ago anon
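Here is a sketch of the same capture-group approach applied to both sample lines at once. The names AGE_PATTERN and insert_tab_before_age are hypothetical, and \d{1,3} is an assumption based on the question's 0-999 range; the \b keeps the pattern from matching only the last digits of a longer number.

```python
import re

# Capture "<1-3 digits> days ago" so it can be re-emitted after a tab.
# \b stops the pattern from matching the tail of e.g. "1234 days ago".
AGE_PATTERN = re.compile(r"\b(\d{1,3} days ago)")

def insert_tab_before_age(text):
    # In the raw replacement string, re.sub itself turns \t into a tab
    # and \1 into the text captured by group 1.
    return AGE_PATTERN.sub(r"\t\1", text)

reviews = (
    "Great product, fast shipping! 22 days ago anon\n"
    "Fast shipping. Got an extra free! Thanks! 42 days ago anon"
)
print(insert_tab_before_age(reviews))
```

Using a raw string for the replacement avoids the "\t\\1" double-escaping: re.sub processes escapes like \t in the replacement on its own.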
Answer 1 (score: 1)
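You're putting a regex pattern in the replacement string, when all you need there is the \1 backreference.
To insert a tab before "n days ago", you can use a lookahead and replace the captured digits with \t\1: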
import re
# r'' instead of ur'': the ur prefix is a syntax error in Python 3
p = re.compile(r'(\d+)(?=\s+days\s+ago)')
test_str = u"Great product, fast shipping! 22 days ago anon\nFast shipping. Got an extra free! Thanks! 42 days ago anon"
subst = u"\t\\1"
print(re.sub(p, subst, test_str))
Result of the demo:
Great product, fast shipping! 22 days ago anon
Fast shipping. Got an extra free! Thanks! 42 days ago anon
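If all the replacement needs is a backreference, Python's re.sub also accepts \g<0>, which stands for the entire match, so the pattern does not need a capturing group at all. A minimal sketch; the function name tab_before_days_ago is hypothetical:

```python
import re

def tab_before_days_ago(text):
    # \g<0> refers to the whole match, so no parentheses are needed in the pattern.
    return re.sub(r"\d+ days ago", r"\t\g<0>", text)

print(tab_before_days_ago("Fast shipping. Got an extra free! Thanks! 42 days ago anon"))
```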
Answer 2 (score: 1)
You can use a lookahead to make the insertion zero-width, with a literal ' ' to match the leading space:
>>> import re
>>> txt='''\
... Great product, fast shipping! 22 days ago anon
... Fast shipping. Got an extra free! Thanks! 42 days ago anon'''
>>> repr(re.sub(r' (?=\d+)', ' \t', txt))
"'Great product, fast shipping! \\t22 days ago anon\\nFast shipping. Got an extra free! Thanks! \\t42 days ago anon'"
Note that every match of ' \d+' becomes ' \t\d+', which I believe is what you're after.
If you want to restrict it to ' \d+ days ago', just add that to the lookahead:
>>> txt='''\
... Great product, fast shipping! 22 days ago anon
... Fast shipping. Got an extra free! Thanks! 42 weeks ago anon'''
>>> repr(re.sub(r' (?=\d+ days ago)', ' \t', txt))
"'Great product, fast shipping! \\t22 days ago anon\\nFast shipping. Got an extra free! Thanks! 42 weeks ago anon'"
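The pattern above still consumes the space and writes it back. If you prefer a purely zero-width insertion, the whole pattern can be a lookahead and the replacement is just the tab. This is a sketch; the \b is my addition (not part of the answer above) and is needed because, without it, the lookahead also succeeds before the second digit of "22" and inserts two tabs. BEFORE_AGE is a hypothetical name.

```python
import re

# Matches the empty string just before "<digits> days ago".
# \b anchors the match at the first digit of the number.
BEFORE_AGE = re.compile(r"\b(?=\d+ days ago)")

txt = (
    "Great product, fast shipping! 22 days ago anon\n"
    "Fast shipping. Got an extra free! Thanks! 42 weeks ago anon"
)
print(repr(BEFORE_AGE.sub("\t", txt)))
```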
Answer 3 (score: 0)
You can also do it by slicing:
import re
Tabindex = re.search(r"\d+ days ago", text).start()  # \d+ (not \d) so "22" is matched from its first digit
text = text[:Tabindex] + '\t' + text[Tabindex:]
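re.search returns None when nothing matches, so calling .start() on its result raises AttributeError for text without "days ago", and it only handles the first occurrence. Below is a hedged sketch of the same slicing idea that handles every match (or none) by collecting offsets with re.finditer; the function name insert_tab_by_slicing is hypothetical.

```python
import re

def insert_tab_by_slicing(text):
    # Start offset of every "<digits> days ago" match (empty list if none).
    starts = [m.start() for m in re.finditer(r"\d+ days ago", text)]
    # Insert from the end backwards so earlier offsets are not shifted
    # by tabs that were already inserted.
    for i in reversed(starts):
        text = text[:i] + "\t" + text[i:]
    return text

print(insert_tab_by_slicing(
    "Great product, fast shipping! 22 days ago anon\n"
    "Fast shipping. Got an extra free! Thanks! 42 days ago anon"
))
```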