Question

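I am trying to insert a tab (\t) before a regular expression match in a string, specifically before "x days ago", where x is a number between 0 and 999.

The text I am working with looks like this: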

Great product, fast shipping! 22 days ago anon
Fast shipping. Got an extra free! Thanks! 42 days ago anon

Desired output:

Great product, fast shipping! \t 22 days ago anon
Fast shipping. Got an extra free! Thanks! \t 42 days ago anon

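I'm still a beginner and I'm working at it. I looked around for answers and found a few that were close, but none that matched my case.

Here is what I have so far: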

text = 'Great product, fast shipping! 22 days ago anon'
new_text = re.sub(r"\d+ days ago", "\t \d+", text)
print new_text

Output:

Great product, fast shipping!    \d+ anon

Again, what I need is (note the \t):

Great product, fast shipping!    22 days ago anon

Answer 1

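You can use a backreference in the replacement string. Put parentheses around \d+ days ago to make it a capturing group, then use \\1 in the replacement to refer to the text that group matched: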

>>> text = 'Great product, fast shipping! 22 days ago anon'
>>> new_text = re.sub(r"(\d+ days ago)", "\t\\1", text)
>>> print new_text
Great product, fast shipping!    22 days ago anon
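
As a variation on the same idea, here is a minimal Python 3 sketch that refers to the whole match with \g<0> instead of adding a capturing group. The helper name tab_before_days_ago and the \d{1,3} bound (to mirror the 0-999 range from the question) are assumptions for illustration, not part of the original answer.

```python
import re

# Assumption: 1-3 digits, mirroring the 0-999 range stated in the question.
# \b prevents a match from starting in the middle of a longer number.
DAYS_AGO = re.compile(r"\b\d{1,3} days ago")

def tab_before_days_ago(text):
    # \g<0> stands for the entire match, so no capturing group is needed.
    return DAYS_AGO.sub(r"\t\g<0>", text)

text = "Great product, fast shipping! 22 days ago anon"
new_text = tab_before_days_ago(text)
print(repr(new_text))
```

Printing with repr makes the inserted tab visible as \t rather than as whitespace.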

Answer 2

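You are putting a regex pattern in the replacement string, where all you need is a \1 backreference.

To insert a tab before "n days ago", you can use a lookahead and replace the captured number with \t\1: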

import re
p = re.compile(ur'(\d+)(?=\s+days\s+ago)')
test_str = u"Great product, fast shipping! 22 days ago anon\nFast shipping. Got an extra free! Thanks! 42 days ago anon"
subst = u"\t\\1"
print re.sub(p, subst, test_str)

Result of the demo:

Great product, fast shipping!   22 days ago anon
Fast shipping. Got an extra free! Thanks!   42 days ago anon

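
The snippet above is Python 2: the ur'' prefix is a syntax error in Python 3, and print is a function there. A rough Python 3 equivalent, assuming the same pattern and input, might look like this (the variable name result is added for illustration):

```python
import re

# Capture the number only when "days ago" follows it; the lookahead
# checks the trailing words without consuming them.
p = re.compile(r"(\d+)(?=\s+days\s+ago)")
test_str = ("Great product, fast shipping! 22 days ago anon\n"
            "Fast shipping. Got an extra free! Thanks! 42 days ago anon")
subst = "\t\\1"  # a literal tab followed by the \1 backreference
result = p.sub(subst, test_str)
print(result)
```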

Answer 3

You can use a lookahead for a zero-width insertion, matching the leading literal space with ' ':

>>> import re
>>> txt='''\
... Great product, fast shipping! 22 days ago anon
... Fast shipping. Got an extra free! Thanks! 42 days ago anon'''
>>> repr(re.sub(r' (?=\d+)', ' \t', txt))
"'Great product, fast shipping! \\t22 days ago anon\\nFast shipping. Got an extra free! Thanks! \\t42 days ago anon'"

Note that every occurrence of ' \d+' becomes ' \t\d+', which is what I think you are after.

If you want to restrict it to ' \d+ days ago', just add that to the lookahead:

>>> txt='''\
... Great product, fast shipping! 22 days ago anon
... Fast shipping. Got an extra free! Thanks! 42 weeks ago anon'''
>>> repr(re.sub(r' (?=\d+ days ago)', ' \t', txt))
"'Great product, fast shipping! \\t22 days ago anon\\nFast shipping. Got an extra free! Thanks! 42 weeks ago anon'"

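The leading space does not have to be matched at all. As a sketch in Python 3, the whole pattern can be a lookahead, so the match is empty and re.sub inserts the tab without replacing anything. The \b is an addition here: without it the pattern would also match between the two digits of 22.

```python
import re

txt = ("Great product, fast shipping! 22 days ago anon\n"
       "Fast shipping. Got an extra free! Thanks! 42 weeks ago anon")

# The entire pattern is a lookahead, so each match is zero-width and
# the tab is inserted at that position. \b stops a match mid-number.
result = re.sub(r"(?=\b\d+ days ago)", "\t", txt)
print(repr(result))
```

Only the "days ago" line receives a tab; the "weeks ago" line is left unchanged.
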
Answer 4

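You can use

Tabindex = re.search(r"\d+ days ago",text).start()
text = text[0:Tabindex]+'\t'+text[Tabindex:len(text)]

to split the string just before the regex match and put the tab in between. Note that this handles only the first match.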
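
One caveat with the slicing approach: re.search returns None when the pattern is not found, so calling .start() on the result raises AttributeError. A minimal Python 3 sketch that guards against this follows; the helper name insert_tab_before_first is hypothetical.

```python
import re

def insert_tab_before_first(text):
    # Hypothetical helper: insert a tab before the first "N days ago".
    match = re.search(r"\d+ days ago", text)
    if match is None:
        # Nothing to do; return the input unchanged instead of raising.
        return text
    idx = match.start()
    return text[:idx] + "\t" + text[idx:]

print(repr(insert_tab_before_first("Great product, fast shipping! 22 days ago anon")))
```

Like the original, this touches only the first match; for several matches in one string, re.sub is the simpler tool.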
