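Split a string before a regex match

Asked: 2015-04-25 19:30:51

Tags: regex python-2.7 split

I am trying to insert a tab (\t) before a regex match in a string. The match is "x days ago", where x is a number between 0 and 999.

The text I am working with looks like this: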

Great product, fast shipping! 22 days ago anon
Fast shipping. Got an extra free! Thanks! 42 days ago anon

Desired output:

Great product, fast shipping! \t 22 days ago anon
Fast shipping. Got an extra free! Thanks! \t 42 days ago anon

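I am still new to this and I am struggling. I have looked around for answers and found some that come close, but none that match my case.

This is what I have tried so far: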

text = 'Great product, fast shipping! 22 days ago anon'
new_text = re.sub(r"\d+ days ago", "\t \d+", text)
print new_text

Output:

Great product, fast shipping!    \d+ anon

Again, what I need is (note the \t):

Great product, fast shipping!    22 days ago anon

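4 answers:

Answer 0 (score: 3)

You can use a backreference in the replacement string. Put parentheses around \d+ days ago to make it a capturing group, and use \\1 in the replacement to refer to the text that group matched: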

>>> text = 'Great product, fast shipping! 22 days ago anon'
>>> new_text = re.sub(r"(\d+ days ago)", "\t\\1", text)
>>> print new_text
Great product, fast shipping!    22 days ago anon

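Answer 1 (score: 1)

You are using a regex pattern as the replacement string, whereas all you need there is the \1 backreference.

To insert a tab before n days ago, you can use a lookahead and replace the captured number with \t\1: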

import re
p = re.compile(ur'(\d+)(?=\s+days\s+ago)')
test_str = u"Great product, fast shipping! 22 days ago anon\nFast shipping. Got an extra free! Thanks! 42 days ago anon"
subst = u"\t\\1"
print re.sub(p, subst, test_str)

Result of the demo:

Great product, fast shipping!   22 days ago anon
Fast shipping. Got an extra free! Thanks!   42 days ago anon
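
The lookahead version can also be wrapped in a small helper. The sketch below assumes a hypothetical function name, add_tab, and uses print() so that it should run on Python 3 as well as 2.7. It illustrates two properties of the pattern: the \s+ parts tolerate irregular spacing, and text that does not end in "days ago" is left alone.

```python
import re

# (\d+) captures the number; the lookahead (?=\s+days\s+ago) only checks
# what follows and does not consume it, so just the number is replaced.
pattern = re.compile(r"(\d+)(?=\s+days\s+ago)")


def add_tab(line):
    """Return line with a tab inserted before each 'N days ago' number.

    add_tab is a hypothetical helper name used for illustration only.
    """
    return pattern.sub("\t\\1", line)


samples = [
    "Thanks! 42 days ago anon",   # regular spacing
    "Arrived 3  days   ago",      # irregular spacing still matches
    "Thanks! 42 weeks ago anon",  # not 'days ago', left unchanged
]

for sample in samples:
    print(repr(add_tab(sample)))
```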

Answer 2 (score: 1)

You can do the insertion with a lookahead, using a literal ' ' to match the leading space:

>>> import re
>>> txt='''\
... Great product, fast shipping! 22 days ago anon
... Fast shipping. Got an extra free! Thanks! 42 days ago anon'''
>>> repr(re.sub(r' (?=\d+)', ' \t', txt))
"'Great product, fast shipping! \\t22 days ago anon\\nFast shipping. Got an extra free! Thanks! \\t42 days ago anon'"

Note that every occurrence of ' \d+' becomes ' \t\d+', which is what I think you are after.

If you want to restrict it to ' \d+ days ago', just add that to the lookahead:

>>> txt='''\
... Great product, fast shipping! 22 days ago anon
... Fast shipping. Got an extra free! Thanks! 42 weeks ago anon'''
>>> repr(re.sub(r' (?=\d+ days ago)', ' \t', txt))
"'Great product, fast shipping! \\t22 days ago anon\\nFast shipping. Got an extra free! Thanks! 42 weeks ago anon'"
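
A related variant, offered as a sketch rather than as what the answer itself does, makes the match fully zero-width so that nothing is consumed and the tab is purely inserted. The \b anchor is an assumption added here: without it, the lookahead alone would also succeed between the two digits of "22", because "2 days ago" still satisfies \d+ days ago.

```python
import re

txt = (
    "Great product, fast shipping! 22 days ago anon\n"
    "Fast shipping. Got an extra free! Thanks! 42 weeks ago anon"
)

# \b pins the empty match to the start of the number; the lookahead then
# requires "N days ago" to follow. Nothing is consumed, so the
# replacement string is only the tab itself.
result = re.sub(r"\b(?=\d+ days ago)", "\t", txt)

print(repr(result))
```

The second line ends in "weeks ago", so it passes through unchanged.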

Answer 3 (score: 0)

You can use

import re
# \d+ (not \d) so the match starts at the first digit of a multi-digit number
tab_index = re.search(r"\d+ days ago", text).start()
text = text[:tab_index] + '\t' + text[tab_index:]
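
The slicing approach handles only the first match and raises AttributeError when nothing matches, because re.search returns None in that case. A minimal sketch of a guarded version follows; the function name insert_tab_before_age is hypothetical, and the code is written so it should run on Python 3 as well as 2.7.

```python
import re


def insert_tab_before_age(text):
    """Insert a tab before the first 'N days ago' phrase, if present.

    insert_tab_before_age is a hypothetical name used for illustration.
    """
    match = re.search(r"\d+ days ago", text)
    if match is None:
        # No "N days ago" phrase: return the input unchanged.
        return text
    idx = match.start()
    # Splice a tab in at the position where the match begins.
    return text[:idx] + "\t" + text[idx:]


lines = [
    "Great product, fast shipping! 22 days ago anon",
    "No timestamp on this one",
]

for line in lines:
    print(repr(insert_tab_before_age(line)))
```

Applying the function line by line, as in the loop, covers input where each review sits on its own line.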