Question

Here is the sample text:

initiated to address the deviation to SOP-020583v11.0 Section SOP-016248v2.0 john doe, john doe SOP-020583 fake text, this is all fake

Ideally, the text should look like this:

initiated to address the deviation to 020583 Section 016248 john doe, john doe 020583 fake text, this is all fake

Here is my code so far:

def dashrepl(matchobj):
    print (type(matchobj))
    return re.findall('[0-9]',matchobj)

re.sub(SOP, dashrepl, long_desc_text[22])

But I get the following error:

TypeError: expected string or buffer
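The error most likely comes from the callback: re.sub calls the replacement function with a match object, but re.findall expects a string, and the callback must also return a string rather than the list that findall produces. Below is a minimal sketch of a working callback. The question never shows what SOP contains, so the pattern here is an assumption: "SOP-", the digits to keep as group 1, and an optional version suffix such as "v11.0".

```python
import re

# Assumed pattern (not shown in the question): "SOP-", the digits to keep
# (capture group 1), then an optional version suffix such as "v11.0".
SOP = r"SOP-(\d+)(?:v\d+\.\d+)?"

def dashrepl(matchobj):
    # re.sub passes a match object, not a string, so read the captured
    # digits with .group(1) and return them as a plain str.
    return matchobj.group(1)

text = ("initiated to address the deviation to SOP-020583v11.0 Section "
        "SOP-016248v2.0 john doe, john doe SOP-020583 fake text, this is all fake")

print(re.sub(SOP, dashrepl, text))
```

For this simple case, passing the template r"\1" instead of a function does the same job; a callback is only needed when the replacement requires extra logic.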

Edit (updated content):

long_desc_text[22]

SOP-020583v11.0 Section 8.4.On 17Jan2016 at ATO Site, SOP-016248v2.0 was due for periodic review but the periodic SOP-016248 revision is not tied to any change control records. SOP-020583 tied to a change control record" and notified ID63718 notifiedID22359 of the event. SOP-020583v11.0, fake text fake text
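Because the pattern is anchored on the literal "SOP-" prefix, it should rewrite only the SOP references in the updated text and leave other numbers such as 8.4, 17Jan2016 and ID63718 alone. A quick sketch to check this, assuming the same "SOP-digits plus optional vX.Y" pattern (an assumption, since the question's SOP pattern is not shown):

```python
import re

# Keep only the digits after "SOP-" and drop an optional "vX.Y" version suffix.
pattern = r"SOP-(\d+)(?:v\d+\.\d+)?"

long_desc = ('SOP-020583v11.0 Section 8.4.On 17Jan2016 at ATO Site, SOP-016248v2.0 was due '
             'for periodic review but the periodic SOP-016248 revision is not tied to any '
             'change control records. SOP-020583 tied to a change control record" and '
             'notified ID63718 notifiedID22359 of the event. SOP-020583v11.0, fake text fake text')

# r"\1" substitutes each whole match with its first capture group.
cleaned = re.sub(pattern, r"\1", long_desc)
print(cleaned)
```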

Answer 1

So, here is my code:

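import re

test = "initiated to address the deviation to SOP-020583v11.0 Section SOP-016248v2.0 john doe, john doe SOP-020583 fake text, this is all fake"

# "SOP-", then the digits to keep (group 1), then an optional version suffix like "v11.0"
regexp = r"SOP-(\d+)(?:v\d+\.\d)?"

# re.subn returns a tuple: (new_string, number_of_substitutions)
test = re.subn(regexp, r"\1", test)

# Element 0 is the rewritten string (element 1 is the substitution count, 3 here)
print(test[0])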

It produces:
initiated to address the deviation to 020583 Section 016248 john doe, john doe 020583 fake text, this is all fake

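This uses the Python re function "subn" to find every match of the pattern and replace it with the given replacement, in this case the first capture group (\1). subn returns a tuple (new_string, number_of_substitutions), so the rewritten text is its first element. The "r" in front of a string literal makes it a raw string, so backslashes such as \d and \1 reach the regex engine unchanged; it does not create a regular expression object.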
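A small sketch of the two points above, the raw-string prefix and the tuple returned by subn, using a shortened sample string for illustration:

```python
import re

# Without the r prefix, "\1" is an octal escape: the single character chr(1).
assert "\1" == chr(1)
# With the r prefix, the backslash survives, so re sees a group reference.
assert r"\1" == "\\1"

# subn returns (new_string, count) rather than just the string.
new_text, count = re.subn(r"SOP-(\d+)(?:v\d+\.\d)?", r"\1", "SOP-020583v11.0 and SOP-016248")
print(new_text)
print(count)
```

If only the rewritten string is needed, re.sub with the same arguments returns it directly.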

For reference, I also found this link.

Hope this helps.

Regular expression: I only want to extract the numbers
