Here is the sample text:
initiated to address the deviation to SOP-020583v11.0 Section SOP-016248v2.0 john doe, john doe SOP-020583 fake text, this is all fake
Ideally, the text should look like this:
initiated to address the deviation to 020583 Section 016248 john doe, john doe 020583 fake text, this is all fake
Here is my code so far:
def dashrepl(matchobj):
    print (type(matchobj))
    return re.findall('[0-9]',matchobj)

re.sub(SOP, dashrepl, long_desc_text[22])
But I get the following error:
TypeError: expected string or buffer
Edit (update): here is the content of long_desc_text[22]:
SOP-020583v11.0 Section 8.4.On 17Jan2016 at ATO Site, SOP-016248v2.0 was due for periodic review but the periodic SOP-016248 revision is not tied to any change control records. SOP-020583 tied to a change control record" and notified ID63718 notifiedID22359 of the event. SOP-020583v11.0, fake text fake text
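The TypeError comes from dashrepl itself: re.sub calls the replacement function with a match object, not a string, and re.findall needs a string to search. A minimal sketch of a working callback follows. The question never shows how SOP is defined, so the pattern below is an assumption, and long_desc_text_22 is a hypothetical stand-in holding the first part of long_desc_text[22].

```python
import re

# Assumed pattern (the question does not show how SOP is defined):
# group 1 captures the document number after "SOP-", and the optional
# non-capturing group swallows a version suffix such as "v11.0" or "v2.0".
SOP = re.compile(r"SOP-(\d+)(?:v\d+(?:\.\d+)*)?")


def dashrepl(matchobj):
    # re.sub passes a Match object here, not a string. Handing it to
    # re.findall is what raised "expected string or buffer" in Python 2.
    # The callback must return the replacement string.
    return matchobj.group(1)


# Hypothetical stand-in for long_desc_text[22].
long_desc_text_22 = (
    "SOP-020583v11.0 Section 8.4.On 17Jan2016 at ATO Site, SOP-016248v2.0 "
    "was due for periodic review but the periodic SOP-016248 revision is not "
    "tied to any change control records."
)

print(SOP.sub(dashrepl, long_desc_text_22))
```

Returning matchobj.group(1) rather than joining every digit in the match keeps the version number ("11.0") from being glued onto the document number.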
Answer 0 (score: 1)
So, here is my code:
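import re

test = "initiated to address the deviation to SOP-020583v11.0 Section SOP-016248v2.0 john doe, john doe SOP-020583 fake text, this is all fake"
# Group 1 captures the digits after "SOP-"; the optional group drops a version suffix like "v11.0"
regexp = r"SOP-(\d+)(?:v\d+\.\d)?"
# re.subn returns a (new_string, number_of_substitutions) tuple
result = re.subn(regexp, r"\1", test)
print(result[0])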
It produces:
initiated to address the deviation to 020583 Section 016248 john doe, john doe 020583 fake text, this is all fake
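This uses the Python re function subn to find every occurrence of the pattern and replace it with the given replacement, in this case the first capture group (\1). subn returns a tuple of the new string and the number of substitutions made; re.sub returns just the new string. The "r" in front of a string literal makes it a raw string, not a regular expression object: backslashes are kept as-is, so \d and \1 reach the regex engine unchanged instead of being treated as Python escape sequences.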
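Both points can be checked quickly. This is a minimal Python 3 sketch reusing the same sample string and pattern; the variable names new_text and count are only for illustration.

```python
import re

test = ("initiated to address the deviation to SOP-020583v11.0 Section "
        "SOP-016248v2.0 john doe, john doe SOP-020583 fake text, this is all fake")
regexp = r"SOP-(\d+)(?:v\d+\.\d)?"

# re.subn returns (new_string, number_of_substitutions):
# the rewritten text is at index 0, the count at index 1.
new_text, count = re.subn(regexp, r"\1", test)
print(new_text)
print(count)  # 3

# re.sub returns only the new string, which is all that is needed here.
assert re.sub(regexp, r"\1", test) == new_text

# The r prefix only changes how Python reads the literal: r"\1" is the two
# characters backslash and "1", which re treats as a group reference.
# Without it, "\1" is an octal escape for the single character chr(1).
print(len(r"\1"), len("\1"))  # 2 1
```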
For reference, I also found this link.
Hope this helps.