I have strings like these:
This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip
This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip
Here I want to replace each run of one or more digits (sometimes the number is hexadecimal) with "my_doc". I tried:
match = re.findall("[\.0-9]*",text)
print match
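But this only handles plain decimal digits. It should also handle hexadecimal numbers, replace each number with "my_doc", and print the whole line. Expected output: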
This changes are related to book:id:pages:my_doc location /file1/file2/file3/pages.my_doc.zip
This changes are related to book:id:pages:my_doc location /file1/file2/file3/pages.my_doc.zip
Answer 0 (score: 1)
In [8]: import re
In [14]: strs="This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"
In [15]: re.findall(r"\d+[A-Fa-f]{0,}\d+[A-Fa-f]{0,}",strs)
Out[15]: ['3000', '000']
In [16]: strs1="This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"
In [17]: re.findall(r"\d+[A-Fa-f]{0,}\d+[A-Fa-f]{0,}",strs1)
Out[17]: ['30ab00e', '000']
Use re.sub() to do the replacement:
In [68]: strs="This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"
In [69]: re.sub(r"(\d+[A-Fa-f]*\d+[A-Fa-f]*)|(\d+)","my_doc",strs)
Out[69]: 'This changes are related to book:id:pages:my_doc location /filemy_doc/filemy_doc/filemy_doc/pages.my_doc.zip'
In [70]: strs1="This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"
In [71]: re.sub(r"(\d+[A-Fa-f]*\d+[A-Fa-f]*)|(\d+)","my_doc",strs1)
Out[71]: 'This changes are related to book:id:pages:my_doc location /filemy_doc/filemy_doc/filemy_doc/pages.my_doc.zip'
In [72]: foo=" number of pages completed, 2 still pending"
In [73]: re.sub(r"(\d+[A-Fa-f]*\d+[A-Fa-f]*)|(\d+)","my_doc",foo)
Out[73]: ' number of pages completed, my_doc still pending'
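Note that the substitution above also rewrites the digits in file1, file2 and file3, which the question's expected output leaves untouched. A minimal sketch of one way to avoid that, assuming the value to replace is always a standalone token (bounded by non-word characters) made of hex digits and containing at least one decimal digit. The helper name replace_hex_tokens is hypothetical, not part of the original answer:

```python
import re

# Assumption: a target is a whole "word" of hex digits with at least one 0-9 in it.
# \b anchors both ends, so the '1' inside 'file1' is never matched on its own.
HEX_TOKEN = re.compile(r"\b[0-9A-Fa-f]*\d[0-9A-Fa-f]*\b")

def replace_hex_tokens(text, replacement="my_doc"):
    """Replace every standalone hex/decimal token in text (hypothetical helper)."""
    return HEX_TOKEN.sub(replacement, text)

strs = "This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"
strs1 = "This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"
foo = " number of pages completed, 2 still pending"

for line in (strs, strs1, foo):
    print(replace_hex_tokens(line))
```

Requiring at least one decimal digit keeps ordinary words made only of the letters a-f (such as "bed") from being replaced, at the cost of missing hex values that happen to contain no digits.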
Answer 1 (score: 0)
Answer 2 (score: 0)
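This is crazy (so is your question) and hackish!
Hex characters (a-f, A-F) appear in many places in the string, so they would be replaced as well (I think the question is not clear at the moment ;)), which does not seem to be the expected behaviour.
Assuming the blob/part to be replaced is a hex word, and assuming it has a minimum length of 3, consider: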
import re
from string import hexdigits
str_1 = "This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"
str_2 = "This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"
expression = '[%s]{3,}' % hexdigits  # = '[' + hexdigits + ']{3,}'
re.sub(expression, 'my_doc', str_1)
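Edit: for a slightly less crazy regex, use the following expression
expression = r':[%s]+\S' % hexdigits
This matches only the hex word that follows a colon, so the length of the hex/digit run is no longer a constraint. Note that the match includes the leading colon, so use ':my_doc' as the replacement to keep it; the '.000' in the file name is not matched by this expression.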
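A variant of the same idea, sketched under the assumption that the value to replace always directly follows a ':' or a '.'. It uses a lookbehind so the delimiter is not part of the match and therefore survives the substitution. This is an illustration, not the original author's expression:

```python
import re
from string import hexdigits

str_1 = "This changes are related to book:id:pages:3000 location /file1/file2/file3/pages.000.zip"
str_2 = "This changes are related to book:id:pages:30ab00e location /file1/file2/file3/pages.000.zip"

# (?<=[:.]) requires a preceding ':' or '.' without consuming it.
# \b makes the run of hex digits end at a word boundary, so a hex-looking
# prefix of a longer word is not replaced.
pattern = re.compile(r'(?<=[:.])[%s]+\b' % hexdigits)

print(pattern.sub('my_doc', str_1))
print(pattern.sub('my_doc', str_2))
```

Because the match is anchored on the delimiter, the digits in file1, file2 and file3 are left alone. A word made only of hex letters directly after ':' or '.' would still be replaced, so this is only suitable when that cannot occur in the input.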