Using Python regular expressions, how can I remove a unit word that follows a number?
For example:
units = ['in', 'ft']
'12in desk' becomes '12 desk'
'12 in desk' becomes '12 desk'
'abc 20 ft long' becomes 'abc 20 long'
Answer 0 (score: 3)
Here is one approach that builds the regular expression programmatically from the units list:
import re
units = ['in', 'ft']
tests = ['12in desk', '12 in desk', 'abc 20 ft long', ]
expecteds = ['12 desk', '12 desk', 'abc 20 long', ]
regexp = re.compile(r'(\d+)\s*(%s)\b' % '|'.join(units))
for test, expected in zip(tests, expecteds):
    actual = re.sub(regexp, r'\1', test)
    assert actual == expected
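
As a hedged variation on the snippet above, the same idea can be wrapped in a reusable function. The name `strip_units` is hypothetical and not part of the original answer. The sketch assumes that unit strings may contain regex metacharacters (so each one is passed through `re.escape`) and that every unit ends in a letter or digit, since the trailing `\b` relies on that.

```python
import re

def strip_units(text, units):
    """Remove any unit in `units` that follows a number (hypothetical helper)."""
    # Escape each unit so characters such as '.' are matched literally.
    alternation = '|'.join(re.escape(u) for u in units)
    # (\d+) keeps the number; \s* absorbs optional whitespace before the unit;
    # \b stops 'in' from matching the start of a longer word such as 'inches'.
    regexp = re.compile(r'(\d+)\s*(?:%s)\b' % alternation)
    return regexp.sub(r'\1', text)

print(strip_units('12in desk', ['in', 'ft']))
print(strip_units('abc 20 ft long, 3 in wide', ['in', 'ft']))
```

Because `sub` replaces every match, a string containing several number-unit pairs is cleaned in one call.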
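Answer 1 (score: 1)
Here is another way, similar to @Rob's answer but slightly different. Instead of using re.sub, it captures all the relevant groups and then reassembles the string, omitting group 3, which contains the unwanted text.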
import re
units = '|'.join(['in', 'ft'])
vals = ['12in desk', '12 in desk', 'abc 20 ft long']
pattern = r'([^\d]*)(\d+)\s?({})(.*)'.format(units)
regex = re.compile(pattern)
for val in vals:
    match = regex.match(val)
    # Rejoin the prefix, the number, and the remainder, skipping group 3 (the unit).
    out = ''.join(match.group(1, 2, 4))
    print("{} becomes {}".format(val, out))
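
One caveat with the group-reassembly approach: `regex.match` returns `None` when the first number in the string is not followed by a unit (or when there is no number at all), and calling `.group` on `None` raises `AttributeError`. A minimal sketch of a guarded version follows; the helper name `drop_unit` is hypothetical and only illustrates the check.

```python
import re

units = '|'.join(['in', 'ft'])
regex = re.compile(r'([^\d]*)(\d+)\s?({})(.*)'.format(units))

def drop_unit(val):
    """Return val without the unit after its first number (hypothetical helper)."""
    match = regex.match(val)
    if match is None:
        # The pattern did not match; leave the input unchanged.
        return val
    # Groups 1, 2 and 4 are the prefix, the number, and the remainder.
    return ''.join(match.group(1, 2, 4))

for val in ['12in desk', 'plain desk', 'abc 20 ft long']:
    print("{} becomes {}".format(val, drop_unit(val)))
```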
Answer 2 (score: 0)
With the following code you can remove the unit after the number. This is an alternative to @wesanyer's answer.
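import re
units = '|'.join(['in', 'ft'])
# Group 1 captures optional whitespace plus the unit; (?:...) keeps the alternation contained.
pattern = r"[0-9]+(\s*(?:" + units + r")\b)"
a = "12in desk"
match = re.search(pattern, a)
if match:
    # Strings are immutable, so rebuild the string without the span of group 1.
    a = a[:match.start(1)] + a[match.end(1):]
print(a)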