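I have to search a string for words that have a number as a prefix or suffix (for example, "abc21" or "943xyz"), and then I need to separate the number from the word.
For example, "abc12" must be converted to "abc 12", and "12abc" must be converted to "12 abc".
However, if the number sits between letters, as in "a12bc", the word should stay unchanged. How should we do this? Is there a simpler way than a regular expression?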
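Answer 0 (score: 0)
Something simple like one of these should do.
All that is needed is to guard the boundaries with (?<! [\da-z] ) .. (?! [\da-z] )
These assertions do two things:
- They stop the engine from starting or ending a match in the middle of a run of the same type (digits or letters).
- They make sure the match is not bookended by another letter or digit, so a word like a12bc is left alone.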
Way 1:
Find: (?<![\da-z])(?:([a-z]+)(\d+)|(\d+)([a-z]+))(?![\da-z])
Replace: $1$3 $2$4
https://regex101.com/r/k4gNoE/1
(?<! [\da-z] )
(?:
( [a-z]+ ) # (1)
( \d+ ) # (2)
|
( \d+ ) # (3)
( [a-z]+ ) # (4)
)
(?! [\da-z] )
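A minimal sketch of how Way 1 might look in Python, since the other answers use re. The names WAY1 and split_affix_numbers are hypothetical, and re.IGNORECASE is an assumption added so that uppercase words also match; the pattern as posted only lists a-z. Python writes the replacement as \1\3 \2\4 instead of $1$3 $2$4.

```python
import re

# Way 1, written with re.VERBOSE so the layout mirrors the breakdown above.
# re.IGNORECASE is an assumption; the posted pattern only lists a-z.
WAY1 = re.compile(
    r"""
    (?<![\da-z])          # not preceded by a digit or letter
    (?:
        ([a-z]+)(\d+)     # groups 1 and 2: letters, then digits
      |
        (\d+)([a-z]+)     # groups 3 and 4: digits, then letters
    )
    (?![\da-z])           # not followed by a digit or letter
    """,
    re.VERBOSE | re.IGNORECASE,
)

def split_affix_numbers(text):
    # Only one branch matches; the other pair of groups expands to ""
    # (this relies on Python 3.5 or later).
    return WAY1.sub(r"\1\3 \2\4", text)

print(split_affix_numbers("abc21 943xyz a12bc ABC12"))
# abc 21 943 xyz a12bc ABC 12
```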
Way 2:
Find: (?<![\da-z])(?:([a-z]+(?=\d)|\d+(?=[a-z]))((?<=\d)[a-z]+|(?<=[a-z])\d+))(?![\da-z])
Replace: $1 $2
https://regex101.com/r/LbWnkg/1
(?<! [\da-z] )
(?:
( # (1 start)
[a-z]+
(?= \d )
| \d+
(?= [a-z] )
) # (1 end)
( # (2 start)
(?<= \d )
[a-z]+
| (?<= [a-z] )
\d+
) # (2 end)
)
(?! [\da-z] )
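Way 2 could be tried in Python along these lines. This is only a sketch: WAY2 and split_with_two_groups are hypothetical names, and re.IGNORECASE is again an assumption. Because both groups take part in every match, the replacement \1 \2 does not depend on how unmatched groups are expanded.

```python
import re

# Way 2: group 1 is the leading run, group 2 is the trailing run.
# The lookaheads and lookbehinds force the two runs to be of different types.
WAY2 = re.compile(
    r"(?<![\da-z])"
    r"(?:([a-z]+(?=\d)|\d+(?=[a-z]))"
    r"((?<=\d)[a-z]+|(?<=[a-z])\d+))"
    r"(?![\da-z])",
    re.IGNORECASE,  # assumption: the posted pattern only lists a-z
)

def split_with_two_groups(text):
    # Both groups always participate, so \1 \2 is safe on any match.
    return WAY2.sub(r"\1 \2", text)

for word in ["abc21", "943xyz", "12abc", "a12bc"]:
    print(word, "->", split_with_two_groups(word))
```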
Answer 1 (score: 0)
You can try this:
import re

def split_vals(s):
    # The third alternative keeps words with digits in the middle (e.g. "ab12cd") intact.
    return ' '.join(re.findall(r'^\d+|\d+$|^[a-zA-Z]+\d+[a-zA-Z]+$|^[a-zA-Z]+$|[a-zA-Z]+', s))
s = ["abc21", "943xyz", '12abc', "a12bc"]
new_s = list(map(split_vals, s))
Output:
['abc 21', '943 xyz', '12 abc', 'a12bc']
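Since the question asks whether something simpler than a regex exists, here is one possible sketch without re, using itertools.groupby to cut a word into digit and non-digit runs. The function name split_word is hypothetical, and the rule "exactly two runs, one of them purely alphabetic" is an assumption about what counts as a prefix or suffix number.

```python
from itertools import groupby

def split_word(word):
    # Cut the word into consecutive runs of digits and runs of non-digits.
    runs = ["".join(chars) for _, chars in groupby(word, key=str.isdigit)]
    # Exactly two runs means the number is a pure prefix or suffix;
    # three or more runs (e.g. "a12bc") means it sits inside the word.
    if len(runs) == 2 and (runs[0].isalpha() or runs[1].isalpha()):
        return " ".join(runs)
    return word

words = ["abc21", "943xyz", "12abc", "a12bc"]
print([split_word(w) for w in words])
# ['abc 21', '943 xyz', '12 abc', 'a12bc']
```

Like the function above, this works on one word at a time; a sentence would need to be split into words first.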
Answer 2 (score: 0)
You can use re.sub to insert that space:
re.sub(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b', r"\1\3 \2\4", word)
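This matches non-digits followed by digits, or vice versa.
The \b boundaries ensure that a whole word is matched, so we won't match digits in the middle of a word.
The replacement pattern \1\3 \2\4 takes advantage of the fact that unmatched groups are replaced with the empty string (in Python 3.5 and later). We know that either groups 1 and 2 or groups 3 and 4 will match and the other groups will be empty, so \1\3 \2\4 will always produce a valid result (without duplicating any part of the input).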
Examples:
>>> re.sub(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b', r"\1\3 \2\4", "abc12")
'abc 12'
>>> re.sub(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b', r"\1\3 \2\4", "12abc")
'12 abc'
>>> re.sub(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b', r"\1\3 \2\4", "a12bc")
'a12bc'
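To see which groups take part in a match, the same pattern can be inspected directly. The sketch below also shows a replacement function as an alternative that does not rely on how unmatched groups are expanded; PATTERN and insert_space are hypothetical names.

```python
import re

PATTERN = re.compile(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b')

def insert_space(match):
    # Groups from the branch that did not match are None on the match object.
    if match.group(1) is not None:
        return match.group(1) + " " + match.group(2)
    return match.group(3) + " " + match.group(4)

m = PATTERN.search("abc12")
print(m.groups())                            # ('abc', '12', None, None)
print(PATTERN.sub(insert_space, "12abc"))    # 12 abc
print(PATTERN.sub(insert_space, "a12bc"))    # a12bc
```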