Separating words from numbers in Python

Time: 2018-01-25 21:16:13

Tags: python regex

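I need to search a string for words that have digits as a prefix or suffix (for example, "abc21" or "943xyz") and then separate the digits from the word.

For example, "abc12" must become "abc 12", and "12abc" must become "12 abc".

However, if the digits sit between letters, as in "a12bc", the word should be left unchanged. How should we do this? Is there a simpler way than regular expressions?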

3 answers:

Answer 0: (score: 0)

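Something as simple as one of these will do.
All that is needed is to guard the boundaries with (?<! [\da-z] ) .. (?! [\da-z] ), which does two things:
- It stops the engine from starting or ending a match in the middle of a run of the same type (digits or letters).
- It ensures the match is not bookended by another digit or letter, so a word like a12bc is left alone.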

Way 1:

Find: (?<![\da-z])(?:([a-z]+)(\d+)|(\d+)([a-z]+))(?![\da-z])
Replace: $1$3 $2$4

https://regex101.com/r/k4gNoE/1

 (?<! [\da-z] )
 (?:
      ( [a-z]+ )             # (1)
      ( \d+ )                # (2)
   |  
      ( \d+ )                # (3)
      ( [a-z]+ )             # (4)
 )
 (?! [\da-z] )
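
Way 1 can be tried from Python roughly as follows. This is a sketch, not the answerer's own code: the names WAY1, UNGUARDED and split_affix are made up for illustration, re.IGNORECASE is added on the assumption that upper-case letters should count too, and the regex101-style replacement $1$3 $2$4 is written in Python's \1\3 \2\4 form. It assumes Python 3.5 or later, where unmatched groups in a replacement expand to an empty string. The unguarded pattern is there only to show what the lookarounds prevent.

```python
import re

# Way 1 with the boundary guards, as in the answer (case-insensitive by assumption).
WAY1 = re.compile(r'(?<![\da-z])(?:([a-z]+)(\d+)|(\d+)([a-z]+))(?![\da-z])', re.IGNORECASE)

# The same alternation without the guards, for comparison only.
UNGUARDED = re.compile(r'(?:([a-z]+)(\d+)|(\d+)([a-z]+))', re.IGNORECASE)

def split_affix(text):
    """Insert a space between a word and its numeric prefix or suffix."""
    return WAY1.sub(r'\1\3 \2\4', text)

print(split_affix("see abc21 and 943xyz but a12bc"))
# Without the guards, the "a12" part of "a12bc" is matched and split.
print(UNGUARDED.sub(r'\1\3 \2\4', "a12bc"))
```

With the guards in place, "a12bc" comes back unchanged, while the unguarded version splits it after the leading letter.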

Way 2:

Find: (?<![\da-z])(?:([a-z]+(?=\d)|\d+(?=[a-z]))((?<=\d)[a-z]+|(?<=[a-z])\d+))(?![\da-z])
Replace: $1 $2

https://regex101.com/r/LbWnkg/1

 (?<! [\da-z] )
 (?:
      (                        # (1 start)
           [a-z]+ 
           (?= \d )
        |  \d+ 
           (?= [a-z] )
      )                        # (1 end)
      (                        # (2 start)
           (?<= \d )
           [a-z]+ 
        |  (?<= [a-z] )
           \d+ 
      )                        # (2 end)
 )
 (?! [\da-z] )
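
A Python rendering of way 2 might look like the sketch below. WAY2 and split_affix_2 are illustrative names, and re.IGNORECASE is an addition that is not in the answer. Because both groups take part in every match, the replacement is simply \1 \2 and does not depend on how unmatched groups are expanded.

```python
import re

WAY2 = re.compile(
    r'(?<![\da-z])'
    r'(?:([a-z]+(?=\d)|\d+(?=[a-z]))'   # group 1: letters before a digit, or digits before a letter
    r'((?<=\d)[a-z]+|(?<=[a-z])\d+))'   # group 2: the other kind, directly after group 1
    r'(?![\da-z])',
    re.IGNORECASE,
)

def split_affix_2(text):
    """Insert a space between the two halves matched by way 2."""
    return WAY2.sub(r'\1 \2', text)

for word in ["abc21", "943xyz", "12abc", "a12bc"]:
    print(word, '->', split_affix_2(word))
```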

Answer 1: (score: 0)

You can try this:

import re

def split_vals(s):
  # one or more letters before the digits, so "ab12cd" also stays intact
  return ' '.join(re.findall(r'^\d+|\d+$|^[a-zA-Z]+\d+[a-zA-Z]+$|^[a-zA-Z]+$|[a-zA-Z]+', s))
s = ["abc21", "943xyz", '12abc', "a12bc"]
new_s = list(map(split_vals, s))
print(new_s)

Output:

['abc 21', '943 xyz', '12 abc', 'a12bc']
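
The pattern is anchored with ^ and $, so it works on one word at a time. One possible way to use it on a whole sentence is to split on whitespace first, as sketched here. split_sentence is a hypothetical helper, the "+" after the first [a-zA-Z] is an assumption so that prefixes longer than one letter (such as "ab12cd") stay intact, and tokens are assumed to contain only ASCII letters and digits.

```python
import re

# Assumption: "+" after the first [a-zA-Z] so that prefixes longer than one letter stay intact.
TOKEN = re.compile(r'^\d+|\d+$|^[a-zA-Z]+\d+[a-zA-Z]+$|^[a-zA-Z]+$|[a-zA-Z]+')

def split_vals(word):
    """Split one token made only of ASCII letters and digits."""
    return ' '.join(TOKEN.findall(word))

def split_sentence(text):
    """Hypothetical helper: apply split_vals to each whitespace-separated token."""
    return ' '.join(split_vals(word) for word in text.split())

print(split_sentence("see abc21 and 943xyz but a12bc"))
```

Note that text.split() collapses runs of whitespace, so the original spacing is not preserved.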

Answer 2: (score: 0)

You can use re.sub to insert the space:

re.sub(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b', r"\1\3 \2\4", word)

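This matches non-digits followed by digits, or vice versa.

The \b boundaries ensure that whole words are matched, so we never match digits in the middle of a word.

The replacement pattern \1\3 \2\4 takes advantage of the fact that unmatched groups are replaced with an empty string (in Python 3.5 and later). We know that either groups 1 and 2 or groups 3 and 4 will match, and the other pair will be empty, so \1\3 \2\4 always produces a valid result (without repeating any part of the input).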
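
To make the group behavior concrete, this short sketch inspects the match objects directly. PATTERN is just a local name for the regex above. Match.expand applies the same template rules as re.sub, so on Python 3.5 or later the groups that did not take part expand to an empty string.

```python
import re

PATTERN = re.compile(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b')

for word in ["abc12", "12abc"]:
    m = PATTERN.search(word)
    # Groups from the alternative that did not take part are None...
    print(word, m.groups())
    # ...and they expand to '' in the replacement template (Python 3.5+).
    print(m.expand(r"\1\3 \2\4"))
```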

Examples:

>>> re.sub(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b', r"\1\3 \2\4", "abc12")
'abc 12'
>>> re.sub(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b', r"\1\3 \2\4", "12abc")
'12 abc'
>>> re.sub(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b', r"\1\3 \2\4", "a12bc")
'a12bc'
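
One caveat when the input holds several words: \D also matches spaces, so a number standing on its own can be matched together with the space before it, and a second space is inserted. The sketch below compares the pattern above with a variant restricted to letters. LETTERS_ONLY is an illustrative name, and limiting it to ASCII letters is an assumption.

```python
import re

ORIGINAL = re.compile(r'\b(?:(\D+)(\d+)|(\d+)(\D+))\b')
# Assumption: only ASCII letters count as the "word" part here.
LETTERS_ONLY = re.compile(r'\b(?:([a-zA-Z]+)(\d+)|(\d+)([a-zA-Z]+))\b')

text = "a12bc 34 abc12"
# \D+ grabs the space before 34, so the space gets doubled.
print(repr(ORIGINAL.sub(r"\1\3 \2\4", text)))
# The letters-only variant leaves the standalone number alone.
print(repr(LETTERS_ONLY.sub(r"\1\3 \2\4", text)))
```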