Question

I have a list of entries. Some are single words, some consist of several words, and a word may or may not contain digits.

An example:

word_list=['word', 'kap1','another word', 'another-1 word', 'another word 1']

I want to identify the entries in the list that have the form:

alphabets*Junction*digit(s)

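where Junction can be a space or a hyphen, or absent altogether, as in kap1. In the list above, kap1 qualifies (and no other entry does). Once such an entry is found, I want to create its variants (based on the junction) and add them to the list.

For example, having found kap1, I want to add kap 1 and kap-1 to the list.

I was able to write an initial regex to identify the entries: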

Word_NumberRegex=re.compile(r"^[a-zA-Z]+[ -]?\d+$")
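As a quick check of the pattern above, here is a minimal sketch that runs it over the example list. The name word_number_regex is just a lowercase spelling of the asker's variable, chosen for this illustration.

```python
import re

word_list = ['word', 'kap1', 'another word', 'another-1 word', 'another word 1']

# Same pattern as in the question: letters, an optional space or hyphen, then digits.
word_number_regex = re.compile(r"^[a-zA-Z]+[ -]?\d+$")

# Keep only the entries that match the whole pattern.
matches = [w for w in word_list if word_number_regex.match(w)]
print(matches)  # ['kap1']
```

Entries such as 'another-1 word' are rejected because the anchors require the digits to end the string.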

But I would like to know a good algorithm for creating the variants, depending on the junction.

Answer 1

With re, you can capture the matched parts and re-join them with a custom separator:

word_list=['word', 'kap1','another word', 'another-1 word', 'another word 1']

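import re

# Group 1: letters, group 2: digits; the optional junction between them is not captured.
p = r'([a-zA-Z]+)[- ]?([0-9]+)'
# For every entry that matches in full (re.fullmatch needs Python 3.4+),
# emit one variant per separator.
[re.sub(p, r'\1{}\2'.format(sep), w) for w in word_list if re.fullmatch(p, w) for sep in ['', ' ', '-']]

# ['kap1', 'kap 1', 'kap-1']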

With a precompiled pattern:

p = re.compile(r'([a-zA-Z]+)[- ]?([0-9]+)')
[p.sub(r'\1{}\2'.format(sep), w) for w in word_list if p.fullmatch(w) for sep in ['', ' ', '-']]

# ['kap1', 'kap 1', 'kap-1']
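The question also asks for the variants to be added to the list. A minimal sketch of that step, assuming a new list should be returned and duplicates skipped; add_variants, PATTERN and SEPARATORS are hypothetical names introduced here, not part of the answer:

```python
import re

PATTERN = re.compile(r'([a-zA-Z]+)[- ]?([0-9]+)')
SEPARATORS = ['', ' ', '-']

def add_variants(words):
    """Return the original words plus any junction variants not already present."""
    result = list(words)
    seen = set(words)
    for w in words:
        m = PATTERN.fullmatch(w)
        if not m:
            continue  # entry is not of the form letters + junction + digits
        for sep in SEPARATORS:
            variant = m.group(1) + sep + m.group(2)
            if variant not in seen:
                seen.add(variant)
                result.append(variant)
    return result

word_list = ['word', 'kap1', 'another word', 'another-1 word', 'another word 1']
print(add_variants(word_list))
# ['word', 'kap1', 'another word', 'another-1 word', 'another word 1', 'kap 1', 'kap-1']
```

Building each variant from the captured groups, rather than formatting a replacement string, avoids any ambiguity if a separator ever began with a digit.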

Answer 2

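You can use three capture groups, with the middle group capturing the junction character. Remove the captured junction from the list of separators, then build the remaining variants to get the desired output: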

import re

word_list=['word', 'kap1', 'another word', 'abc-123', 'another-1 word', 'another word 1']

reg = re.compile(r'^([a-zA-Z]+)([- ]?)([0-9]+)$')

for w in word_list:
    m = reg.match(w)
    if m:
        result = []
        seps = ['', ' ', '-']
        # Drop the junction the entry already uses, so only new variants remain.
        seps.remove(m.group(2))
        for s in seps:
            result += [m.group(1) + s + m.group(3)]
        print(result)

Output:

['kap 1', 'kap-1']
['abc123', 'abc 123']
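The same idea can be wrapped in a reusable function. This is only a sketch under the assumption that the caller wants a list of the missing variants per entry; other_variants, REG and SEPS are hypothetical names used for illustration:

```python
import re

# Three groups: letters, optional junction (space or hyphen), digits.
REG = re.compile(r'^([a-zA-Z]+)([- ]?)([0-9]+)$')
SEPS = ['', ' ', '-']

def other_variants(word):
    """Return the junction variants of `word`, excluding the form it already has."""
    m = REG.match(word)
    if m is None:
        return []  # not of the form letters + junction + digits
    letters, junction, digits = m.groups()
    return [letters + s + digits for s in SEPS if s != junction]

print(other_variants('kap1'))            # ['kap 1', 'kap-1']
print(other_variants('abc-123'))         # ['abc123', 'abc 123']
print(other_variants('another word 1'))  # []
```

Filtering with a comprehension leaves the shared separator list untouched, so it does not need to be rebuilt for every entry.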

