How to match groups of words separated by " - " using regular expressions in Python

Asked: 2017-02-27 16:39:37

Tags: python regex python-2.7

I am using Python 2.7 and trying to split a string with this structure into groups:

INPUT = 'abc-1-2 abc-2-3 abc-1-1 - TYP1 xyz-2-3 xyzzz - TYP2 ooop-1-1 abc-3-3 bbb - TYP3'

EXPECTED_OUTPUT = [
    'abc-1-2 abc-2-3 abc-1-1 - TYP1',
    'xyz-2-3 xyzzz - TYP2',
    'ooop-1-1 abc-3-3 bbb - TYP3']

This is the solution I tried, but it does not work: Online Demo

5 answers:

Answer 0 (score: 1)

I think this is what you are looking for:

".+?TYP\d+"

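A minimal sketch of how this pattern could be tried against the string from the question. One thing to watch for: `.` also matches the space between two groups, so every match after the first starts with a space. The sketch strips it; the names `raw` and `groups` are illustrative only.

```python
import re

INPUT = 'abc-1-2 abc-2-3 abc-1-1 - TYP1 xyz-2-3 xyzzz - TYP2 ooop-1-1 abc-3-3 bbb - TYP3'

# Lazily match up to and including each TYP<digits> label.
raw = re.findall(r".+?TYP\d+", INPUT)

# "." also matches the space separating two groups, so every match
# after the first begins with that space; strip it off.
groups = [m.strip() for m in raw]
print(groups)
```
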
Answer 1 (score: 1)

The following regex should do it:

\b.*?-\s.*?(?:\s|$)

See the demo / explanation

Python

import re

# A plain raw string is enough; the Python 2-only ur"" prefix is not needed.
regex = r"\b.*?-\s.*?(?:\s|$)"
# Named text rather than str, to avoid shadowing the built-in.
text = "abc-1-2 abc-2-3 abc-1-1 - TYP1 xyz-2-3 xyzzz - TYP2 ooop-1-1 abc-3-3 bbb - TYP3"

for match in re.finditer(regex, text):
    # A match can end with the whitespace that follows the group, so strip it.
    print(match.group().strip())

Answer 2 (score: 1)

>>> re.findall(r'\S.*? - \S+', INPUT)
['abc-1-2 abc-2-3 abc-1-1 - TYP1', 'xyz-2-3 xyzzz - TYP2', 'ooop-1-1 abc-3-3 bbb - TYP3']

Explanation:

'\S'  # any non-space character
'.*?' # (.) any character (*) zero or more times (?) non-greedy (match as few as possible)
' - ' # literal space dash space
'\S'  # any non-space character
'+'   # one or more times
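
The same breakdown can be kept next to the pattern itself with `re.VERBOSE`. This is only a sketch: in verbose mode unescaped whitespace is ignored, so the literal spaces around the dash are written as `[ ]` here, and the name `PATTERN` is illustrative.

```python
import re

INPUT = 'abc-1-2 abc-2-3 abc-1-1 - TYP1 xyz-2-3 xyzzz - TYP2 ooop-1-1 abc-3-3 bbb - TYP3'

PATTERN = re.compile(r"""
    \S        # any non-space character
    .*?       # any characters, as few as possible
    [ ]-[ ]   # literal space, dash, space
    \S+       # one or more non-space characters
""", re.VERBOSE)

groups = PATTERN.findall(INPUT)
print(groups)
```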

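Answer 3 (score: 0)

The only approach I see is to match the unbroken sequences followed by the broken one (the " - " separator and the word after it):

[^\s-]+(?:-?[^\s-])*(?:\s+[^\s-]+(?:-?[^\s-])*)*\s+-\s+[^\s-]+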

Online Demo

 [^\s-]+                # Unbroken sequence XXX-XXX-XXX
 (?:
      -?
      [^\s-] 
 )*

 (?:                    # Optional sequence  <space> XXX-XXX-XXX
      \s+ 
      [^\s-]+ 
      (?:
           -?
           [^\s-] 
      )*
 )*
                        # Broken sequence   <space> - <space> XXX
 \s+                    # Space
 -                      # Dash
 \s+                    # Space
 [^\s-]+                # XXX

Output

 **  Grp 0 -  ( pos 0 , len 30 ) 
abc-1-2 abc-2-3 abc-1-1 - TYP1  
 **  Grp 0 -  ( pos 31 , len 20 ) 
xyz-2-3 xyzzz - TYP2  
 **  Grp 0 -  ( pos 52 , len 27 ) 
ooop-1-1 abc-3-3 bbb - TYP3  
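
The output above comes from a regex tester. A rough Python equivalent, assuming the expanded pattern is compiled with `re.VERBOSE`, might look like the sketch below; `PATTERN` and `spans` are illustrative names, and each entry holds the start position, the length, and the matched text.

```python
import re

INPUT = 'abc-1-2 abc-2-3 abc-1-1 - TYP1 xyz-2-3 xyzzz - TYP2 ooop-1-1 abc-3-3 bbb - TYP3'

PATTERN = re.compile(r"""
    [^\s-]+ (?: -? [^\s-] )*               # unbroken sequence XXX-XXX-XXX
    (?: \s+ [^\s-]+ (?: -? [^\s-] )* )*    # optional further sequences
    \s+ - \s+                              # broken sequence: space, dash, space
    [^\s-]+                                # trailing label XXX
""", re.VERBOSE)

# Collect (start, length, text) for every match.
spans = [(m.start(), len(m.group()), m.group()) for m in PATTERN.finditer(INPUT)]
for start, length, text in spans:
    print("pos %d, len %d: %s" % (start, length, text))
```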

Answer 4 (score: 0)

The simplest is:

import re

string = "abc-1-2 abc-2-3 abc-1-1 - TYP1 xyz-2-3 xyzzz - TYP2 ooop-1-1 abc-3-3 bbb - TYP3"

rx = re.compile(r'(.+?TYP\d)\s*')
parts = rx.findall(string)
print(parts)
# ['abc-1-2 abc-2-3 abc-1-1 - TYP1', 'xyz-2-3 xyzzz - TYP2', 'ooop-1-1 abc-3-3 bbb - TYP3']