Question

我有这样的事情：

Othername California (2000) (T) (S) (ok) {state (#2.1)}

是否需要获取正则表达式代码：

Othername California ok 2.1

即。我想将数字保持在圆括号内，而这些数字依次在{}内，并保持文本“ok”在（）内。我特别需要字符串“ok”打印出来，如果包含在我的行中，但我想摆脱括号内的其他文本，例如（V），（S）或（2002）。

我知道可能正则表达式不是解决此类问题的最有效方法。

任何帮助都将不胜感激。

编辑：

字符串可能会有所不同，因为如果某些信息不可用，则不包含在该行中。文本本身也是可变的（例如，每行都没有“状态”）。例如，人们可以拥有：

Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}

Answer 1

正则表达式

(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}

Regular expression image

用于测试的文本

Name1 Name2 Name3 (2000) {Education (#3.2)}
Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}
Othername California (2000) (T) (S) (ok) {state (#2.1)}

测试

>>> regex = re.compile("(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x54e2105f36c16a48>
>>> regex.match(string)
<_sre.SRE_Match object at 0x54e2105f36c169e8>

# Run findall
>>> regex.findall(string)
[
   (u'Name1 Name2 Name3'   , u''  , u'3.2'),
   (u'Name1 Name2 Name3'   , u'ok', u'1.1'),
   (u'Name1 Name2'         , u''  , u'1.1'),
   (u'Name1 Name2 Name3'   , u''  , u'4.12'),
   (u'Othername California', u'ok', u'2.1')
]

Answer 2

试试这个：

import re

thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'

regex = r'''
    ([^(]*)             # match anything but a (
    \                   # a space
    (?:                 # non capturing parentheses
        \([^(]*\)       # parentheses
        \               # a space
    ){3}                # three times
    \(([^(]*)\)         # capture fourth parentheses contents
    \                   # a space
    {                   # opening {
        [^}]*           # anything but }
        \(\#            # opening ( followed by #
            ([^)]*)     # match anything but )
        \)              # closing )
    }                   # closing }
'''

match = re.match(regex, thestr, re.X)

print match.groups()

输出：

('Othername California', 'ok', '2.1')

这是压缩版本：

import re

thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'
regex = r'([^(]*) (?:\([^(]*\) ){3}\(([^(]*)\) {[^}]*\(\#([^)]*)\)}'
match = re.match(regex, thestr)

print match.groups()

Answer 3

尽管我在评论中已经说过了。我找到了解决方法：

(?(?=\([^()\w]*[\w.]+[^()\w]*\))\([^()\w]*([\w.]+)[^()\w]*\)|.)(?=[^{]*\})|(?<!\()(\b\w+\b)(?!\()|ok

<强>解释

(?                                  # If
(?=\([^()\w]*[\w.]+[^()\w]*\))      # There is (anything except [()\w] zero or more times, followed by [\w.] one or more times, followed by anything except [()\w] zero or more times)
\([^()\w]*([\w.]+)[^()\w]*\)        # Then match it, and put [\w.] in a group
|                                   # else
.                                   # advance with one character
)                                   # End if
(?=[^{]*\})                         # Look ahead if there is anything except { zero or more times followed by }

|                                   # Or
(?<!\()(\b\w+\b)(?!\()              # Match a word not enclosed between parenthesis
|                                   # Or
ok                                  # Match ok

Online demo

Answer 4

其他情况是：

^(\w+\s?\w+)\s?\(\d{1,}\)\s?\(\w+\)\s?\(\w+\)\s?\((\w+)\)\s?.*#(\d.\d)

在python中使用正则表达式嵌套括号

4 个答案:

正则表达式

用于测试的文本

测试