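Finding the contents of parentheses

Time: 2014-11-08 20:52:48

Tags: python regex

I have a few thousand blocks of text that may or may not contain the death date of the person in the record, which always takes the following form: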

(d. xxxxxxxxxxxxx)

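It starts with an opening parenthesis, followed by d., then some date text, and ends with a closing parenthesis.

I wrote the following code, with a few test samples, to test the regular expression I came up with: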

import re
tests = ["Milt Jackson, vibraphone, piano, guitar, 1923 (d. October 9, 1999)", "Howard Johnson, alto sax, 1908 (d. December 28, 1991)","Sonny Greenwich, guitar, 1936", "Eiichi Hayashi, alto sax, 1960", "Yoshio Ikeda, bass, 1942", "Urs Leimgruber, saxophones, bass clarinet. 1952"]

for test in tests:
    m = re.match ("\(d.(.*)\)", test)
    if m:
        print(m.groups())

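However, it prints nothing.

I tested the regex in an online regex tester, and it works on the valid test inputs.

So I guess my code is wrong. Can anyone suggest why, please?

Finally, what I want to extract is the death date itself (without the parentheses and the d.). Any suggestions on how to do that?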

3 answers:

Answer 0: (score: 3)

re.match always matches at the start of the string. From the docs:

  

re.match(pattern, string, flags=0)

     

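If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object.

Emphasis on "at the beginning" is mine.

You need to use re.search to have Python search for the pattern anywhere in the string: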

>>> import re
>>> tests = ["Milt Jackson, vibraphone, piano, guitar, 1923 (d. October 9, 1999)", "Howard Johnson, alto sax, 1908 (d. December 28, 1991)","Sonny Greenwich, guitar, 1936", "Eiichi Hayashi, alto sax, 1960", "Yoshio Ikeda, bass, 1942", "Urs Leimgruber, saxophones, bass clarinet. 1952"]
>>>
>>> for test in tests:
...     m = re.search(r"\(d\.(.*)\)", test)
...     if m:
...         print(m.groups())
...
(' October 9, 1999',)
(' December 28, 1991',)
>>>

Also, in your pattern I escaped the . after d so that Python matches a literal period. Otherwise, Python would match any character there (except a newline).
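To illustrate what the escape changes, and to get just the date without the leading space (which is what the question ultimately asks for), here is a minimal sketch. The helper name death_date, the \s* and the non-greedy (.*?) are my own assumptions layered on the pattern above, not something from the question:

```python
import re

# Assumed variation on the pattern above: \s* skips the space after "d.",
# and the non-greedy (.*?) stops at the first closing parenthesis.
DEATH_RE = re.compile(r"\(d\.\s*(.*?)\)")

def death_date(text):
    """Return the date text inside "(d. ...)", or None if it is absent."""
    m = DEATH_RE.search(text)
    return m.group(1) if m else None

# An unescaped "." matches any character, so "(dx 1999)" slips through.
sample = "Somebody, 1900 (dx 1999)"
loose = re.search(r"\(d.(.*)\)", sample)     # matches
strict = re.search(r"\(d\.(.*)\)", sample)   # does not match

print(loose is not None, strict is None)
print(death_date("Milt Jackson, vibraphone, piano, guitar, 1923 (d. October 9, 1999)"))
print(death_date("Sonny Greenwich, guitar, 1936"))
```

Using m.group(1) returns the captured string directly, whereas m.groups() returns a one-element tuple.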

Answer 1: (score: 1)

Use search instead of match:

>>> for test in tests:
...     m = re.search(r"\(d\.(.*)\)", test)
...     if m:
...         print(m.groups())
... 
(' October 9, 1999',)
(' December 28, 1991',)

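Why doesn't match work?

match looks for the pattern only at the start of the string. In the test strings, the part that matches is not at the start, so match fails. search looks for the pattern anywhere in the string.

  • re.search(pattern, string, flags=0)

    Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern.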
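The difference can be seen side by side on one of the test strings. This is only a small sketch; the variable names at_start and anywhere are mine:

```python
import re

text = "Howard Johnson, alto sax, 1908 (d. December 28, 1991)"
pattern = r"\(d\.(.*)\)"

# match only tries position 0, where the text starts with "H", not "(".
at_start = re.match(pattern, text)

# search tries each position in turn until the pattern fits.
anywhere = re.search(pattern, text)

print(at_start)             # None
print(anywhere.group(1))    # the captured date, including its leading space
print(anywhere.start())     # index of the opening parenthesis in text
```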

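Answer 2: (score: 0)

Given that the date always appears in the form (d. xxxxxxxxxxxxx), and that your regex and the answers provided will capture anything of the form (d. then anything), you can do this without a regex, unless you have cases of (d. followed by a space with no closing parenthesis: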

tests = ["Milt Jackson, vibraphone, piano, guitar, 1923 (d. October 9, 1999)", "Howard Johnson, alto sax, 1908 (d. December 28, 1991)","Sonny Greenwich, guitar, 1936", "Eiichi Hayashi, alto sax, 1960", "Yoshio Ikeda, bass, 1942", "Urs Leimgruber, saxophones, bass clarinet. 1952"]
for line in tests:
    if "(d. " in line:
        spl = line.split("(d. ")[1]
        print(spl[:spl.find(")")])

October 9, 1999
December 28, 1991
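If the unclosed-parenthesis case is a concern, the same non-regex idea can be written with str.partition, which reports whether the separator was found at all. This is a hedged sketch rather than a drop-in replacement; the function name extract_death_date and its marker parameter are hypothetical:

```python
def extract_death_date(line, marker="(d. "):
    """Return the text between marker and the next ")", or None."""
    _, found, rest = line.partition(marker)
    if not found:
        return None                      # no "(d. " in this line
    date, closing, _ = rest.partition(")")
    return date if closing else None     # None when the paren is never closed

tests = [
    "Milt Jackson, vibraphone, piano, guitar, 1923 (d. October 9, 1999)",
    "Sonny Greenwich, guitar, 1936",
    "Somebody, 1900 (d. unknown",        # made-up line with no closing paren
]
for line in tests:
    print(extract_death_date(line))
```

Returning None for both "no marker" and "no closing parenthesis" keeps the caller's check to a single if.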