Python regex to extract date patterns

Date: 2016-12-29 01:07:00

Tags: python regex python-2.7

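I have sentences of the following form (a keyword, followed by an opening parenthesis, followed by an arbitrary string, followed by two dates separated by a dash):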

Mohandas Karamchand Gandhi (/ˈɡɑːndi, ˈɡæn-/; Hindustani: [ˈmoːɦənd̪aːs ˈkərəmtʃənd̪ ˈɡaːnd̪ʱi]; 2 October 1869 – 30 January 1948) was the preeminent leader of the Indian independence movement in British-ruled India.

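I need to use a regex to extract the birth date (2 October 1869) and the death date (30 January 1948) from this sentence. I have already written a regex for the date pattern: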

date_pattern = r"(\d{1,2}(\s|-|/)?(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May?|June?|July?|Aug(ust)?|Sep(t(ember)?)?|Oct(ober)?|Nov(ember)?|Dec(ember)?|\d{1,2})(\s|-|/)?\d{2,4})"
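Because every parenthesised part of this pattern is a capturing group, re.findall returns one tuple of groups per match rather than the matched date itself. A minimal sketch for Python 3, assuming the pattern is kept exactly as written: re.finditer yields match objects, and group(1), the outermost group, holds the whole date. The sentence is shortened here for readability.

```python
import re

# The question's pattern, unchanged apart from the raw-string prefix.
date_pattern = r"(\d{1,2}(\s|-|/)?(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May?|June?|July?|Aug(ust)?|Sep(t(ember)?)?|Oct(ober)?|Nov(ember)?|Dec(ember)?|\d{1,2})(\s|-|/)?\d{2,4})"

# Shortened copy of the sentence from the question.
text = ("Mohandas Karamchand Gandhi (Hindustani: ...; 2 October 1869 – 30 January 1948) "
        "was the preeminent leader of the Indian independence movement in British-ruled India.")

# With several capturing groups, findall yields one tuple per match;
# the first item of each tuple is the outermost group.
matches = re.findall(date_pattern, text)
print(matches[0][0])

# finditer yields match objects instead; group(1) is the outermost group, i.e. the whole date.
dates = [m.group(1) for m in re.finditer(date_pattern, text)]
birth, death = dates[0], dates[1]  # assumes at least two dates were found
print(birth)  # 2 October 1869
print(death)  # 30 January 1948
```

Using finditer keeps the original pattern intact; the alternative is to turn the inner groups into non-capturing (?:...) groups so that findall returns plain strings.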

I need to extract sentences of the form above and print the birth date and the death date separately.
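To match the whole "keyword ( ... date – date )" shape in one regex, named groups can pull out the name and both dates at once. This is a sketch for Python 3, not the asker's or any answerer's method, assuming one sentence per string, a name that contains no parenthesis, dates written as day, month name and four-digit year, and an en dash or hyphen between the two dates. The pattern name LIFESPAN and the helper extract_lifespan are hypothetical.

```python
import re

# Hypothetical pattern for "name (any text; born – died) ..." sentences.
LIFESPAN = re.compile(
    r"(?P<name>[^(]+?)\s*\("                 # name/keyword up to the opening parenthesis
    r"[^)]*?"                               # any text inside the parentheses, as little as possible
    r"(?P<born>\d{1,2} [A-Za-z]+ \d{4})"     # birth date: day, month name, year
    r"\s*[\u2013-]\s*"                       # en dash (U+2013) or ASCII hyphen
    r"(?P<died>\d{1,2} [A-Za-z]+ \d{4})\)"   # death date, then the closing parenthesis
)


def extract_lifespan(sentence):
    """Return (name, born, died), or None if the sentence does not have the expected shape."""
    m = LIFESPAN.match(sentence)
    if m is None:
        return None
    return m.group("name"), m.group("born"), m.group("died")


sentences = [
    "Mohandas Karamchand Gandhi (/ˈɡɑːndi, ˈɡæn-/; Hindustani: [ˈmoːɦənd̪aːs ˈkərəmtʃənd̪ ˈɡaːnd̪ʱi]; 2 October 1869 – 30 January 1948) was the preeminent leader of the Indian independence movement in British-ruled India.",
    "A sentence without the expected shape.",
]

for s in sentences:
    result = extract_lifespan(s)
    if result is None:
        continue  # skip sentences that do not match the form
    name, born, died = result
    print(name)
    print("Born: " + born)
    print("Died: " + died)
```

The lazy [^)]*? lets the arbitrary text inside the parentheses (IPA, brackets, semicolons) be skipped until the first place where two dates joined by a dash are followed by the closing parenthesis.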

2 answers:

Answer 0 (score: 1)

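# -*- coding: utf-8 -*-
# (the encoding declaration is required on Python 2.7 because the text contains non-ASCII characters)
import re

text = '''Mohandas Karamchand Gandhi (/ˈɡɑːndi, ˈɡæn-/; Hindustani: [ˈmoːɦənd̪aːs ˈkərəmtʃənd̪ ˈɡaːnd̪ʱi]; 2 October 1869 – 30 January 1948) was the preeminent leader of the Indian independence movement in British-ruled India.'''
# \d+ starts each match at a number; [ \d\w]+ then takes the following spaces, digits and letters,
# stopping at the dash "–" and at ")". Unpacking assumes exactly two matches (ValueError otherwise).
birth, death = re.findall(r'\d+[ \d\w]+', text)
print(birth)
print(death)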

Output:

2 October 1869 
30 January 1948
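The pattern in this answer works because the sentence contains exactly two runs that start with a digit: the first match also keeps the space before the dash, and any further number in the text would make the two-way unpacking raise ValueError. A tighter sketch for Python 3, assuming every date is written as day, capitalised month name and four-digit year; the extra sentence appended below is hypothetical test text.

```python
import re

# Shortened copy of the sentence from the question.
text = ("Mohandas Karamchand Gandhi (Hindustani: ...; 2 October 1869 – 30 January 1948) "
        "was the preeminent leader of the Indian independence movement in British-ruled India.")
# Hypothetical extra sentence containing one more number, to show the failure mode.
longer = text + " A later year such as 1915 appears here."

# The answer's pattern now finds three runs, so "birth, death = ..." would raise ValueError.
broad = re.findall(r'\d+[ \d\w]+', longer)
print(len(broad))

# Tighter: a word boundary, a 1-2 digit day, a capitalised month word and a 4-digit year.
# No trailing space is captured, and a bare year such as 1915 is not matched.
tight = re.findall(r'\b\d{1,2} [A-Z][a-z]+ \d{4}\b', longer)
birth, death = tight[0], tight[1]
print(birth)
print(death)
```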

Answer 1 (score: 0)

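# -*- coding: utf-8 -*-
import re

# Same pattern as in the question, but every inner group is non-capturing (?:...).
# With only the outer group left, re.findall returns whole date strings, not tuples.
date_pattern = r"(\d{1,2}(?:\s|-|/)?(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May?|June?|July?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?|\d{1,2})(?:\s|-|/)?\d{2,4})"

bio = "Mohandas Karamchand Gandhi (/ˈɡɑːndi, ˈɡæn-/; Hindustani: [ˈmoːɦənd̪aːs ˈkərəmtʃənd̪ ˈɡaːnd̪ʱi]; 2 October 1869 – 30 January 1948) was the preeminent leader of the Indian independence movement in British-ruled India."

matches = re.findall(date_pattern, bio)
# The first date is taken as the birth date and the second as the death date.
if len(matches) > 1:
    born = matches[0]
    died = matches[1]
    # String concatenation prints the same on Python 2.7 and 3
    # (print("Born:", born) would print a tuple on 2.7).
    print("Born: " + born)
    print("Died: " + died)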
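If the two strings are needed as date objects rather than text, datetime.datetime.strptime can parse them. A sketch for Python 3, assuming English month names (%B and %b follow the current locale, which is the C locale with English names unless the program calls locale.setlocale) and only the "day, month name, year" spellings with full or three-letter month names; the numeric forms the pattern also accepts, such as 2-10-1869, and the four-letter "Sept" are not handled. The helper to_date is hypothetical.

```python
from datetime import datetime


def to_date(text):
    """Parse '2 October 1869' or '2 Oct 1869' into a datetime.date (hypothetical helper)."""
    for fmt in ("%d %B %Y", "%d %b %Y"):  # full month name first, then the abbreviation
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not fit; try the next one
    raise ValueError("unrecognised date: %r" % text)


born = to_date("2 October 1869")
died = to_date("30 January 1948")
print(born.isoformat())  # 1869-10-02
print(died.isoformat())  # 1948-01-30
```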