Python regex to match apostrophes

Time: 2014-09-23 08:43:58

Tags: python regex

re_newspeaker =         r'^(<bullet> |  )(?P<name>(%s|(((Mr)|(Ms)|(Mrs))\. [-A-Za-z \']+( of [A-Z][a-z]+)?))|((The ((VICE|ACTING|Acting) )?(PRESIDENT|SPEAKER|CHAIR(MAN)?)( pro tempore)?)|(The PRESIDING OFFICER)|(The CLERK)|(The CHIEF JUSTICE)|(The VICE PRESIDENT)|(Mr\. Counsel [A-Z]+))( \([A-Za-z.\'\- ]+\))?)\.'


re_speaking =           r'^(<bullet> |  )((((((Mr)|(Ms)|(Mrs))\. [A-Za-z \'\-]+(of [A-Z][a-z]+)?)|((The (VICE |Acting |ACTING )?(PRESIDENT|SPEAKER)( pro tempore)?)|(The PRESIDING OFFICER)|(The CLERK))( \([A-Za-z.\'\- ]+\))?))\. )?(?P<start>.)'

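For some reason, the regular expressions above do not match names that contain an apostrophe.

For example, Mr. D'STALL is not matched. Any help with the regex pattern would be much appreciated.

The code takes the input and tags it as XML, like this: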

<speaker=Mr. D'STALL</speaker><speaking>Mr. President, I have been seeking to obtain a report on
this bill. I am not on the Budget Committee, and I am not on the
Government Relations Committee. But from what I understand, this is a
very important bill, a big bill, a complex bill, far reaching in its
contents. I have been queried, along with all other Senators, I
suppose, as to whether or not they would have any objection to the
adoption of the committee amendments, en bloc. I am going to object to
the adoption of the committee amendments, en bloc, until I see the
committee report.</speaking>

  Mr. D'STALL. Mr. President, I have been seeking to obtain a report on
this bill. I am not on the Budget Committee, and I am not on the
Government Relations Committee. But from what I understand, this is a
very important bill, a big bill, a complex bill, far reaching in its
contents. I have been queried, along with all other Senators, I
suppose, as to whether or not they would have any objection to the
adoption of the committee amendments, en bloc. I am going to object to
the adoption of the committee amendments, en bloc, until I see the
committee report.

The regex does not match the paragraph above.

1 answer:

Answer 0: (score: 0)

re_newspeaker =         r'^(<bullet> |  )(?P<name>(%s|(((Mr)|(Ms)|(Mrs))\. [-A-Z\']+|((Miss) [-A-Z\']+)( of [A-Z][a-z]+)?))|((The ((VICE|ACTING|Acting) )?(PRESIDENT|SPEAKER|CHAIR(MAN)?)( pro tempore)?)|(The PRESIDING OFFICER)|(The CLERK)|(The CHIEF JUSTICE)|(The VICE PRESIDENT)|(Mr\. Counsel [A-Z]+))( \([A-Za-z.\- ]+\))?)\.'

re_speaking =           r'^(<bullet> |  )((((((Mr)|(Ms)|(Mrs))\. [A-Z\']+|((Miss) [-A-Z\']+)(of [A-Z][a-z]+)?)|((The (VICE |Acting |ACTING )?(PRESIDENT|SPEAKER)( pro tempore)?)|(The PRESIDING OFFICER)|(The CLERK))( \([A-Za-z.\- ]+\))?))\. )?(?P<start>.)'

The regexes above solved my problem. I thought I would post them in case anyone else runs into this!
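Compared with the question, the patterns in this answer drop lower-case letters and spaces from the surname character class and add a `Miss` alternative. A quick way to check the apostrophe handling is to test a cut-down version of the name part against the sample line. The sketch below is only an illustration: `NAME_RE` and `speaker_name` are hypothetical names, and the pattern is a simplified fragment, not the full `re_newspeaker`.

```python
import re

# Simplified fragment of the name part (an assumption for illustration,
# not the full re_newspeaker pattern): a "<bullet> " or two-space prefix,
# a title, an upper-case surname that may contain hyphens or straight
# apostrophes, and the closing period.
NAME_RE = re.compile(r"^(<bullet> |  )(?P<name>(Mr|Ms|Mrs)\. [-A-Z']+)\.")


def speaker_name(line):
    """Return the speaker name at the start of a line, or None if absent."""
    match = NAME_RE.match(line)
    return match.group("name") if match else None


line = "  Mr. D'STALL. Mr. President, I have been seeking to obtain a report on"
print(speaker_name(line))
```

Note that `'` inside the character class only matches the straight ASCII apostrophe. If the input text used a typographic apostrophe (U+2019), the fragment above would not match; the source does not say whether that was the case here, so treat it as one thing worth checking rather than a diagnosis.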