re_newspeaker = r'^(<bullet> | )(?P<name>(%s|(((Mr)|(Ms)|(Mrs))\. [-A-Za-z \']+( of [A-Z][a-z]+)?))|((The ((VICE|ACTING|Acting) )?(PRESIDENT|SPEAKER|CHAIR(MAN)?)( pro tempore)?)|(The PRESIDING OFFICER)|(The CLERK)|(The CHIEF JUSTICE)|(The VICE PRESIDENT)|(Mr\. Counsel [A-Z]+))( \([A-Za-z.\'\- ]+\))?)\.'
re_speaking = r'^(<bullet> | )((((((Mr)|(Ms)|(Mrs))\. [A-Za-z \'\-]+(of [A-Z][a-z]+)?)|((The (VICE |Acting |ACTING )?(PRESIDENT|SPEAKER)( pro tempore)?)|(The PRESIDING OFFICER)|(The CLERK))( \([A-Za-z.\'\- ]+\))?))\. )?(?P<start>.)'
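For some reason, the regular expressions above do not match names that contain an apostrophe.
For example, Mr. D'STALL does not match. Any help with the regex pattern would be much appreciated.
The code takes the input and marks it up as XML, as shown below: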
<speaker>Mr. D'STALL</speaker><speaking>Mr. President, I have been seeking to obtain a report on
this bill. I am not on the Budget Committee, and I am not on the
Government Relations Committee. But from what I understand, this is a
very important bill, a big bill, a complex bill, far reaching in its
contents. I have been queried, along with all other Senators, I
suppose, as to whether or not they would have any objection to the
adoption of the committee amendments, en bloc. I am going to object to
the adoption of the committee amendments, en bloc, until I see the
committee report.</speaking>
Mr. D'STALL. Mr. President, I have been seeking to obtain a report on
this bill. I am not on the Budget Committee, and I am not on the
Government Relations Committee. But from what I understand, this is a
very important bill, a big bill, a complex bill, far reaching in its
contents. I have been queried, along with all other Senators, I
suppose, as to whether or not they would have any objection to the
adoption of the committee amendments, en bloc. I am going to object to
the adoption of the committee amendments, en bloc, until I see the
committee report.
The regex does not match the paragraph above.
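One way to narrow this down is to test a stripped-down version of the pattern against variants of the failing line. The sketch below is not the full `re_newspeaker`: `SPEAKER` and `speaker_name` are hypothetical names, and the pattern keeps only the prefix group, the title, and the surname class. It assumes the real input might differ from the text shown here in its leading whitespace or in the kind of apostrophe it uses.

```python
import re

# Simplified stand-in for re_newspeaker (hypothetical, for diagnosis only):
# same prefix group, a title, and the original surname character class.
SPEAKER = re.compile(r"^(<bullet> | )(?P<name>(Mr|Ms|Mrs)\. [-A-Za-z ']+)\.")

def speaker_name(line):
    """Return the matched speaker name, or None if the line does not match."""
    m = SPEAKER.match(line)
    return m.group("name") if m else None

# A line with the leading space the prefix group requires.
print(speaker_name(" Mr. D'STALL. Mr. President, I have"))
# The same line without the leading space.
print(speaker_name("Mr. D'STALL. Mr. President, I have"))
# The same line with a typographic apostrophe (U+2019) instead of '.
print(speaker_name(" Mr. D\u2019STALL. Mr. President, I have"))
```

Only the first call returns a name. If the real input behaves like the second or third case, the character class is probably not the culprit, and the prefix or the apostrophe character would be worth checking first.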
Answer 0 (score: 0)
re_newspeaker = r'^(<bullet> | )(?P<name>(%s|(((Mr)|(Ms)|(Mrs))\. [-A-Z\']+|((Miss) [-A-Z\']+)( of [A-Z][a-z]+)?))|((The ((VICE|ACTING|Acting) )?(PRESIDENT|SPEAKER|CHAIR(MAN)?)( pro tempore)?)|(The PRESIDING OFFICER)|(The CLERK)|(The CHIEF JUSTICE)|(The VICE PRESIDENT)|(Mr\. Counsel [A-Z]+))( \([A-Za-z.\- ]+\))?)\.'
re_speaking = r'^(<bullet> | )((((((Mr)|(Ms)|(Mrs))\. [A-Z\']+|((Miss) [-A-Z\']+)(of [A-Z][a-z]+)?)|((The (VICE |Acting |ACTING )?(PRESIDENT|SPEAKER)( pro tempore)?)|(The PRESIDING OFFICER)|(The CLERK))( \([A-Za-z.\- ]+\))?))\. )?(?P<start>.)'
The regex above solved my problem. I thought I would post it in case anyone else runs into the same issue!
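One side effect may be worth checking before reusing these patterns: the surname class changed from `[-A-Za-z \']` to `[-A-Z\']`, so lowercase letters and spaces are no longer allowed in a name. The sketch below compares the two classes in a simplified pattern. `build`, `ORIGINAL`, `REVISED`, and `name_or_none` are hypothetical names, the pattern is not the full regex from this post, and the second sample line is made up for illustration.

```python
import re

def build(surname_class):
    """Compile a simplified speaker pattern around the given surname class."""
    return re.compile(
        r"^(<bullet> | )(?P<name>(Mr|Ms|Mrs)\. " + surname_class + r"+)\."
    )

ORIGINAL = build(r"[-A-Za-z ']")  # surname class from the question
REVISED = build(r"[-A-Z']")       # surname class from the answer

def name_or_none(pattern, line):
    """Return the matched speaker name, or None if the line does not match."""
    m = pattern.match(line)
    return m.group("name") if m else None

for line in (" Mr. D'STALL. Mr. President,", " Mr. McCONNELL. I yield."):
    print(repr(line), name_or_none(ORIGINAL, line), name_or_none(REVISED, line))
```

In this simplified form both classes accept `D'STALL`, while only the original class accepts a mixed-case surname such as `McCONNELL`. Whether that matters depends on the names that actually occur in the input.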