How do I capture specific groups with a JavaScript RegExp?

Posted: 2018-06-24 04:30:00

Tags: javascript regex

Given this sample text extracted from a PDF:

Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19

My goal is to capture every month and day; that is, it should capture all of the following:

  • August 31
  • October 19
  • March 18-22
  • December 24 - January 4
  • December 24-January 4

The hard part is capturing ranges that span two different months. I came up with this RegExp:

/(January|February|March|April|May|August|September|October|November|December)\s([0-9]*-?[0-9]+)(\s*-\s*(January|February|March|April|May|August|September|October|November|December)\s([0-9]*-?[0-9]+))?/g

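It works for everything except the last two examples listed above. On regexr, it shows the cross-month part being captured correctly in capture group #3, but I cannot access it in JavaScript. Take the following snippet as an example: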

const string = 'Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';

const subRegex = '(January|February|March|April|May|August|September|October|November|December)\\s([0-9]*-?[0-9]+)';
const dateRegex = new RegExp(`${subRegex}(\s*-\s*${subRegex})?`, 'g');

console.log(string.match(dateRegex));

It seems I can capture "December 24" and "January 4" separately, but not together as one match. Is there any way to capture them together?

1 Answer:

Answer 0: (score: 1)

You just need to tweak (and perhaps simplify) your original RE:

const str = 'Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';
// str2 has "December 24-January 4" instead - no spaces
const str2 = 'Professional Learning - August 31  Labor Day - September 3 Intersession- October 19 Professional Learning - October 22 Thanksgiving Break - November 21-23 Winter Break - December 24-January 4 Martin Luther King, Jr. Day - January 21 Presidents’ Day - February 18 Spring Break - March 18-22 Teacher Comp Day - April 19';
const re = /(January|February|March|April|May|August|September|October|November|December) [\d-]+([ -]*(January|February|March|April|May|August|September|October|November|December) \d+)?/g;
console.log(str.match(re));
console.log(str2.match(re));
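
One likely reason the snippet in the question behaves differently from regexr: inside a template literal (or any ordinary string), `\s` is not a recognized escape, so the backslash is dropped and the RegExp constructor receives a plain `s`. The following is a minimal sketch, not a definitive implementation, that keeps the question's string-building approach with doubled backslashes and reads the individual capture groups through `String.prototype.matchAll` (ES2020). The names `month`, `sample`, `unescaped`, and `results` are made up for illustration, and the sample string is shortened.

```javascript
// In a template literal, "\s" loses its backslash before RegExp ever sees it.
const unescaped = `\s*-\s*`;
console.log(unescaped); // prints: s*-s*

// Month list copied from the post (it does not include June or July).
const month = '(January|February|March|April|May|August|September|October|November|December)';

// One "Month day" or "Month day-day" unit. Backslashes are doubled so that
// the RegExp constructor receives \s and \d.
const subRegex = `${month}\\s(\\d+(?:-\\d+)?)`;

// Optional " - Month day" tail for ranges that cross into another month.
const dateRegex = new RegExp(`${subRegex}(?:\\s*-\\s*${subRegex})?`, 'g');

// Shortened sample taken from the text in the question.
const sample = 'Thanksgiving Break - November 21-23 Winter Break - December 24 - January 4 Martin Luther King, Jr. Day - January 21';

// With the g flag, match() returns only whole matches; matchAll() yields one
// match object per hit, with the capture groups at indexes 1..4.
const results = [...sample.matchAll(dateRegex)].map((m) => ({
  full: m[0],
  startMonth: m[1],
  startDays: m[2],
  endMonth: m[3], // undefined when there is no second month
  endDay: m[4],   // undefined when there is no second month
}));

console.log(results.map((r) => r.full));
// prints: [ 'November 21-23', 'December 24 - January 4', 'January 21' ]
```

With this pattern the un-spaced form "December 24-January 4" is also matched as a single result, because the same-month range part `(?:-\d+)?` only consumes the hyphen when a digit follows it.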