How do I extract the matched text from a regular expression?

Time: 2018-12-13 21:28:43

Tags: javascript regex

My text looks like this:

let text = "help me on monday, january 8 take the dog out";

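It is basically a sentence that may contain a <weekday-date> combination.

I want to

  1. Detect whether the sentence contains a weekday-date combination
  2. Extract the weekday-date combination (here, "monday, january 8")
  3. Remove the weekday name (leaving "january 8")

For step 1, I was able to do this with the following: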

const weekdayRegex = /\b((mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?)\b/;
const monthThenDayRegex = /(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?) *\d{1,2}(th|nd|st|rd){0,1}/;
const dayDayThenMonthRegex = /\d{1,2}(th|nd|st|rd){0,1} *(of){0,1} *(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?)/;

Then:

commaSeperated = new RegExp(weekdayRegex.source + " *,{0,1} *" + monthThenDayRegex.source);
commaSeperated.test(text)

This works, and test returns true.

For step 2, how do I extract "monday, january 8"?

For step 3, how do I keep only "january 8"?

3 Answers:

Answer 0: (score: 0)

You can use match for this:
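
As a minimal sketch of how String.prototype.match could cover steps 2 and 3, assuming the regexes from the question are used unchanged. The helper name extractWeekdayDate is hypothetical and is not taken from this answer or from the linked JS Bin.

```javascript
// Regexes copied from the question.
const weekdayRegex = /\b((mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?)\b/;
const monthThenDayRegex = /(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?) *\d{1,2}(th|nd|st|rd){0,1}/;
const commaSeperated = new RegExp(weekdayRegex.source + " *,{0,1} *" + monthThenDayRegex.source);

// Hypothetical helper: returns null when there is no weekday-date combination.
function extractWeekdayDate(text) {
  const m = text.match(commaSeperated);
  if (!m) return null;

  // Step 2: m[0] is the full text matched by the combined regex.
  const combo = m[0];

  // Step 3: match the date regex again inside the combination to drop the weekday.
  const dateOnly = combo.match(monthThenDayRegex)[0];

  return { combo, dateOnly };
}
```

For the sample text in the question, extractWeekdayDate(text) yields combo "monday, january 8" and dateOnly "january 8".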

https://jsbin.com/repexux/1/edit?js,console

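Answer 1: (score: 0)

I made one small simplification to your commaSeperated regex: the quantifier {0,1} is the same as ?, so " *,{0,1} *" becomes " *,? *".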

let text = "help me on monday, january 8 take the dog out";
const weekdayRegex = /\b((mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?)\b/;
const monthThenDayRegex = /(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?) *\d{1,2}(th|nd|st|rd){0,1}/;
const dayDayThenMonthRegex = /\d{1,2}(th|nd|st|rd){0,1} *(of){0,1} *(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?)/;

const commaSeperated = new RegExp(weekdayRegex.source + " *,? *" + monthThenDayRegex.source);

const m = text.match(commaSeperated);
if (m) {
  console.log(m[0]);
  console.log(m[0].replace(/^.+, */,''))
} else {
  console.log('not a match');
}
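
Note that the replace(/^.+, */, '') call above only strips the weekday when a comma is present, while the combined regex treats the comma as optional. One possible variation, sketched here under the assumption that named capture groups (ES2018) are available, wraps each part in its own group. The names withGroups and splitWeekdayDate are hypothetical.

```javascript
// Regexes copied from the question.
const weekdayRegex = /\b((mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?)\b/;
const monthThenDayRegex = /(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?) *\d{1,2}(th|nd|st|rd){0,1}/;

// Wrap each part in a named group so it can be read back without
// counting the nested numbered groups.
const withGroups = new RegExp(
  "(?<weekday>" + weekdayRegex.source + ") *,? *(?<date>" + monthThenDayRegex.source + ")"
);

// Hypothetical helper: returns null when the text has no weekday-date combination.
function splitWeekdayDate(text) {
  const m = text.match(withGroups);
  if (!m) return null;
  return { combo: m[0], weekday: m.groups.weekday, date: m.groups.date };
}
```

With this variation the date part is read from m.groups.date, so it does not depend on whether the comma was typed.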

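Answer 2: (score: 0)

I approached this by splitting monthThenDayRegex and dayDayThenMonthRegex into separate monthRegex and dayOfMonthRegex regexes, and then combining them in dateRegex. I also added ^.*? before and .*$ after, to match the start and end of the string, which allows the date part to be extracted with String.replace. As the snippet shows, you can also check whether the string matches the regex before processing it.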
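
The snippet for this answer is not reproduced here, so the following is only a sketch of what the description might look like in code. The names monthRegex, dayOfMonthRegex and dateRegex come from the answer's prose; the patterns themselves, the helper extractDate, and the use of test for the pre-check are assumptions adapted from the question.

```javascript
// Assumed patterns, adapted from the question's regexes (non-capturing groups
// are used so that the date is the only numbered group).
const weekdayRegex = /\b(?:(?:mon|tues|wed(?:nes)?|thur(?:s)?|fri|sat(?:ur)?|sun)(?:day)?)\b/;
const monthRegex = /(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|jun(?:e)?|jul(?:y)?|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)/;
const dayOfMonthRegex = /\d{1,2}(?:th|nd|st|rd)?/;

// Either "january 8" or "8th of january".
const dateSource =
  "(?:" + monthRegex.source + " *" + dayOfMonthRegex.source +
  "|" + dayOfMonthRegex.source + " *(?:of)? *" + monthRegex.source + ")";

// ^.*? and .*$ consume everything around the combination, so replacing the
// whole string with group 1 leaves only the date part.
const dateRegex = new RegExp(
  "^.*?" + weekdayRegex.source + " *,? *(" + dateSource + ").*$"
);

// Hypothetical helper: checks for a match first, then extracts the date.
function extractDate(text) {
  if (!dateRegex.test(text)) return null;
  return text.replace(dateRegex, "$1");
}
```

For the sample text in the question, extractDate(text) gives "january 8". Because . does not match line breaks, this sketch assumes single-line input.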