My text looks like this:
let text = "help me on monday, january 8 take the dog out";
It is basically a sentence that may contain a <weekday-date> combination.
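I want to:
1. check whether the text contains such a combination;
2. extract it;
3. keep only the date part.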
For step 1, I was able to do this with the following regexes:
const weekdayRegex = /\b((mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?)\b/;
const monthThenDayRegex = /(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?) *\d{1,2}(th|nd|st|rd){0,1}/;
const dayDayThenMonthRegex = /\d{1,2}(th|nd|st|rd){0,1} *(of){0,1} *(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?)/;
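A day-first pattern needs \d{1,2} (one or two digits). Written as d{1,2} without the backslash, it matches one or two literal "d" characters instead, so an input like "8th of january" can never match. A quick check, using "8th of january" as an assumed sample input:

```javascript
// Without the backslash, d{1,2} means one or two literal "d" characters.
const broken = /d{1,2}(th|nd|st|rd){0,1} *(of){0,1} *(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?)/;
// With the backslash, \d{1,2} means one or two digits, as intended.
const fixed = /\d{1,2}(th|nd|st|rd){0,1} *(of){0,1} *(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?)/;

console.log(broken.test("8th of january")); // false: the input contains no "d"
console.log(fixed.test("8th of january"));  // true
```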
Then:
const commaSeperated = new RegExp(weekdayRegex.source + " *,{0,1} *" + monthThenDayRegex.source);
commaSeperated.test(text); // true
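One detail of building the combined regex from .source: .source holds only the pattern text, never the flags, so a flag such as "i" (case-insensitive, e.g. for "Monday, January 8") has to be passed as the second argument to the RegExp constructor. A minimal sketch, assuming case-insensitive matching is wanted:

```javascript
const weekdayRegex = /\b((mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?)\b/;
const monthThenDayRegex = /(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?) *\d{1,2}(th|nd|st|rd){0,1}/;

const combinedSource = weekdayRegex.source + " *,{0,1} *" + monthThenDayRegex.source;

// Flags are not part of .source, so they go in the constructor's second argument.
const caseSensitive = new RegExp(combinedSource);
const caseInsensitive = new RegExp(combinedSource, "i");

console.log(caseSensitive.test("Monday, January 8"));   // false
console.log(caseInsensitive.test("Monday, January 8")); // true
```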
This works: I get true.
For step 2, how do I extract "monday, january 8"?
For step 3, how do I keep only "january 8"?
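One way to get steps 2 and 3 from a single match is to wrap the month-day part of the combined pattern in a named capture group (ES2018 and later). A sketch under that assumption; the group name date and the variable weekdayDateRegex are arbitrary choices for illustration:

```javascript
const text = "help me on monday, january 8 take the dog out";
const weekdayRegex = /\b((mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?)\b/;
const monthThenDayRegex = /(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?) *\d{1,2}(th|nd|st|rd){0,1}/;

// The month-and-day part is captured in a named group so it can be read back directly.
const weekdayDateRegex = new RegExp(
  weekdayRegex.source + " *,? *(?<date>" + monthThenDayRegex.source + ")"
);

const m = text.match(weekdayDateRegex);
if (m) {
  console.log(m[0]);          // step 2: "monday, january 8"
  console.log(m.groups.date); // step 3: "january 8"
} else {
  console.log("no weekday-date combination found");
}
```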
Answer 0 (score: 0)
You can use match for this.
In [1]: from parsel import Selector
In [2]: sel = Selector(text="""<html>
...: <body>
...: <h1>Hello, Parsel!</h1>
...: <ul>
...: <li><a href="http://example.com">Link 1</a></li>
...: <li><a href="http://scrapy.org">Link 2</a></li>
...: </ul>
...: </body>
...: </html>""")
In [3]: sel.css('ul li:nth-child(2) a::attr(href)').extract_first()
Out[3]: 'http://scrapy.org'
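For the JavaScript question itself, a short sketch of how String.prototype.match behaves with the combined weekday-date pattern; the two-date sample text and the use of the "g" flag are assumptions for illustration:

```javascript
const text = "monday, january 8 walk the dog; friday, feb 2nd vet visit";
const weekdayRegex = /\b((mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?)\b/;
const monthThenDayRegex = /(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?) *\d{1,2}(th|nd|st|rd){0,1}/;
const pattern = weekdayRegex.source + " *,? *" + monthThenDayRegex.source;

// Without "g": the first match plus its capture groups, or null if nothing matches.
const first = text.match(new RegExp(pattern));
console.log(first ? first[0] : null); // "monday, january 8"

// With "g": an array of every full match (capture groups omitted), or null.
const all = text.match(new RegExp(pattern, "g")) || [];
console.log(all); // ["monday, january 8", "friday, feb 2nd"]
```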
Answer 1 (score: 0)
I made one small simplification to your commaSeparated regex: {0,1} is equivalent to the ? quantifier.
let text = "help me on monday, january 8 take the dog out";
const weekdayRegex = /\b((mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?)\b/;
const monthThenDayRegex = /(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?) *\d{1,2}(th|nd|st|rd){0,1}/;
// \d{1,2}: one or two digits
const dayDayThenMonthRegex = /\d{1,2}(th|nd|st|rd){0,1} *(of){0,1} *(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?)/;
// ",?" is the same as ",{0,1}"
const commaSeperated = new RegExp(weekdayRegex.source + " *,? *" + monthThenDayRegex.source);
const m = text.match(commaSeperated);
if (m) {
  // Step 2: the whole match, "monday, january 8"
  console.log(m[0]);
  // Step 3: remove everything up to and including the comma, leaving "january 8"
  console.log(m[0].replace(/^.+, */, ''));
} else {
  console.log('not a match');
}
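The replace(/^.+, */, '') step relies on the comma being present. For "monday january 8", which the " *,? *" part of the regex also accepts, it leaves the string unchanged. A sketch that strips the leading weekday using weekdayRegex itself, so both forms give "january 8"; the helper name dateWithoutWeekday is hypothetical:

```javascript
const weekdayRegex = /\b((mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?)\b/;
const monthThenDayRegex = /(jan(uary)?|feb(ruary)?|mar(ch)?|apr(il)?|may|jun(e)?|jul(y)?|aug(ust)?|sep(tember)?|oct(ober)?|nov(ember)?|dec(ember)?) *\d{1,2}(th|nd|st|rd){0,1}/;
const weekdayDate = new RegExp(weekdayRegex.source + " *,? *" + monthThenDayRegex.source);

// Anchored at the start: the weekday, then optional spaces, an optional comma, optional spaces.
const leadingWeekday = new RegExp("^" + weekdayRegex.source + " *,? *");

// Hypothetical helper: step 2 (match), then step 3 (strip the weekday), or null if no match.
function dateWithoutWeekday(text) {
  const m = text.match(weekdayDate);
  return m ? m[0].replace(leadingWeekday, "") : null;
}

console.log(dateWithoutWeekday("help me on monday, january 8 take the dog out")); // "january 8"
console.log(dateWithoutWeekday("help me on monday january 8 take the dog out"));  // "january 8"
```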
Answer 2 (score: 0)
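I got it working by splitting monthThenDayRegex and dayDayThenMonthRegex into separate monthRegex and dayOfMonthRegex regexes, then combining them in dateRegex. I also added ^.*? before and .*$ after, to match the start and end of the string, so that String.replace can be used to extract the date part. As shown in the snippet, you can also check whether the string matches the regex before processing it.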
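The snippet this answer refers to is not included here. A minimal sketch of the approach it describes, under stated assumptions: the exact patterns inside monthRegex and dayOfMonthRegex, the named group date, and the helper extractDate are illustrative choices, not the answer's code. A named group is used because weekdayRegex already contains several numbered groups, which would make "$1" refer to the weekday.

```javascript
const weekdayRegex = /\b((mon|tues|wed(nes)?|thur(s)?|fri|sat(ur)?|sun)(day)?)\b/;
// Assumed building blocks (non-capturing groups keep the numbering simple).
const monthRegex = /(?:jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)/;
const dayOfMonthRegex = /\d{1,2}(?:th|nd|st|rd)?/;

// Either "january 8" or "8th of january", captured in the named group "date".
const dateRegex = new RegExp(
  "(?<date>" + monthRegex.source + " *" + dayOfMonthRegex.source +
  "|" + dayOfMonthRegex.source + " *(?:of)? *" + monthRegex.source + ")"
);

// ^.*? and .*$ make the regex cover the whole string, so replace() keeps only the date.
const fullRegex = new RegExp("^.*?" + weekdayRegex.source + " *,? *" + dateRegex.source + ".*$");

// Hypothetical helper: check first, because replace() returns a non-matching string unchanged.
function extractDate(text) {
  if (!fullRegex.test(text)) return null;
  return text.replace(fullRegex, "$<date>");
}

console.log(extractDate("help me on monday, january 8 take the dog out")); // "january 8"
console.log(extractDate("call mom on friday, 8th of january please"));     // "8th of january"
console.log(extractDate("no date here"));                                    // null
```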