I have written a regular expression to match URLs that follow this pattern:
part1-part2-part3.html
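where
part1: is a common word
part2: is an alphanumeric string with underscores, containing at least 2 letters
part3: is a number with 1 to 10 digits
For example, a valid URL would be: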
news-my_news_title_200_is-12345.html
So
part1 = news
part2 = my_news_title_200_is
part3 = 12345
This is what I have come up with:
/^[a-z]+-([a-z0-9_]*(?=[a-z]{2,})[a-z0-9_]*).-([0-9]{1,10})\.html$/
Expressed with character classes:
/^\w+-([\w\d_]*(?=\w{2,})[\w\d_]*).-(\d{1,10})\.html$/
But I suspect there is a better way to express part 2 of the pattern.
Thanks in advance.
Answer 0: (score: 1)
Try this
\b(?is)[a-z]+-\w*(?=[a-z]{2,})\w*-[0-9]{1,10}\.html\b
or
^(?is)[a-z]+-\w*(?=[a-z]{2,})\w*-[0-9]{1,10}\.html$
Play with it here
Answer 1: (score: 1)
Try this:
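As a quick check, the anchored variant can be exercised in Python. This is only a sketch and not part of the original answer: the inline `(?is)` flags are passed as `re.IGNORECASE | re.DOTALL` because Python's `re` module expects global inline flags at the very start of a pattern, `re.ASCII` is an added assumption to keep `\w` limited to `[A-Za-z0-9_]`, and every sample string except the question's URL is made up for illustration. Note that the lookahead `(?=[a-z]{2,})` asks for two consecutive letters somewhere in part2.

```python
import re

# Answer 0's anchored pattern, with (?is) expressed as compile flags.
# re.ASCII is an assumption added here so that \w means [A-Za-z0-9_] only.
PATTERN = re.compile(
    r"^[a-z]+-\w*(?=[a-z]{2,})\w*-[0-9]{1,10}\.html$",
    re.IGNORECASE | re.DOTALL | re.ASCII,
)


def is_valid(url: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return PATTERN.match(url) is not None


samples = [
    "news-my_news_title_200_is-12345.html",  # the URL from the question
    "news-m_200_s-12345.html",               # two letters, but not adjacent
    "news-200-12345.html",                   # no letters in part2
    "news-ab-12345678901.html",              # 11 digits in part3
]

for sample in samples:
    print(f"{sample}: {is_valid(sample)}")
```

Only the first sample is accepted. The second one is rejected because its two letters are separated by other characters, which may or may not be what the question intends by "at least 2 letters".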
\b[a-zA-Z]+-\w{2,}-\d{1,10}\.html\b
Stronger (avoids part2 matching digits only):
\b[a-zA-Z]+-(?!\d+-)\w{2,}-\d{1,10}\.html\b
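One thing to keep in mind is that `\w{2,}` counts word characters rather than letters, so both patterns are looser than the "at least 2 letters" rule in the question. The following Python sketch illustrates this; the names `LOOSE`, `STRICTER` and `found` are hypothetical, `re.ASCII` is an added assumption, and the sample strings other than the question's URL are invented for the demonstration.

```python
import re

# The two patterns from this answer, compiled unchanged.
# re.ASCII (an assumption) keeps \w and \d limited to ASCII characters.
LOOSE = re.compile(r"\b[a-zA-Z]+-\w{2,}-\d{1,10}\.html\b", re.ASCII)
STRICTER = re.compile(r"\b[a-zA-Z]+-(?!\d+-)\w{2,}-\d{1,10}\.html\b", re.ASCII)


def found(pattern: re.Pattern, text: str) -> bool:
    """Return True when the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


samples = [
    "news-my_news_title_200_is-12345.html",  # the URL from the question
    "news-200-12345.html",                   # part2 is digits only
    "news-a1-12345.html",                    # part2 has a single letter
    "news-_1-12345.html",                    # part2 has no letters at all
]

for sample in samples:
    print(f"{sample}: loose={found(LOOSE, sample)} stricter={found(STRICTER, sample)}")
```

The second pattern does reject the digits-only case, but it still accepts `a1` and `_1`, so an explicit letter check is needed if the two-letter rule must be enforced.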
Answer 2: (score: 1)
To match 2 non-consecutive letters, you can do this (the example is given in Perl, but it works in any language that understands PCRE)
use strict;
use warnings;
use 5.010;
use YAPE::Regex::Explain;
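# part2 must contain at least two letters, which need not be adjacent
my $re = qr/^\w+-(\w*?[a-z]+\w*?[a-z]+\w*?)-\d+\.html$/;
# print a human-readable breakdown of the pattern
say YAPE::Regex::Explain->new( $re )->explain;
# test each line of the DATA section against the pattern
while (<DATA>) {
    chomp;
    say( $_ =~ $re ? "match : $_" : "not match : $_" );
}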
__DATA__
news-my_news_title_200_is-12345.html
news-m_200_s-12345.html
news-m_200-12345.html
Output:
match : news-my_news_title_200_is-12345.html
match : news-m_200_s-12345.html
not match : news-m_200-12345.html
Explanation of the regular expression:
(?-imsx:^\w+-(\w*?[a-z]+\w*?[a-z]+\w*?)-\d+\.html$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  -                        '-'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \w*?                     word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the least amount
                             possible))
----------------------------------------------------------------------
    [a-z]+                   any character of: 'a' to 'z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \w*?                     word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the least amount
                             possible))
----------------------------------------------------------------------
    [a-z]+                   any character of: 'a' to 'z' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \w*?                     word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the least amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  -                        '-'
----------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
----------------------------------------------------------------------
  \.                       '.'
----------------------------------------------------------------------
  html                     'html'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
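The pattern explained above accepts digits in part1 (`\w+`) and any number of digits in part3 (`\d+`). If the rules from the question need to be enforced more strictly, one possible variation, sketched here in Python rather than Perl, counts the letters of part2 with a lookahead. This is an assumption-laden sketch and not the answer's own pattern: the names `URL_RE` and `parse` are hypothetical, only lowercase letters are accepted (as in the question's first attempt), and `fullmatch` is used instead of `^...$` so that a trailing newline is not tolerated.

```python
import re
from typing import Optional

# A stricter variation (an assumption, not the pattern from the answer):
#   part1: lowercase letters only
#   part2: letters, digits, underscores, with at least two letters anywhere
#   part3: 1 to 10 digits
URL_RE = re.compile(
    r"(?P<part1>[a-z]+)-"
    r"(?=(?:[0-9_]*[a-z]){2})"   # look ahead: two letters, adjacent or not
    r"(?P<part2>[a-z0-9_]+)-"
    r"(?P<part3>[0-9]{1,10})\.html"
)


def parse(url: str) -> Optional[dict]:
    """Return the three parts as a dict, or None if the URL is not valid."""
    match = URL_RE.fullmatch(url)
    return match.groupdict() if match else None


for url in (
    "news-my_news_title_200_is-12345.html",
    "news-m_200_s-12345.html",
    "news-m_200-12345.html",
):
    print(url, "->", parse(url))
```

The lookahead cannot run past the hyphen that ends part2, because neither `[0-9_]` nor `[a-z]` matches `-`, so the letter count is confined to part2. Passing `re.IGNORECASE` would be one way to accept uppercase letters as well.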