Regex: find links in HTML that contain either no square brackets or non-empty square brackets such as '[Some_Random_text]', but not empty square brackets '[]'

Date: 2019-11-27 06:10:29

Tags: php regex regex-lookarounds regex-group

Case 1:

Sample HTML: <a href="https://www.jessussaveme.com/saveme/c-from.html?random[for_god_sake_save_me]=anyonethere&no=fr&lang=fr">Test</a>

Expected output:

https://www.jessussaveme.com/saveme/c-from.html?random[for_god_sake_save_me]=anyonethere&no=fr&lang=fr

Case 2:

Sample HTML: <a href="https://www.jessussaveme.com/saveme/c-from.html?random[]=anyonethere&no=fr&lang=fr">Test</a>

Expected output: nothing, because the link contains empty square brackets [].

Case 3:

Sample HTML: <a href="https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr">Test</a>

Expected output: https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr

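Links that should be selected:
1. links that contain no square brackets "[]" at all, OR
2. links that contain non-empty square brackets, such as "[Some_random_text]".

Links that should not be selected: links that contain empty square brackets [].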
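A single PHP regex can express this rule with a negative lookahead, which is what the regex-lookarounds tag points at. The following is a minimal sketch, not the asker's code: it assumes every href is written in double quotes and looks only at the href value, and the variable names are made up for the example. preg_match_all() collects capture group 1 for each <a> tag whose href does not contain the literal sequence "[]".

```php
<?php
// Sample markup: one tag per case from the question.
$html = '<a href="https://www.jessussaveme.com/saveme/c-from.html?random[for_god_sake_save_me]=anyonethere&no=fr&lang=fr">Test</a>'
      . '<a href="https://www.jessussaveme.com/saveme/c-from.html?random[]=anyonethere&no=fr&lang=fr">Test</a>'
      . '<a href="https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr">Test</a>';

// <a\b[^>]*?\bhref="   an <a> tag up to its double-quoted href attribute
// (?![^"]*\[\])         negative lookahead: the value must not contain "[]"
// ([^"]*)               group 1: the whole href value
$pattern = '/<a\b[^>]*?\bhref="(?![^"]*\[\])([^"]*)"/i';

preg_match_all($pattern, $html, $matches);
$links = $matches[1]; // hrefs that passed the lookahead

foreach ($links as $link) {
    echo $link, PHP_EOL;
}
```

This prints the case 1 and case 3 URLs. Because the lookahead rejects only "[" immediately followed by "]", brackets with text inside still pass. A regex like this breaks on single-quoted or unquoted attributes, so a DOM parser is the safer choice for arbitrary pages.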

2 answers:

Answer 0 (score: 0)

You can use jQuery instead of regex:

$("a").each(function () {               // iterate over all <a> elements
  var href = $(this).attr("href");
  if (href && !href.includes("[]")) {   // skip links without href or with empty "[]"
    console.log(href);
  }
});

HTML used by the snippet:

<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<a href="https://www.jessussaveme.com/saveme/c-from.html?reges[for_god_sake_save_me]=anyonethere&no=fr&lang=fr">Test</a>
<a href="https://www.jessussaveme.com/saveme/c-from.html?reges[]=anyonethere&no=fr&lang=fr">Test</a>
<a href="https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr">Test</a>

Unless you only have the markup as a plain string, you shouldn't use regex to parse HTML.


Since you mentioned that you are using PHP, you can try the following method to extract the href of each <a> tag and check whether it contains empty brackets:

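$html = '<a href="https://www.jessussaveme.com/saveme/c-from.html?reges[for_god_sake_save_me]=anyonethere&no=fr&lang=fr">Test</a>

    <a href="https://www.jessussaveme.com/saveme/c-from.html?reges[]=anyonethere&no=fr&lang=fr">Test</a>

    <a href="https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr">Test</a>';

$hrefs = array();

libxml_use_internal_errors(true); // the fragment has unescaped "&", so silence parser warnings
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$tags = $dom->getElementsByTagName('a');
foreach ($tags as $tag) {
    $href = $tag->getAttribute('href');
    // keep the link only if it does not contain empty brackets "[]"
    if (strpos($href, '[]') === false) {
        $hrefs[] = $href;
    }
}

print_r($hrefs);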
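If you prefer to let the parser do the filtering, the same check can be written as an XPath 1.0 predicate using contains() and not(). This is a variant sketch, not part of the original answer; it reuses the sample markup and assumes the dom extension (DOMDocument, DOMXPath) is enabled, as it usually is in standard PHP builds.

```php
<?php
$html = '<a href="https://www.jessussaveme.com/saveme/c-from.html?reges[for_god_sake_save_me]=anyonethere&no=fr&lang=fr">Test</a>'
      . '<a href="https://www.jessussaveme.com/saveme/c-from.html?reges[]=anyonethere&no=fr&lang=fr">Test</a>'
      . '<a href="https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr">Test</a>';

libxml_use_internal_errors(true); // unescaped "&" in the fragment triggers parser warnings
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// href attributes of <a> elements whose href does not contain "[]"
$nodes = $xpath->query('//a[@href and not(contains(@href, "[]"))]/@href');

$hrefs = [];
foreach ($nodes as $attr) {
    $hrefs[] = $attr->value; // DOMAttr::$value holds the attribute text
}

print_r($hrefs);
```

Keeping the condition in the query means the PHP loop only collects results, and the predicate is easy to extend (for example, adding another contains() test).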

Answer 1 (score: 0)

This works:

if (!str.contains("reges[")) {
    // passed: pick up that link, since the string contains neither reges[] nor reges[some text]
} else {
    // match with: <\S.*?=\"(.*reges\[\w+\].*)\">.*>
    // if you find a match, pick up the link from group 1
}

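You can see it working here. It matches only the first tag and captures its link in group 1; in the second case, where the brackets [] are empty, it returns nothing.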

https://regex101.com/r/cdvVnP/1
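To try this answer's logic in PHP (the language from the question's tags), here is a rough port. The helper name pickLink() and the simple href pattern used in the no-brackets branch are assumptions made for this sketch; the second pattern is the answer's own regex. It is applied to one tag at a time, because the pattern's greedy .* parts would otherwise run across several tags on the same line.

```php
<?php
// Hypothetical helper: returns the link to pick up, or null if the tag is rejected.
function pickLink(string $tag): ?string
{
    if (strpos($tag, 'reges[') === false) {
        // No reges[ at all: take the href as-is (assumed simple href pattern).
        return preg_match('/href="([^"]*)"/', $tag, $m) ? $m[1] : null;
    }
    // The answer's pattern: \w+ requires at least one word character inside the brackets.
    return preg_match('/<\S.*?="(.*reges\[\w+\].*)">.*>/', $tag, $m) ? $m[1] : null;
}

$tags = [
    '<a href="https://www.jessussaveme.com/saveme/c-from.html?reges[for_god_sake_save_me]=anyonethere&no=fr&lang=fr">Test</a>',
    '<a href="https://www.jessussaveme.com/saveme/c-from.html?reges[]=anyonethere&no=fr&lang=fr">Test</a>',
    '<a href="https://www.jessussaveme.com/saveme/c-from.html?random=anyonethere&no=fr&lang=fr">Test</a>',
];

$picked = array_map('pickLink', $tags);
var_dump($picked);
```

Note that \w matches only letters, digits and underscores, so bracket text such as [a-b] would be rejected by the answer's pattern even though it is not empty.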

Edit:

The third case (a link without any brackets) is covered by the first branch of the condition above: if the string does not contain reges[, the link is picked up directly.