Question

I have the HTML below, containing two anchor tags, as input to my Perl script:

<a href="link.html"> TITLE </a> <a href="link.html"> SUB TITLE </a>

I want to extract only the title, i.e. I need the text of the first anchor tag <a> only; the text of the second anchor tag <a> should be ignored. I need to do this using a Perl regex only.

I've tried the regex below, but it's not working as expected:

<a[^>]*[^>]*>(?!.*a>.*)a>

The whole script goes like this:

#!/usr/bin/perl

use strict;
use warnings;

my $str = '<a href="link.html"> TITLE </a> <a href="link.html"> SUB TITLE </a>';

my $res = $str =~ m/<a[^>]*[^>]*>(?!.*a>.*)a>/;

print $res;

Execution:

 prakash@prak-pc:~$ perl regtest.pl 
 prakash@prak-pc:~$

Answer 1

/^(?:<a[^>]*>)([^<]*)/

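See the demo on regex101: https://regex101.com/r/Po3goc/1

^ asserts position at the start of a line (in Perl, without the /m modifier, this is the start of the string)
Non-capturing group (?:<a[^>]*>)
  <a matches the characters <a literally (case sensitive)
  [^>]* matches a single character not present in the list below
  - * quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    > matches the character > literally (case sensitive)
  > matches the character > literally (case sensitive)
1st capturing group ([^<]*)
  [^<]* matches a single character not present in the list below
  - * quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
    < matches the character < literally (case sensitive)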
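
As a minimal sketch of how this pattern might be dropped into the script from the question: the match is evaluated in list context, so the captured text is assigned rather than a true/false result. The variable name $title is only illustrative, and the sketch assumes the first anchor sits at the very start of the string, as the ^ requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = '<a href="link.html"> TITLE </a> <a href="link.html"> SUB TITLE </a>';

# The parentheses around $title force list context, so the match returns
# the contents of capture group 1 instead of a success flag.
my ($title) = $str =~ /^(?:<a[^>]*>)([^<]*)/;

# The spaces around the text are part of the capture; the brackets make them visible.
print "[$title]\n" if defined $title;   # prints "[ TITLE ]"
```

Because [^<]* stops at the first <, the capture ends before </a> and the second anchor is never reached.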

Answer 2

my $str = '<a href="link.html"> TITLE </a> <a href="link.html"> SUB TITLE </a>';
my ($res) = $str =~ m~<a[^>]*>(.*?)</a>~;
print $res,"\n";

Explanation:

m~          # match operator, delimiter
  <a        # literally <a
  [^>]*     # 0 or more of any character that is not >
  >         # literally >
  (.*?)     # group 1, 0 or more of any character, not greedy
  </a>      # literally </a>
~           # regex delimiter

If you do not want the leading and trailing spaces in the match, the whitespace can be matched outside the capture group.
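
One way to keep the surrounding spaces out of the result, sketched here under the assumption that the anchor text contains no nested tags or newlines, is to put \s* on either side of the capture group. The /x modifier is used so the comments can sit inside the pattern; $title is an illustrative name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = '<a href="link.html"> TITLE </a> <a href="link.html"> SUB TITLE </a>';

# List context again: $title receives capture group 1.
# With /x, whitespace and comments inside the pattern are ignored.
my ($title) = $str =~ m~
    <a          # literally <a
    [^>]*       # 0 or more characters that are not >
    >           # literally >
    \s*         # leading whitespace, kept out of the capture
    (.*?)       # group 1: 0 or more characters, not greedy
    \s*         # trailing whitespace, kept out of the capture
    </a>        # literally </a>
~x;

print "$title\n" if defined $title;   # prints "TITLE"
```

The comments inside the pattern must not contain the delimiter character, since Perl locates the closing delimiter before it interprets the /x comments.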

Perl regex to parse the first anchor <a> tag
