Perl regex: parsing the first anchor <a> tag

Date: 2019-01-08 12:21:29

Tags: regex perl parsing

I have the HTML below, containing two anchor tags, as input to my Perl script:

<a href="link.html"> TITLE </a> <a href="link.html"> SUB TITLE </a>

I want to extract only the title, i.e. I need the text from the first anchor tag <a> only; the text of the second anchor tag <a> should be ignored. I need to do this using a Perl regex only.

I've tried the regex below, but it's not working as expected:

<a[^>]*[^>]*>(?!.*a>.*)a>


The whole script goes like this:

#!/usr/bin/perl

use strict;
use warnings;

my $str = '<a href="link.html"> TITLE </a> <a href="link.html"> SUB TITLE </a>';

my $res = $str =~ m/<a[^>]*[^>]*>(?!.*a>.*)a>/;

print $res;

Execution:

 prakash@prak-pc:~$ perl regtest.pl 
 prakash@prak-pc:~$

2 answers:

Answer 0 (score: 2)

/^(?:<a[^>]*>)([^<]*)/

See the demo on regex101: https://regex101.com/r/Po3goc/1

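  • ^ asserts position at the start of a line
  • Non-capturing group (?:<a[^>]*>)
    • <a matches the characters <a literally (case sensitive)
    • [^>]* matches a single character not present in the list [^>]
      • * quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
      • > matches the character > literally (case sensitive)
    • > matches the character > literally (case sensitive)
  • 1st capturing group ([^<]*)
    • [^<]* matches a single character not present in the list [^<]
      • * quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
      • < matches the character < literally (case sensitive)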
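
To see how the pattern above behaves on the input from the question, here is a minimal sketch. The helper name first_anchor_text is hypothetical and not part of the original answer; it only wraps the match in list context so that the captured group is returned instead of a true/false result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): returns the text
# captured by the anchored pattern, or undef when the string does not
# start with an <a ...> tag.
sub first_anchor_text {
    my ($html) = @_;
    # In list context a successful match returns the captured groups;
    # a failed match returns an empty list, leaving $text undefined.
    my ($text) = $html =~ /^(?:<a[^>]*>)([^<]*)/;
    return $text;
}

my $str = '<a href="link.html"> TITLE </a> <a href="link.html"> SUB TITLE </a>';
my $title = first_anchor_text($str);

# Brackets make the surrounding spaces visible.
print defined $title ? "[$title]\n" : "no match\n";   # prints "[ TITLE ]"
```

Note that the capture keeps the spaces around TITLE, and that because of the leading ^ the pattern only matches when the anchor tag is the very first thing in the string.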

Answer 1 (score: 2)

my $str = '<a href="link.html"> TITLE </a> <a href="link.html"> SUB TITLE </a>';
my ($res) = $str =~ m~<a[^>]*>(.*?)</a>~;
print $res,"\n";

Explanation:

m~          # match operator, delimiter
    <a      # literally <a
    [^>]*   # 0 or more of any character that is not >
    >       # literally >
    (.*?)   # group 1, 0 or more of any character, not greedy
    </a>    # literally </a>
~           # regex delimiter

If you don't want the leading and trailing whitespace to be part of the match, the pattern can be adjusted to skip it.
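
One possible way to leave the surrounding whitespace out of the capture is sketched below. This is an assumption about what such a variant could look like, not necessarily the pattern the original answer used; the helper name first_anchor_text_trimmed is hypothetical. It places \s* on both sides of the lazy group so the spaces are consumed outside the capture.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the text of the first <a>...</a> pair
# without leading or trailing whitespace, or undef if nothing matches.
sub first_anchor_text_trimmed {
    my ($html) = @_;
    # \s* before the group swallows leading whitespace; the lazy (.*?)
    # then grows only until \s*</a> can match, so trailing whitespace
    # also stays outside the capture.
    my ($text) = $html =~ m~<a[^>]*>\s*(.*?)\s*</a>~;
    return $text;
}

my $str = '<a href="link.html"> TITLE </a> <a href="link.html"> SUB TITLE </a>';
my $res = first_anchor_text_trimmed($str);
print defined $res ? "[$res]\n" : "no match\n";   # prints "[TITLE]"
```

Because . does not match a newline by default, this sketch assumes the anchor text sits on a single line; adding the /s modifier would be needed for text that spans several lines.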