I'm trying to strip the span tags whose letter-spacing value starts with 0 or 1. I want
'<span style="letter-spacing:0.50 px">Boulevard,</span> '
to equal
'Boulevard, '
Thanks.
Here is an example of a full series:
<span style="letter-spacing:1.33 px">PRODUCTS</span> <span style="letter-spacing:1.37 px">MODEL</span> <span style="letter-spacing:0.77 px">HPI-27C</span> <span style="letter-spacing:1.39 px">MODDED)</span> ; <span style="letter-spacing:1.12 px">(HIGHWAY</span> <span style="letter-spacing:1.33 px">PRODUCTS</span> <span style="letter-spacing:1.37 px">MODEL</span>
It needs to end up like:
PRODUCTS MODEL HPI-27C MODDED) ; (HIGHWAY PRODUCTS MODEL
Answer 0 (score: 1)
Here is an example using Perl and HTML::Parser:
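use strict;
use warnings;
use HTML::Parser ();

# True while we are inside a span whose tags are being stripped.
my $delete_tag = 0;

my $p = HTML::Parser->new(
    api_version => 3,
    default_h   => [ sub { print shift }, 'text' ],
    start_h     => [ \&start_handler, 'tagname,text,attr' ],
    end_h       => [ \&end_handler,   'tagname,text' ],
);

my $str = do { local $/; <DATA> };
$p->parse($str);
$p->eof;    # flush any text the parser is still buffering
print "\n";

# Drop the </span> that closes a stripped <span>; print every other end tag.
sub end_handler {
    my ( $tag, $text ) = @_;
    if ( $tag eq "span" && $delete_tag ) {
        $delete_tag = 0;
        return;
    }
    print $text;
}

# Drop a <span> whose letter-spacing starts with 0 or 1; print every other start tag.
sub start_handler {
    my ( $tag, $text, $attr ) = @_;
    # A span without a style attribute is treated as having an empty style.
    if ( $tag eq "span" && ( $attr->{style} // '' ) =~ /letter-spacing:[01]\./ ) {
        $delete_tag = 1;
        return;
    }
    print $text;
}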
__DATA__
<span style="letter-spacing:1.33 px">PRODUCTS</span> <span style="letter-spacing:1.37 px">MODEL</span> <span style="letter-spacing:0.77 px">HPI-27C</span> <span style="letter-spacing:1.39 px">MODDED)</span> ; <span style="letter-spacing:1.12 px">(HIGHWAY</span> <span style="letter-spacing:1.33 px">PRODUCTS</span> <span style="letter-spacing:1.37 px">MODEL</span>
Output:
PRODUCTS MODEL HPI-27C MODDED) ; (HIGHWAY PRODUCTS MODEL
Answer 1 (score: 0)
Perl one-liners:
1.) Using the Mojo::DOM58 module:
perl -0777 -MMojo::DOM58 -E '$d=Mojo::DOM58->new(<>);$d->find("span")->grep(qr/letter-spacing:[01]/)->map(sub{$_->strip});print "$d"' <file.html
2.) Or, if you have Mojolicious installed, you can use the ojo module:
perl -Mojo -E '$d=x(f("file.html")->slurp);$d->find("span")->grep(qr/letter-spacing:[01]/)->map(sub{$_->strip});print "$d"'
Both examples print:
PRODUCTS MODEL HPI-27C MODDED) ; (HIGHWAY PRODUCTS MODEL
Answer 2 (score: -1)
Your requirements are far from complete given that you posted only 1 sample line, but:
$ sed -E 's#<span[^>]+letter-spacing:[01][^>]+>(.*)</span>#\1#' file
'Boulevard, '
The above will work with any sed that supports EREs, e.g. GNU sed and OSX sed.
Given your updated sample input/output, what you need can be done with GNU awk, using its multi-character RS and RT.
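One caveat with the sed command in this answer: `(.*)` is greedy, so on a line holding several spans it would match from the first opening tag to the last `</span>`. A minimal sed-only sketch for the multi-span input from the question might look like the following. It assumes each span sits on one line and contains no nested tags; `strip_spans` is a hypothetical helper name, not something from the original answers.

```shell
# Hypothetical helper: strip every <span> whose style sets letter-spacing to a
# value starting with 0 or 1, keeping the text inside it.
strip_spans() {
  # [^<]* (instead of .*) keeps each match inside a single span, and the
  # trailing g flag repeats the substitution for every span on the line.
  sed -E 's#<span[^>]*letter-spacing:[01][^>]*>([^<]*)</span>#\1#g'
}

# Usage: feed HTML on stdin, read the stripped text on stdout.
printf '%s\n' '<span style="letter-spacing:1.33 px">PRODUCTS</span> <span style="letter-spacing:0.77 px">HPI-27C</span> ; <span style="letter-spacing:1.12 px">(HIGHWAY</span>' | strip_spans
```

Spans whose letter-spacing starts with any other digit are left untouched, because `[^>]*` cannot cross a tag boundary to reach a later span. For arbitrary HTML a real parser, as in the other answers, is the safer choice.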