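I'm looking for a way to determine the code-to-text ratio of a web page in Perl. Nothing complicated, just a simple printout such as "HTML code: 75% Text: 25%", purely for SEO reasons.
Answer 0: (score: 4)
Use HTML::TreeBuilder to extract the text.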
#!/usr/bin/perl
use strict;
use warnings;
use v5.10;
use LWP::Simple;
use HTML::TreeBuilder;
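# Fetch the page at the URL given as the first command-line argument.
my $content = get(shift @ARGV);
die "Couldn't get it!" unless defined $content;
# Parse the HTML and keep only its text content.
my $text = HTML::TreeBuilder->new_from_content($content)->as_text;
my $html_size = length $content;
my $text_size = length $text;
# Text as a percentage of the whole page.
my $percentage = 100 * ( $text_size / $html_size );
say qq[$percentage%];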
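The script above prints only the text percentage. To get both figures in the form the question asks for, the two lengths can be turned into a pair of percentages. A minimal sketch, assuming a hypothetical helper named ratio_report (not part of any module) and treating everything that is not extracted text as code:

```perl
use strict;
use warnings;

# Hypothetical helper: takes the total page length and the extracted text
# length, returns (code %, text %) formatted to one decimal place.
sub ratio_report {
    my ($html_size, $text_size) = @_;
    return (0, 0) unless $html_size;    # avoid dividing by zero on an empty page
    my $text_pct = 100 * $text_size / $html_size;
    return map { sprintf '%.1f', $_ } (100 - $text_pct, $text_pct);
}

# Example with made-up sizes: a 2000-character page holding 500 characters of text.
my ($code_pct, $text_pct) = ratio_report(2000, 500);
print "HTML code: $code_pct% Text: $text_pct%\n";
```

In the script, the made-up sizes would be replaced by $html_size and $text_size.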
Answer 1: (score: -2)
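my $htmllength = 0;
my $textlength = 0;
# Read every line of the files named on the command line (or STDIN).
while(<>) {
# Strip each tag, adding its length to the markup total.
s/(<[^>]*>)/$htmllength += length($1); "";/eg;
# Whatever remains on the line is counted as text.
$textlength += length($_);
}
print "HTML Code: " . (100 * $htmllength / ($htmllength + $textlength)) . "\n";
print "Text : " . (100 * $textlength / ($htmllength + $textlength)) . "\n";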
Then just run the script on the files in question:
perl SCRIPT file1.html file2.html
Note: this will not work if your data contains any CDATA sections.
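To see why CDATA breaks the regex approach, here is a small sketch that applies the same substitution to a single string. The helper name split_lengths and the sample input are made up for illustration:

```perl
use strict;
use warnings;

# Same substitution as in the answer, wrapped in a hypothetical helper that
# returns (markup length, remaining text length) for one string.
sub split_lengths {
    my ($line) = @_;
    my $markup = 0;
    $line =~ s/(<[^>]*>)/$markup += length($1); ""/eg;
    return ($markup, length $line);
}

# The '>' inside the CDATA section ends the match early, so the tail
# " b ]]>" is left behind and counted as text.
my ($markup, $text) = split_lengths('<![CDATA[ a > b ]]>');
print "markup: $markup, text: $text\n";    # prints "markup: 13, text: 6"
```

The pattern treats the first '>' it finds as the end of a tag, so any literal '>' inside a CDATA section splits the section in two.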