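This is only the second Perl script I have written, so any constructive help or advice is greatly appreciated. Also note that I am working on a Windows machine with Strawberry Perl. I know there is a Tidy module for Perl, but (for reasons not worth explaining here) I prefer to call tidy.exe from the script rather than use the module.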
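What I want my Perl script to do:
1. Take an HTML file, copy it, and give the copy a .xml extension.
2. Run tidy.exe on the newly created .xml file so that it is well-formed.
3. Remove the XHTML namespace from the newly created, well-formed .xml file.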
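When I run it from the command line with the following command, it produces the desired result:
G:\TestFolder>perl tidy_cleanup.pl
However, when I launch the script from an icon, it skips step 2 above. Based on the code posted below, do you know why this happens?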
Here is my code:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
use FileHandle;

my $basename;
my @files = glob("*.html");
foreach my $file (@files) {
    my $oldext   = ".html";
    my $newext   = ".xml";
    my $newerext = "v2.xml";
    my $newfile  = $file;
    $newfile =~ s/$oldext/$newext/;
    my $newerfile = $newfile;
    $newerfile =~ s/$newext/$newerext/;
    open IN, $file or die "Can't read source file $file: $!\n";
    open OUT, ">$newfile" or die "Can't write on file $newfile: $!\n";
    print "Copying $file to $newfile\n";
    {
        while (<IN>) {
            print OUT $_;
            close(IN);
            close(OUT);
        }
        my $xmltidy = "for \%i in ($newfile) do c:\\Tidy\\tidy.exe --output-xml yes --numeric-entities yes --doctype omit --quote-nbsp no -asxml -utf8 -numeric -m \"\%i\"";
        system($xmltidy);
        print "\nfinished running tidy \n\n";
    }
    {
        open NEWIN, "$newfile" or die "Can't read source file $newfile: $!\n";
        open NEWOUT, ">$newerfile" or die "Can't write on file $newerfile: $!\n";
        print "Copying $newfile to $newerfile\n";
        {
            while (<NEWIN>) {
                if ( /(\<html)( xmlns="http:\/\/www.w3.org\/1999\/xhtml" xml:lang="en-GB")(.*)/ ) {
                    print NEWOUT "<html$3";
                }
                else {
                    print NEWOUT $_;
                }
            }
            close(NEWIN);
            close(NEWOUT);
        }
    }
}
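The script above has one more detail that is separate from the shortcut problem. Its renaming lines treat $oldext = ".html" as a regular expression, so the dot matches any character, and the match is not anchored to the end of the name. The sketch below uses a hypothetical file name, report_xhtml.html, to show the difference. The anchored s/\.html\z/.xml/ form is the one Answer 0 uses. \Q...\E is the standard Perl way to keep the extension in a variable while still escaping it.

```perl
use strict;
use warnings;

# Hypothetical file name chosen to expose the difference
my $file = 'report_xhtml.html';
my ($oldext, $newext) = ('.html', '.xml');

# Question's approach: '.' matches any character and nothing is anchored,
# so the first "?html" in the name ("xhtml") gets replaced
(my $naive = $file) =~ s/$oldext/$newext/;

# Escaped dot, anchored at the very end of the string
(my $anchored = $file) =~ s/\.html\z/.xml/;

# Same idea with the extension kept in a variable, quoted by \Q...\E
(my $quoted = $file) =~ s/\Q$oldext\E\z/$newext/;

print "$naive\n";     # report_.xml.html
print "$anchored\n";  # report_xhtml.xml
print "$quoted\n";    # report_xhtml.xml
```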
Answer 0 (score: 1)
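Your program probably doesn't work from a shortcut because it is looking for the HTML files in the wrong directory. When you run perl tidy_cleanup.pl from the command line, it looks in your current working directory. When you set up a shortcut, you have to specify that directory in the shortcut's Start in: field.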
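You can also avoid depending on the shortcut's Start in: field altogether. This assumes the HTML files always sit in the same folder as tidy_cleanup.pl, which the question does not actually state. Under that assumption, the script can change to its own directory at startup. FindBin is a core module, and its $Bin variable holds the directory the running script was loaded from. The sketch below is a minimal illustration, not part of the original answer.

```perl
use strict;
use warnings;
use FindBin qw($Bin);   # core module; $Bin is the directory the script lives in

# Assumption: the .html files are stored next to tidy_cleanup.pl.
# Changing to that directory first makes glob('*.html') independent of
# whatever working directory a shortcut (or anything else) starts Perl in.
chdir $Bin or die "Can't chdir to $Bin: $!\n";

my @files = glob '*.html';
printf "Looking in %s: %d HTML file(s) found\n", $Bin, scalar @files;
```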
However, as I said in my comment, your HTML-to-XML copy processes only the first line of the file, because you close the filehandles inside the while loop.
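Here is a self-contained illustration of that bug. It uses in-memory filehandles (open on a scalar reference) instead of real files, so it runs anywhere. Closing the handles inside the loop body ends the loop after the first line. Closing them after the loop copies everything.

```perl
use strict;
use warnings;

my $source = "line 1\nline 2\nline 3\n";

# Buggy pattern from the question: close() inside the while body
my $buggy = '';
open my $in,  '<', \$source or die "Can't open source: $!";
open my $out, '>', \$buggy  or die "Can't open target: $!";
{
    # Silences only the "readline() on closed filehandle" warning that
    # the second <$in> would print; the question's script would show it
    no warnings 'closed';
    while (<$in>) {
        print $out $_;
        close $in;     # the next <$in> returns undef, so the loop stops
        close $out;
    }
}

# Fixed pattern: close only after the loop has read every line
my $fixed = '';
open $in,  '<', \$source or die "Can't open source: $!";
open $out, '>', \$fixed  or die "Can't open target: $!";
while (<$in>) {
    print $out $_;
}
close $in;
close $out;

print "buggy copy: $buggy";   # only "line 1"
print "fixed copy:\n$fixed";  # all three lines
```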
Here is how I would write what I think you want:
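use strict;
use warnings;
use autodie;              # core builtins such as open die automatically on failure
use File::Copy 'copy';

my $tidy = 'C:\Tidy\tidy.exe';
die "'tidy.exe' not found" unless -f $tidy;

for my $html_file (glob '*.html') {

    # Step 1: copy foo.html to foo.xml (autodie does not cover File::Copy)
    (my $xml_file = $html_file) =~ s/\.html\z/.xml/;
    copy $html_file, $xml_file or die "Can't copy $html_file to $xml_file: $!\n";

    # Step 2: tidy foo.xml in place (-m)
    print qq{Tidying "$xml_file"\n};
    qx{"$tidy" --output-xml yes --numeric-entities yes --doctype omit --quote-nbsp no -asxml -utf8 -numeric -m "$xml_file"};
    print "Finished running tidy\n\n";

    # Step 3: write foo_v2.xml without the xmlns and xml:lang attributes
    (my $v2_file = $xml_file) =~ s/\.xml\z/_v2.xml/;
    open my $xml_fh, '<', $xml_file;
    open my $v2_fh,  '>', $v2_file;
    print qq{Copying "$xml_file" to "$v2_file"\n};
    while (<$xml_fh>) {
        s/\s*xmlns="[^"]+"//;
        s/\s*xml:lang="[^"]+"//;
        print $v2_fh $_;
    }
    # Both lexical filehandles close automatically at the end of each iteration
    print "Copy complete\n\n";
}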
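The question's regex matches only when the root element is exactly <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB">, with the attributes in that order. The two substitutions in the loop above remove each attribute independently. The input line below is hypothetical, with the attribute order swapped to show the difference. It is not a claim about what tidy actually emits.

```perl
use strict;
use warnings;

# Hypothetical root element with the attribute order swapped
my $line = qq{<html xml:lang="en-GB" xmlns="http://www.w3.org/1999/xhtml">\n};

# Question's pattern: needs xmlns first and xml:lang="en-GB" exactly
my $question_hit =
    $line =~ /(\<html)( xmlns="http:\/\/www.w3.org\/1999\/xhtml" xml:lang="en-GB")(.*)/ ? 1 : 0;

# Answer's approach: strip each attribute on its own, in any order
(my $stripped = $line) =~ s/\s*xmlns="[^"]+"//;
$stripped =~ s/\s*xml:lang="[^"]+"//;

print "question regex matched: $question_hit\n";  # 0
print $stripped;                                   # <html>
```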
Answer 1 (score: 0)
use strict;
use warnings;
use File::Basename;
use FileHandle;

my @files = glob("*.html");
foreach my $file (@files) {
    my $oldext   = ".html";
    my $newext   = ".xml";
    my $newerext = "v2.xml";
    my $newfile  = $file;
    $newfile =~ s/$oldext/$newext/;
    my $newerfile = $newfile;
    $newerfile =~ s/$newext/$newerext/;
    open IN, $file or die "Can't read source file $file: $!\n";
    open OUT, ">$newfile" or die "Can't write on file $newfile: $!\n";
    print "Copying $file to $newfile\n";
    {
        while (<IN>) {
            print OUT $_;
            close(OUT);
            my $xmltidy = "c:\\Tidy\\tidy.exe --output-xml yes --numeric-entities yes --doctype omit --quote-nbsp no -asxml -utf8 -numeric -m \"$newfile\"";
            system($xmltidy);
            print "\nfinished running tidy \n\n";
            {
                open NEWIN, "$newfile" or die "Can't read source file $newfile: $!\n";
                open NEWOUT, ">$newerfile" or die "Can't write on file $newerfile: $!\n";
                print "Copying $newfile to $newerfile\n";
                {
                    while (<NEWIN>) {
                        if (/(\<html)( xmlns="http:\/\/www.w3.org\/1999\/xhtml" xml:lang="en-GB")(.*)/) {
                            print NEWOUT "<html$3";
                        }
                        else {
                            print NEWOUT $_;
                        }
                    }
                    close(NEWIN);
                    close(NEWOUT);
                }
            }
        }
        close(IN);
    }
}
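The question and this answer both build the tidy command as a single string and ignore its result. An alternative is sketched below, assuming the same C:\Tidy\tidy.exe path and options as the question. It passes the arguments to system as a list, which generally lets Perl start the program without hand-written quoting or the cmd.exe for-loop wrapper. It then inspects the exit status. HTML Tidy documents exit codes 0 (no problems), 1 (warnings only) and 2 (errors). The run_tidy helper name is hypothetical.

```perl
use strict;
use warnings;

# Path and options copied from the question; adjust to your installation.
my $tidy = 'C:\Tidy\tidy.exe';
my @tidy_opts = qw(
    --output-xml yes --numeric-entities yes --doctype omit --quote-nbsp no
    -asxml -utf8 -numeric -m
);

# Hypothetical helper: tidies one file in place and returns tidy's exit code.
sub run_tidy {
    my ($file) = @_;
    # List form: each argument is passed separately, no manual quoting
    my $status = system $tidy, @tidy_opts, $file;
    die "Could not start $tidy: $!\n" if $status == -1;
    my $exit = $? >> 8;
    # HTML Tidy: 0 = clean, 1 = warnings only, 2 = errors
    warn "tidy reported errors in $file\n" if $exit >= 2;
    return $exit;
}

# Example use inside the question's loop, after the .xml copy is closed:
#     run_tidy($newfile);
```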