Question

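Hi all, I have a folder full of HTML files that I want to convert to text files. I'm working on Ubuntu, and unfortunately lynx isn't installed, so lynx --dump isn't an option for me. Is there another way to convert HTML files to text files? Any help is appreciated, thanks in advance.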

Answer 1

This question is tagged python, so my first choice would be Aaron Swartz's html2text. It produces text in Markdown format.

A Python solution is also possible with BeautifulSoup.
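If installing html2text or BeautifulSoup isn't an option either, the standard library's html.parser.HTMLParser can do a rough job on its own. This is a swapped-in, dependency-free technique, not what the answer above uses. It is a minimal sketch: the names _TextExtractor and html_to_text are hypothetical, it drops <script>/<style> contents, starts a new line after a few block-level tags, and does not try to render tables, lists, or links the way html2text does.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    SKIP = {"script", "style"}  # contents are code, not readable text
    BLOCK = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True turns entities like &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            # End of a block-level element: start a new line.
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return a rough plain-text rendering of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered
    raw = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop blank lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

For example, html_to_text("<h1>Title</h1><p>Hello &amp; <b>world</b></p>") gives "Title\nHello & world". Real-world pages with navigation menus, tables, and forms will come out much noisier than html2text's output.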

If you prefer Perl, here is a simple Perl script that converts an HTML file to plain text:

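#!/usr/bin/perl
# Convert one HTML file to plain text and print it to STDOUT.
use strict;
use warnings;
use HTML::TreeBuilder;   # replaces the deprecated HTML::Parse
use HTML::FormatText;

my $file = $ARGV[0];
die "Usage: $0 file.html\n" unless defined $file;
die "ERROR: File ($file) is not readable\n" unless -r $file;

# Parse the file into an HTML element tree.
my $tree = HTML::TreeBuilder->new_from_file($file);

# Render the tree as plain text; the margins control line wrapping.
my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 72);
print $formatter->format($tree);

# Release the parse tree.
$tree->delete;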
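The script converts one file per run, while the question is about a whole folder. A minimal Python sketch of the batch step, assuming the pages end in .html or .htm and are UTF-8 encoded (both assumptions). convert_folder is a hypothetical helper that takes the actual converter as a parameter, so any HTML-to-text function can be plugged in.

```python
from pathlib import Path
from typing import Callable, List


def convert_folder(src_dir, dst_dir, to_text: Callable[[str], str]) -> List[Path]:
    """Hypothetical helper: write dst_dir/<name>.txt for each HTML file in src_dir.

    to_text is any function mapping an HTML string to plain text.
    Returns the paths of the .txt files written.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for html_path in sorted(src.iterdir()):
        # Only regular files with an HTML extension (assumed naming).
        if not html_path.is_file() or html_path.suffix.lower() not in (".html", ".htm"):
            continue
        # Assumes UTF-8; errors="replace" keeps going on badly encoded files.
        html = html_path.read_text(encoding="utf-8", errors="replace")
        txt_path = dst / (html_path.stem + ".txt")
        txt_path.write_text(to_text(html), encoding="utf-8")
        written.append(txt_path)
    return written
```

With html2text installed, for instance, convert_folder("html_pages", "text_pages", html2text.html2text) would write one Markdown-style .txt file per page (the folder names here are placeholders). Passing the converter in keeps the directory walking separate from the choice of library.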

How to convert multiple HTML files to text files?
