Question

I am trying to parse HTML documents for a web indexing program. To do this, I am using HTML::TokeParser.

I get an error on the last line of the first if statement:

 if ( $token->[1] eq 'a' ) {
     #href attribute of tag A
     my $suffix = $token->[2]{href};

The error says: Can't use string ("<./a>") as a HASH ref while "strict refs" in use at ./indexer.pl line 270, <PAGE_DIR> line 1.

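My question is: is something here ($suffix? or <./a>?) a string that needs to be converted to a hash reference? I have looked at other posts with similar errors, but I still can't work this out. Thanks for your help.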

sub parse_document {

    #passed from input
    my $html_filename = $_[0];

    #base url for links
    my $base_url = $_[1];

    #created to hold tokens
    my @tokens = ();

    #created for doc links
    my @links = ();

    #creates parser
    my $p = HTML::TokeParser->new($html_filename);

    #loops through doc tags
    while (my $token = $p->get_token()) {
        #code for retrieving links
        if ( $token->[1] eq 'a' ) {
            # href attribute of tag A
           my $suffix = $token->[2]{href};

            #if href exists & isn't an email link
            if ( defined($suffix) && !($suffix =~ "^mailto:") ) {
                #make the url absolute
                my $new_url = make_absolute_url $base_url, $suffix;

                #make sure it's of the http:// scheme
                if ($new_url =~ "^http://"){
                    #normalize the url
                    my $new_normalized_url = normalize_url $new_url;

                    #add it to links array
                    push(@links, $new_normalized_url);
                }
            }
        }

        #code for text words
        if ($token->[0] eq 'T') {
            my $text =  $token->[1];

            #add words to end of array
            #(split by non-letter chars)
            my @words = split(/\P{L}+/, $text);
        }
    }

    return (\@tokens, \@links);
}

Answer 1

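The get_token() method returns an array reference in which $token->[2] is the hash reference containing your href only when $token->[0] is S (that is, a start tag). In this case you are matching the end tag (where $token->[0] is E). See the PerlDoc for details.

To fix, add

    next if $token->[0] ne 'S';

at the top of your loop.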
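A minimal sketch of the same idea, assuming the text-token branch from the question should keep running: here the start-tag check is folded into the link condition instead of being a `next` at the top of the loop, since a bare `next` would also skip the 'T' tokens. The HTML string and the variable names are illustrative and do not come from the question.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Illustrative input; the question reads from a file instead.
my $html = '<p>See <a href="/docs">the docs</a> or '
         . '<a href="mailto:x@example.com">mail us</a>.</p>';

# Passing a scalar reference makes HTML::TokeParser parse the string itself.
my $p = HTML::TokeParser->new(\$html) or die "Cannot create parser";

my (@links, @words);
while (my $token = $p->get_token) {
    my $type = $token->[0];

    # Only start tags ('S') carry the attribute hash in $token->[2].
    if ($type eq 'S' && $token->[1] eq 'a') {
        my $href = $token->[2]{href};
        push @links, $href if defined $href && $href !~ /^mailto:/;
    }
    # Text tokens ('T') are still reached because no token was skipped.
    elsif ($type eq 'T') {
        # split can yield empty leading fields, so drop them with grep.
        push @words, grep { length } split /\P{L}+/, $token->[1];
    }
}

print "links: @links\n";
print "words: @words\n";
```

End tags fall through both branches untouched, so the string in their third slot is never dereferenced.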

Answer 2

$token->[2] is a string, not a hash reference.

Run print $token->[2] and you will see that it is a string containing </a>
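To see why, it can help to compare the shapes of the two token kinds. The arrays below are hand-written stand-ins that mimic what get_token() returns for a start tag and an end tag; they are not real parser output, and the variable names are made up for this sketch.

```perl
use strict;
use warnings;

# Hand-written stand-ins for HTML::TokeParser tokens:
#   start tag: ['S', $tag, \%attr, \@attrseq, $text]
#   end tag:   ['E', $tag, $text]
my $start = ['S', 'a', { href => '/docs' }, ['href'], '<a href="/docs">'];
my $end   = ['E', 'a', '</a>'];

for my $token ($start, $end) {
    # ref() returns 'HASH' for a hash reference and '' for a plain string.
    my $kind = ref($token->[2]) || 'plain string';
    print "$token->[0]: third element is a $kind\n";
}

# Dereferencing the end tag's string under strict refs reproduces the error.
my $href = eval { $end->[2]{href} };
my $err  = $@;
print "error: $err" if $err;
```

Checking ref($token->[2]) eq 'HASH' before the lookup is another way to guard the dereference, though testing $token->[0] states the intent more directly.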

Answer 3

Apparently $token->[2], which holds the string "</a>", is being dereferenced as a hash reference. Certainly not what you want!

