Question

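I'm working on a small Perl program that opens a site, searches for the words Hail Reports, and gives me that information. I'm new to Perl, so some of this may be easy to fix. First, my code says I'm using an uninitialized value. Here is what I have: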

#!/usr/bin/perl -w
use LWP::Simple;

my $html = get("http://www.spc.noaa.gov/climo/reports/last3hours.html")
    or die "Could not fetch NWS page.";
$html =~ m{Hail Reports} || die;
my $hail = $1;
print "$hail\n";

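Second, I think regular expressions are the easiest way to do what I want, but I'm not sure I can do it with them. I'd like my program to search for the words Hail Reports and send me back the information between Hail Reports and Wind Reports. Is this possible with regular expressions, or should I use a different approach? Here is a snippet of the page source I'd like it to return: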

     <tr><th colspan="8">Hail Reports (<a href="last3hours_hail.csv">CSV</a>)&nbsp;(<a href="last3hours_raw_hail.csv">Raw Hail CSV</a>)(<a href="/faq/#6.10">?</a>)</th></tr> 

# The data here changes throughout the day, so normally there will be more info.
      <tr><td colspan="8" class="highlight" align="center">No reports received</td></tr> 
      <tr><th colspan="8">Wind Reports (<a href="last3hours_wind.csv">CSV</a>)&nbsp;(<a href="last3hours_raw_wind.csv">Raw Wind CSV</a>)(<a href="/faq/#6.10">?</a>)</th></tr>

Answer 1

The "uninitialized value" warning comes from $1: it is never defined or set anywhere.

For a line-level (rather than byte-level) "between", you can use:

# split drops the newlines, so add one back when printing each line
for (split(/\n/, $html)) {
    print "$_\n" if (/Hail Reports/ .. /Wind Reports/ and !/(?:Hail|Wind) Reports/);
}
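A minimal, self-contained sketch of the same flip-flop idea. It uses a trimmed copy of the snippet from the question in place of the live page (an assumption, so the example runs without network access) and collects the matching lines into an array instead of printing them directly:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trimmed copy of the question's snippet, used instead of fetching the live
# page so the sketch runs offline (an assumption, not the real page content).
my $html = <<'HTML';
<tr><th colspan="8">Hail Reports (<a href="last3hours_hail.csv">CSV</a>)</th></tr>
<tr><td colspan="8" class="highlight" align="center">No reports received</td></tr>
<tr><th colspan="8">Wind Reports (<a href="last3hours_wind.csv">CSV</a>)</th></tr>
HTML

my @between;
for my $line (split /\n/, $html) {
    # The flip-flop (..) turns on at the "Hail Reports" line and turns off
    # after the "Wind Reports" line; the inner test drops both heading lines.
    if (($line =~ /Hail Reports/) .. ($line =~ /Wind Reports/)) {
        push @between, $line unless $line =~ /(?:Hail|Wind) Reports/;
    }
}

print "$_\n" for @between;
```

With this sample, only the "No reports received" row is printed; on the live page every row between the two headings would be collected.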

Answer 2

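Use single-line and multi-line matching (/sm); the /s modifier is the part that matters here, since it lets . match newlines, while /m only changes how ^ and $ behave. Also, the non-greedy .*? only picks up the text up to the first "Wind Reports", which is a bit faster than a greedy match.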

#!/usr/bin/perl -w

use strict;
use LWP::Simple;

   sub main{
      my $html = get("http://www.spc.noaa.gov/climo/reports/last3hours.html")
                 or die "Could not fetch NWS page.";

      # match single and multiple lines + not greedy
      my ($hail, $between, $wind) = $html =~ m/(Hail Reports)(.*?)(Wind Reports)/sm
                 or die "No Hail/Wind Reports";

      print qq{
               Hail:         $hail
               Wind:         $wind
               Between Text: $between
            };
   }

   main();
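To make the greedy vs. non-greedy difference concrete, here is a small sketch on a hypothetical string in which "Wind Reports" appears twice (the input is invented for illustration, not taken from the NWS page):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input in which "Wind Reports" appears twice, to show where
# each quantifier stops.
my $text = "Hail Reports\nA\nWind Reports\nB\nWind Reports\n";

# Greedy .* first runs to the end of the string and backs off,
# so it stops at the LAST "Wind Reports".
my ($greedy) = $text =~ /Hail Reports(.*)Wind Reports/s;

# Non-greedy .*? grows one character at a time, so it stops at the FIRST.
my ($lazy) = $text =~ /Hail Reports(.*?)Wind Reports/s;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

The lazy capture holds only the "A" line, while the greedy one also swallows the first "Wind Reports" and the "B" line.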

Answer 3

You aren't capturing anything in $1 because no part of your regex is enclosed in parentheses. The following works for me:

#!/usr/bin/perl
use strict;
use warnings;

use LWP::Simple;

my $html = get("http://www.spc.noaa.gov/climo/reports/last3hours.html")
    or die "Could not fetch NWS page.";

$html =~ m{Hail Reports(.*)Wind Reports}s || die; #Parentheses indicate capture group
my $hail = $1; # $1 contains whatever matched in the (.*) part of above regex
print "$hail\n";
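The trailing s modifier in that regex is what lets the capture span several lines. A short sketch on a hypothetical three-line string (headings on separate lines, as on the real page) shows the match failing without it:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical three-line input: the headings are on different lines.
my $html = "Hail Reports (CSV)\n<tr><td>No reports received</td></tr>\nWind Reports (CSV)\n";

# Without /s, . does not match "\n", so the pattern cannot span the lines.
my $matched_without_s = $html =~ m{Hail Reports(.*)Wind Reports} ? 1 : 0;

# With /s, . matches "\n" too, and (.*) captures everything in between.
my ($captured) = $html =~ m{Hail Reports(.*)Wind Reports}s;

print "without /s matched: $matched_without_s\n";
print "captured with /s: [$captured]\n";
```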

Answer 4

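Parentheses in a regular expression capture part of the matched string. Your regex has no parentheses, so $1 is never set to anything. If you had: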

$html =~ m{(Hail Reports)} || die;

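then $1 would be set to "Hail Reports" if that text appears in $html. Since you only want to know whether it matches, you don't really need to capture anything; you could write something like this: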

unless ( $html =~ /Hail Reports/ ) {
  die "No Hail Reports in HTML";
}
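A small sketch of the difference the parentheses make, using a hypothetical one-line sample: the same match succeeds both times, but only the version with a capture group defines $1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $html = '<th colspan="8">Hail Reports (CSV)</th>';

# A pattern with no parentheses: the match succeeds but sets no $1.
my $first_is_defined;
if ($html =~ /Hail Reports/) {
    $first_is_defined = defined $1 ? 1 : 0;
}

# The same pattern wrapped in parentheses: $1 now holds the matched text.
my $captured;
if ($html =~ /(Hail Reports)/) {
    $captured = $1;
}

print "no group, \$1 defined? $first_is_defined\n";
print "with group, \$1 = $captured\n";
```

Reading $1 after the first match is exactly what triggers the "uninitialized value" warning in the question's code.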

To capture what lies between the two strings, you can do something like this:

if ( $html =~ /(?<=Hail Reports)(.*?)(?=Wind Reports)/s ) {
  print "Got $1\n";
}
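The (?<=...) lookbehind and (?=...) lookahead check that the headings are present without consuming them. A sketch on a hypothetical sample shows this, including a substitution that replaces only the middle text and leaves both headings in place:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample with the two headings on separate lines.
my $html = "Hail Reports\n<tr><td>No reports received</td></tr>\nWind Reports\n";

# Lookbehind/lookahead assert the headings are there without consuming them,
# so $1 (and the whole match) is only the text in between.
my ($between) = $html =~ /(?<=Hail Reports)(.*?)(?=Wind Reports)/s;

# Because the headings are not part of the match, a substitution with the
# same lookarounds replaces only the middle and keeps both headings intact.
(my $copy = $html) =~ s/(?<=Hail Reports).*?(?=Wind Reports)/ [removed] /s;

print "between: [$between]\n";
print "copy: $copy";
```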

Some help with HTML parsing in Perl
