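I've been working on code that parses event information from an iCal feed. The feed is one huge block of data that I want to split up by key terms, and I need the output in order. I tried finding the index of each key term and then having the program print whatever lies between those indexes. For some reason, though, it turns into an infinite loop that prints all of the data, and I don't know how to fix it. Don't run my code, because it can freeze your computer. I'm hoping someone can tell me what my problem is.
DO NOT RUN THIS PROGRAM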
use strict;
use warnings;
use LWP::Simple;
use HTML::TreeBuilder;
use HTML::FormatText;
my $URL= get("https://www.events.utoronto.ca/iCal.php?ical=1&campus=0&
+sponsor%5B%5D=&audience%5B%5D=&category%5B%5D=");
my $Format=HTML::FormatText->new;
my $TreeBuilder=HTML::TreeBuilder->new;
$TreeBuilder->parse($URL);
my $Parsed=$Format->format($TreeBuilder);
open(FILE, ">UOTSUMMER.txt");
print FILE "$Parsed";
close (FILE);
open (FILE, "UOTSUMMER.txt");
my @array=<FILE>;
my $string ="@array";
my $offset = 0; # Where are we in the string?
my $numResults = 0;
while (1) {
my $idxSummary = index($string, "SUMMARY", $offset);
my $result = "";
my $idxDescription = index ($string, "DESCRIPTION", $offset);
my $result2= "";
if ($idxSummary > -1) {
$offset = $idxSummary + length("SUMMARY");
my $idxDescription = index($string, "DESCRIPTION", $offset);
if ($idxDescription == -1) {
print "(Data malformed: missing DESCRIPTION line.)\n";
last;
}
if ($idxDescription > -1) {
$offset = $idxDescription+ length("DESCRIPTION");
my $idxLocation= index($string, "LOCATION", $offset);
if ($idxLocation == -1) {
print "(Data malformed: missing LOCATION line.)\n";
last;
}
my $length = $idxDescription - $offset;
my $length2= $idxLocation - $offset;
$result = substr($string, $offset, $length);
$result2= substr ($string, $offset, $length2);
$offset = $idxDescription + length("DESCRIPTION");
$result =~ s/^\s+|\s+$//g ; # Strip leading and trailing white space, including newlines.
$result2 =~ s/^\s+|\s+$//g ;
$numResults++;
} else {
print "(All done. $numResults result(s) found.)\n";
last;
}
open (FILE2, "UOT123.txt")
print FILE2 "TITLE: <$result>\n DESCRIPTION: <$result2>\n";
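Any guidance would be greatly appreciated. Thanks!
Answer 0 (score: 0)
Your warning made me curious, so I had to run it. I even installed the required modules. Your computer is probably just bogged down by the infinite loop rather than actually crashing.
Looking at your code, the problem is almost certainly your indexing. As it stands, your loop logic is a bit muddled, so your best bet is to rethink how you're doing this. Instead of using all that logic, try making the loop depend on iterating through the file. That way it becomes much harder to create an infinite loop. Regular expressions would also make this job a lot simpler. This may not be exactly what you want, but it's a start: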
while ($string =~ m/SUMMARY(.+?)DESCRIPTION(.+?)(?=SUMMARY|$)/gcs)
{
print "summary is: \n\n $1 \n\n description is: \n\n $2 \n\n";
}
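To illustrate the suggestion of making the loop depend on iterating through the file, here is a minimal sketch. In the question's code, nothing appears to call `last` once `index` stops finding `SUMMARY`, so `while (1)` never ends; a loop driven by a filehandle stops on its own at end of input. The sample data, the `parse_events` helper, and the assumption that each property sits on its own unfolded line and each event ends with `END:VEVENT` are all illustrative and not taken from the real feed.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the downloaded feed.
my $sample = <<'END';
BEGIN:VEVENT
SUMMARY:Learning Disabilities Parent Support Group
DESCRIPTION:Dates: Thursdays
LOCATION:Room 1
END:VEVENT
BEGIN:VEVENT
SUMMARY:"Reading to Write"
DESCRIPTION:A workshop
END:VEVENT
END

# Hypothetical helper: returns one hash reference per event.
# The loop is driven by the filehandle, so it ends at end of input.
sub parse_events {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    my (@events, %event);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^(SUMMARY|DESCRIPTION|LOCATION):\s*(.*)$/) {
            $event{$1} = $2;            # remember the fields we care about
        }
        elsif ($line =~ /^END:VEVENT/) {
            push @events, { %event };   # store a copy, then start a new event
            %event = ();
        }
    }
    close($fh);
    return @events;
}

for my $event (parse_events($sample)) {
    printf "TITLE: <%s>\nDESCRIPTION: <%s>\n",
        $event->{SUMMARY} // '', $event->{DESCRIPTION} // '';
}
```

With a real file, the in-memory handle would be replaced by a three-argument `open` on the file name. Long property values that the feed folds across several lines would need extra handling that this sketch leaves out.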
A few other quick points:
Answer 1 (score: 0)
The following may help with your parsing task:
use Modern::Perl;
use LWP::Simple qw/get/;
use HTML::Entities;
my $html = get 'https://www.events.utoronto.ca/iCal.php?ical=1&campus=0&+sponsor%5B%5D=&audience%5B%5D=&category%5B%5D=';
while ( $html =~ /(Summary:\s*[^\n]+)\s*(Description:\s*[^\n]+)/gi ) {
say decode_entities($1) . "\n" . decode_entities($2);
}
Sample output:
SUMMARY:Learning Disabilities Parent Support Group
DESCRIPTION: Dates: Thursdays: May 24, June 21, and July 19
SUMMARY:"Reading to Write"
DESCRIPTION: Leora Freedman, Coordinator, English Language Learning Program, Faculty of Arts & Science
SUMMARY:The Irish Home Rule Bill of 1912: A Centennial Symposium
DESCRIPTION: One-day symposium presented by the Celtic Studies Program, St. Michael's College
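If HTML entities in the text are fine with you, you can omit HTML::Entities and the decode_entities($1) notation. Without the decoding, you may get results like the following: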
DESCRIPTION: Leora Freedman, Coordinator, English Language Learning Program, Faculty of Arts &amp; Science
Hope this helps!