I have a script that looks for specific words in environmental news articles. It will process one article, then another five, and then... (no data received). The trouble is that I need to loop through roughly 30 RSS feeds, each containing about 10 articles, once a week.
Is there a more robust solution? Or some way to have it process a few and then restart itself?
My colleague suggested I explain what happens in the script (a rough sketch of the loop follows this description).
The script loads RSS feeds from a list, one by one.
It uses magpie_debug to obtain the links, titles and dates.
If the date is less than 60 minutes old (a fresh article),
it pulls the plaintext (simple_DOM), attaches POS tags using a Brill tagger,
and splits the text into sentences.
It builds arrays of capitalized nouns, matches them against twelve different word banks
(including a large database of chemicals, companies, etc.) and computes a
'total environmental impact' score for each sentence.
It moves to the next sentence in the article until the article is complete.
Each article takes about 10 seconds to process.
It moves to the next article until all articles are processed.
It moves to the next feed until all feeds are processed.
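For reference, here is a stripped-down sketch of that loop. I am assuming MagpieRSS's fetch_rss() sits behind magpie_debug and that Simple HTML DOM's file_get_html() is available; feeds.txt and tag_and_score() are placeholders for my feed list and for the Brill tagging / word-bank matching, which is where the time goes:

<?php
// Simplified sketch of the loop described above.
// fetch_rss() comes from MagpieRSS (rss_fetch.inc),
// file_get_html() from Simple HTML DOM (simple_html_dom.php).
require_once 'rss_fetch.inc';
require_once 'simple_html_dom.php';

$feeds = file('feeds.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($feeds as $url) {
    $rss = fetch_rss($url);
    if (!$rss) {
        continue; // skip feeds that fail to load
    }
    foreach ($rss->items as $item) {
        // MagpieRSS normalises publication dates into date_timestamp
        $age = time() - (int) $item['date_timestamp'];
        if ($age > 3600) {
            continue; // only process articles less than 60 minutes old
        }
        $html = file_get_html($item['link']);
        if (!$html) {
            continue;
        }
        $text = $html->plaintext;

        // naive sentence split, then per-sentence scoring
        $sentences = preg_split('/(?<=[.!?])\s+/', $text);
        foreach ($sentences as $sentence) {
            tag_and_score($sentence); // POS tags + twelve word banks
        }

        $html->clear(); // free Simple HTML DOM memory between articles
        unset($html);
    }
}
?>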
I can grab the plaintext for every article/feed without any problem, but as soon as I add the processing, the script degrades sharply: after about four articles I get no data at all.
Answer 0 (score: 0):
I think this is what you are looking for, based on the title of your question:
<?php
//set the time limit to infinite
set_time_limit(0);
//do more stuff
?>
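To be clear about where it goes, a minimal sketch (the feed loop and function names are just placeholders for your own code):

<?php
// set_time_limit(0) lifts PHP's max_execution_time cap for this script,
// so the weekly run is not killed partway through the feeds.
set_time_limit(0);

foreach ($feed_urls as $url) {   // $feed_urls: your list of ~30 feeds
    process_feed($url);          // placeholder for your article processing
}
?>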