Question

Currently I have this:

<?php
$stran = file_get_contents("http://meteo.arso.gov.si/uploads/probase/www/fproduct/text/sl/fcast_si_text.html");
$stran = str_replace("<h2>","\n",$stran);
$stran = str_replace("</h2>","\n",$stran);
$stran = str_replace("<h1>","\n",$stran);
$stran = str_replace("</h1>","\n",$stran);
$stran = strip_tags($stran);

echo $stran;
?>

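This gives me some blank lines. I would also like to remove all the text after "Vir: Državna meteorološka služba RS (meteo.si - ARSO)", as well as the blank lines before this string.

I tried a few regular expressions, but each of them removed all of the text. Am I close?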

Answer 1

This can be done with regular expressions.

// Convert h1/h2 opening/closing tags to new line, ignore case
$stran = preg_replace('/<\/?h[12]>/i', "\n", $stran);

$stran = strip_tags($stran);

// Remove all leading whitespace
$stran = preg_replace('/^\s+/', '', $stran);

// Remove everything after "Vir: ..."
$stran = preg_replace('/(?<=Vir: Državna meteorološka služba RS \(meteo\.si - ARSO\)).*/s', '', $stran);
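To see the steps above working together, here is a minimal sketch run against a made-up sample string instead of the live page. The function name `cleanForecast` and the sample markup are hypothetical. It also swaps the lookbehind for a capture group built with `preg_quote`, and adds one step the answer does not have: collapsing the blank lines between paragraphs, which the question asks about.

```php
<?php
// Hypothetical sample standing in for the downloaded page; not the real ARSO markup.
$html = "<h1>Napoved</h1>\n\n<p>Danes bo sončno.</p>\n\n\n"
      . "<p>Vir: Državna meteorološka služba RS (meteo.si - ARSO)</p>\n"
      . "<p>Footer text</p>";

function cleanForecast(string $html): string
{
    $marker = 'Vir: Državna meteorološka služba RS (meteo.si - ARSO)';

    // Turn h1/h2 opening and closing tags into newlines, then drop all other tags.
    $text = preg_replace('/<\/?h[12]>/i', "\n", $html);
    $text = strip_tags($text);

    // Keep the marker, drop everything after it.
    // preg_quote escapes the dot and the parentheses in the marker.
    $text = preg_replace('/(' . preg_quote($marker, '/') . ').*/s', '$1', $text);

    // Collapse each line break plus any whitespace that follows it
    // (blank lines, indentation) into a single newline, then trim both ends.
    $text = preg_replace('/\R\s*/', "\n", $text);

    return trim($text);
}

echo cleanForecast($html), "\n";
```

With this sample, the result should be three lines: the heading, the forecast sentence, and the "Vir: ..." line, with no blank lines in between and nothing after the marker.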

In general, I would recommend actually parsing the HTML to extract the information.
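As one possible way to do that parsing, the sketch below uses PHP's bundled `DOMDocument` and `DOMXPath` classes. This is an assumption about which parser to use, not something the answer specifies, and the sample markup and the function name `extractLines` are hypothetical; the structure of the real page may differ.

```php
<?php
// Hypothetical markup; the real page structure may differ.
$html = '<html><body><h1>Napoved</h1><p>Danes bo sončno.</p>'
      . '<p>Vir: Državna meteorološka služba RS (meteo.si - ARSO)</p>'
      . '<p>Footer text</p></body></html>';

function extractLines(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // The XML prolog is a common trick to make loadHTML treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $lines = [];

    // Visit headings and paragraphs in document order.
    foreach ($xpath->query('//*[self::h1 or self::h2 or self::p]') as $node) {
        $text = trim($node->textContent);
        if ($text === '') {
            continue; // skip empty elements, so no blank lines are produced
        }
        $lines[] = $text;
        if (strpos($text, 'Vir:') === 0) {
            break; // stop after the source line; later text is ignored
        }
    }

    return $lines;
}

echo implode("\n", extractLines($html)), "\n";
```

Because each element becomes exactly one array entry, no blank lines arise in the first place, and stopping at the "Vir:" line replaces the regular expression that cuts off the trailing text.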
