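I use file_get_contents to fetch a web page and parse its data. Now I want to take the first 150 characters of the page's text as a description of that URL.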
```php
$url = 'http://crewow.com/CSS_Layout_Tutorial.php';
$data = file_get_contents($url);
$content = plaintext($data);
$Preview = trim_display(140, $content); // preview of the page text (note: trim_display() ignores $size and always cuts at 150)
echo $Preview;

function trim_display($size, $string)
{
    echo "string is : $string <br/>";
    $trim_string = substr($string, 0, 150);
    $trim_string = $trim_string . "...";
    echo "Trim string is $trim_string <br/>";
    return $trim_string;
}

function plaintext($html)
{
    // remove title
    $plaintext = preg_replace('#([<]title)(.*)([<]/title[>])#s', ' ', $html);
    //$plaintext = preg_match('#<title>(.*?)</title>#', $html);
    // remove comments and any content found in the comment area (strip_tags only removes the actual tags).
    $plaintext = preg_replace('#<!--.*?-->#s', '', $plaintext);
    // put a space between list items (strip_tags just removes the tags).
    $plaintext = preg_replace('#</li>#', ' </li>', $plaintext);
    // remove all script and style tags
    $plaintext = preg_replace('#<(script|style)\b[^>]*>(.*?)</(script|style)>#is', "", $plaintext);
    // remove br tags (missed by strip_tags)
    // remove all remaining html
    $plaintext = strip_tags($plaintext);
    return $plaintext;
}
```
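This code works for some URLs, but for a few of them nothing is displayed in $Preview.
The data reaches trim_display() correctly, but nothing comes out of `$trim_string = substr($string, 0, 150);`: the output remains empty.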
Answer (score: 2)
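Actually, your code is correct and does exactly what it is written to do. Unfortunately, for this page the first 150 characters contain nothing visible. Try 5000:

```php
$trim_string = substr($string, 0, 5000);
```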
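To understand the problem, look at the page with View Source: the text left after strip_tags() apparently starts with a long run of whitespace (the newlines and indentation between the tags), which the browser collapses into nothing.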
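A minimal offline reproduction of that cause, assuming the markup is indented and spread over many lines (the HTML string below is made up for illustration; it is not the crewow.com page). strip_tags() removes the tags but keeps the whitespace between them, so the first 150 characters can be whitespace only. json_encode() makes the invisible characters visible, and trim() removes them.

```php
<?php
// Hypothetical markup (not the real page): indented tags and many
// newlines before the first visible text.
$html = "<html>\n<head>\n"
    . str_repeat("    <meta charset=\"utf-8\">\n", 10)
    . "</head>\n<body>\n"
    . str_repeat("\n        ", 20)
    . "<p>CSS layout tutorial text starts here.</p>\n</body>\n</html>";

// strip_tags() removes the tags but keeps the whitespace between them.
$text = strip_tags($html);

// The first 150 characters are whitespace only, so a browser shows nothing.
$head = substr($text, 0, 150);
var_dump(trim($head) === '');   // bool(true)
echo json_encode($head), "\n";  // newlines appear as \n escapes, spaces as-is

// Trimming first moves the visible text to the front.
echo substr(trim($text), 0, 150), "\n"; // CSS layout tutorial text starts here.
```

This is also why substr($string, 0, 5000) appears to work: the window is wide enough to reach past the leading whitespace.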
Instead of raising the limit, you can use the code below, which calls trim() on the text before cutting it:
```php
$url = 'http://crewow.com/CSS_Layout_Tutorial.php';
$data = file_get_contents($url);
if ($data === false) {
    die("Could not fetch $url");
}
$content = plaintext($data);
// trim() strips the leading whitespace that made the preview look empty
$Preview = trim_display(150, trim($content)); // show the first 150 characters of the page as a preview
echo $Preview;

function trim_display($size, $string)
{
    // cut at $size instead of a hard-coded 150
    $trim_string = substr($string, 0, $size);
    $trim_string = $trim_string . "...";
    return $trim_string;
}

function plaintext($html)
{
    // remove each title element with its content (non-greedy, so text between two titles survives)
    $plaintext = preg_replace('#<title\b.*?</title>#is', ' ', $html);
    // remove comments and any content found in the comment area (strip_tags only removes the actual tags)
    $plaintext = preg_replace('#<!--.*?-->#s', '', $plaintext);
    // put a space between list items (strip_tags just removes the tags)
    $plaintext = preg_replace('#</li>#', ' </li>', $plaintext);
    // remove all script and style elements together with their content
    $plaintext = preg_replace('#<(script|style)\b[^>]*>(.*?)</(script|style)>#is', "", $plaintext);
    // replace br tags with a space so the words on either side do not run together
    $plaintext = preg_replace('#<br\s*/?>#i', ' ', $plaintext);
    // remove all remaining html
    $plaintext = strip_tags($plaintext);
    return $plaintext;
}
```
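trim() only removes whitespace at the ends, so a description built this way can still contain runs of newlines and indentation from the middle of the page. Also, substr() counts bytes, so on a UTF-8 page it can cut a multibyte character in half. A more defensive sketch, assuming the text is valid UTF-8 (make_preview is a hypothetical name, not part of the answer): it collapses every whitespace run into one space, cuts by characters using PCRE's /u mode, and appends "..." only when something was actually cut.

```php
<?php
/**
 * Hypothetical helper (not from the answer): a plain-text preview of at most
 * $size characters. Assumes $text is valid UTF-8.
 */
function make_preview(string $text, int $size = 150): string
{
    // Collapse every whitespace run (newlines, tabs, indentation) into one space.
    // preg_replace() returns null on invalid UTF-8; fall back to the raw text then.
    $collapsed = preg_replace('/\s+/u', ' ', $text) ?? $text;
    $collapsed = trim($collapsed);

    // Count characters, not bytes; short texts are returned without "...".
    if (preg_match_all('/./us', $collapsed) <= $size) {
        return $collapsed;
    }

    // Take the first $size characters, so a multibyte character is never split.
    preg_match('/^.{' . $size . '}/us', $collapsed, $m);

    return rtrim($m[0]) . '...';
}

$sample = "\n\n      Learn CSS   layout\n\n   with floats and flexbox.   ";
echo make_preview($sample, 20), "\n"; // Learn CSS layout wit...
```

With the answer's plaintext(), this would be called as `$Preview = make_preview(plaintext($data), 150);`. As the example shows, the cut can land mid-word; backing up to the last space before the limit is one possible refinement.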