Question

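I'm trying to parse an XML file and display the first 150 words of each article, followed by a READ MORE link. However, it doesn't cut off at exactly 150 words. I also don't know how to make it skip IMG tags and other markup. The code is below: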

    // Script displays the 3 most recent blog posts from blog.pinchit.com (blog.pinchit.com/api/read)
    // The entries on homepage show the first 150 words of description and "READ MORE" link

    // PART 1 - PARSING

    // if it was a JSON file
    //  $string=file_get_contents("http://blog.pinchit.com/api/read");
    //  $json_a=json_decode($string,true);
    //  var_export($json_a);


    // XML Parsing
    $file = "http://blog.pinchit.com/api/read";
    $posts_to_display = 3;
    $posts = array();

    // get all the file nodes
    if(!$xml=simplexml_load_file($file)){
        trigger_error('Error reading XML file',E_USER_ERROR);
    }

    // counter for posts member array
    $counter = 0;

    // Elements whose names contain characters not allowed in PHP identifiers (e.g. the hyphen)
    // can be accessed by wrapping the element name in braces and quotes: $post->{'regular-title'}

    foreach($xml->posts->post as $post){

        //post's title
        $posts[$counter]['title'] = $post->{'regular-title'};

        // post's full body 
        $posts[$counter]['body'] = $post->{'regular-body'};

        // post's body's first 150 words 
        //for some reason, I am not sure if it's exactly 150 
        $posts[$counter]['preview'] = substr($posts[$counter]['body'], 0, 150);

        //strip all the html tags so it doesn't mess up the page
        $posts[$counter]['preview'] = strip_tags($posts[$counter]['preview']);


        //post's id
        $posts[$counter]['id'] = $post->attributes()->id;


        $posts_to_display--;
        $counter++;
        //exit the for loop after we parse out all the articles that we want
        if ($posts_to_display == 0 ) break;
    }

    // Displays all of the posts

    foreach($posts as $post){

        echo "<b>" . $post['title'] . "</b>";
        echo "<br/>";
        echo $post['preview'];
        echo " <a href='http://blog.pinchit.com/post/" . $post['id'] . "'>Read More</a>";
        echo "<br/><br/>";

    }

Here is what the output currently looks like:

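Editor's Pick: Club Sportiva   Nothing makes you feel as completely free and in control as a sleek, sophisticated, sexy sports car. Read More
It's no wonder
Pinchy Drinks &amp; Rocks: Utah Saloon Hotel   Utah Hotel Read More

Monday Menu: Spicy Grapefruit, Chili Powder, Creamsicles   Feeling summery and salty today, we have to admit it takes a lot of willpower to resist the urge to go all appetizers, all desserts, or all drinks Read More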

Answer 1

The HTML markup is counting toward your character total. Strip the tags first, then take the preview sample:

    $preview = strip_tags($posts[$counter]['body']);
    $posts[$counter]['preview'] = substr($preview, 0, 150).'...';

It's also common to append an ellipsis ("...") to truncated text to show that it continues.
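A minimal sketch of the strip-then-truncate approach as reusable functions, assuming UTF-8 input and the mbstring extension. The function names `preview_chars` and `preview_words` are hypothetical; the word-based variant is included because the question asks for 150 words, while `substr` counts bytes. Entities such as `&amp;` are decoded before counting so they are not cut in half, which means the result should be passed through `htmlspecialchars()` when it is echoed.

```php
<?php
// Sketch only: assumes UTF-8 input and the mbstring extension.
// Function names are hypothetical, not part of any library.

// Strip tags FIRST, then truncate, so markup never counts toward the limit.
function preview_chars(string $html, int $limit = 150): string
{
    // Decode entities so "&amp;" counts as one character and is never split.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse whitespace runs left behind by removed tags.
    $text = trim(preg_replace('/\s+/u', ' ', $text));
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text; // short enough: no ellipsis needed
    }
    return rtrim(mb_substr($text, 0, $limit, 'UTF-8')) . '...';
}

// Same idea, but the limit is a number of words instead of characters.
function preview_words(string $html, int $limit = 150): string
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    if (count($words) <= $limit) {
        return implode(' ', $words);
    }
    return implode(' ', array_slice($words, 0, $limit)) . '...';
}
```

In the question's loop this would replace the `substr`/`strip_tags` pair, e.g. `$posts[$counter]['preview'] = preview_words((string) $post->{'regular-body'});`, with `htmlspecialchars($post['preview'])` on output.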

Note that this may also remove markup you want to keep, such as <p> and <br>. To preserve those, pass them as the second argument to strip_tags:

    $preview = strip_tags($posts[$counter]['body'], '<br><p>');
    $posts[$counter]['preview'] = substr($preview, 0, 150).'...';

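Be warned, though, that XML-style self-closing tags (<br />) may trip this up. If you are dealing with a mix of XML and HTML, you may want something like htmLawed for more robust tag filtering, but the concept stays the same: strip the HTML before truncating.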
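Applied to the question's loop, the fix might look like the sketch below. The inline XML is a hypothetical stand-in for the /api/read response, shaped the way the question's code reads it (`<posts><post id="...">` with `<regular-title>` and `<regular-body>` children, the body holding escaped HTML). The preview length is shortened to 11 purely so the truncation is visible. Values are cast to `string` because SimpleXML returns `SimpleXMLElement` objects, and output is escaped with `htmlspecialchars()`.

```php
<?php
// Hypothetical sample data shaped like the feed the question parses.
$xmlString = <<<XML
<tumblr>
  <posts>
    <post id="101">
      <regular-title>First post</regular-title>
      <regular-body>&lt;p&gt;&lt;img src="a.jpg"&gt;Hello there, world.&lt;/p&gt;</regular-body>
    </post>
    <post id="102">
      <regular-title>Second post</regular-title>
      <regular-body>&lt;p&gt;Short body.&lt;/p&gt;</regular-body>
    </post>
  </posts>
</tumblr>
XML;

$xml = simplexml_load_string($xmlString);
if ($xml === false) {
    throw new RuntimeException('Error reading XML');
}

$postsToDisplay = 3;
$previewLength = 11; // demo value; the question uses 150
$posts = [];

foreach ($xml->posts->post as $post) {
    $body = (string) $post->{'regular-body'};   // escaped HTML comes back as a plain HTML string
    $text = trim(strip_tags($body));             // strip tags first...
    $preview = mb_substr($text, 0, $previewLength, 'UTF-8'); // ...then truncate
    if (mb_strlen($text, 'UTF-8') > $previewLength) {
        $preview .= '...';
    }
    $posts[] = [
        'title'   => (string) $post->{'regular-title'},
        'id'      => (string) $post['id'],       // attribute access, equivalent to attributes()->id
        'preview' => $preview,
    ];
    if (count($posts) >= $postsToDisplay) {
        break; // stop once we have enough posts
    }
}

foreach ($posts as $p) {
    echo '<b>' . htmlspecialchars($p['title']) . '</b><br/>'
        . htmlspecialchars($p['preview'])
        . " <a href='http://blog.pinchit.com/post/" . rawurlencode($p['id']) . "'>Read More</a><br/><br/>\n";
}
```

Using `count($posts)` as the loop guard replaces the separate `$counter` and `$posts_to_display` bookkeeping from the question; the behavior is the same.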

Answer 2

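Looking at the <regular-body> element, it appears to contain HTML. So I'd suggest loading it into a DOMDocument (http://www.php.net/manual/en/domdocument.loadhtml.php). You can then iterate over all the nodes and ignore certain tags (for example, skip <img> but keep <p>). After that, you can render out just the content you want and truncate it to 150 characters.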
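A minimal sketch of this DOMDocument approach, assuming UTF-8 input and the mbstring extension. The function name `dom_preview` and the default skip list (`img`, `script`, `style`) are illustrative assumptions. Text nodes are joined with spaces so adjacent paragraphs don't run together; a side effect is that a word split across inline tags (e.g. `foo<b>bar</b>`) gains a space.

```php
<?php
// Sketch only: collects text from an HTML fragment, skipping unwanted
// elements entirely, then truncates to $limit characters.
function dom_preview(string $html, int $limit = 150, array $skip = ['img', 'script', 'style']): string
{
    $doc = new DOMDocument();
    // The meta prefix tells libxml to read the fragment as UTF-8;
    // internal errors silence warnings about imperfect real-world markup.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $parts = [];
    $walk = function (DOMNode $node) use (&$walk, &$parts, $skip) {
        foreach ($node->childNodes as $child) {
            if ($child instanceof DOMElement && in_array(strtolower($child->nodeName), $skip, true)) {
                continue; // ignore this element and everything inside it
            }
            if ($child instanceof DOMText) {
                $parts[] = $child->nodeValue;
            } elseif ($child->hasChildNodes()) {
                $walk($child); // recurse into kept elements such as <p>
            }
        }
    };
    $walk($doc);

    $text = trim(preg_replace('/\s+/u', ' ', implode(' ', $parts)));
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }
    return rtrim(mb_substr($text, 0, $limit, 'UTF-8')) . '...';
}
```

Compared with `strip_tags`, walking the tree lets you drop an element together with its contents (useful for `<script>`), rather than only removing the tags themselves.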

Limiting XML/HTML string length
