stristr not working with HTML tags

Time: 2012-09-28 17:57:46

Tags: php html string dom html-parsing

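I am building an RSS feed aggregator that visits each link and retrieves not just the description but the entire content of the post. I am using stristr to filter out unwanted information such as Facebook and Twitter fan widgets and other content. It works perfectly for one feed but not for the other. Here is my code: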

<?php
function getcontent($l,$b,$c)
{
    $dom=file_get_html($l);
    $atitle=$dom->find($b);
    $content=$dom->find($c);
    $contents=implode(" ",$content);
foreach($atitle as $t)
            {
                echo "<b>".$t."</b>";

            }
            echo "<br /><br />";
        echo $contents;
        echo "<br />";
}
function filtercontent($strip,$l,$b,$c)
{
    $dom=file_get_html($l);
    $atitle=$dom->find($b);
    $content=$dom->find($c);
    $contents=implode(" ",$content);
    $contents=stristr($contents,$strip,true);
    foreach($atitle as $t)
            {
                echo "<b>".$t."</b>";

            }
            echo "<br />";
            echo $contents;
            echo "<br /><br />";

}
ini_set('default_charset', 'UTF-8');
ini_set('max_execution_time',0);
ini_set('memory_limit', -1);
include("simple_html_dom.php");

$url=array("http://www.deccanherald.com/rss/news.rss","http://syndication.indianexpress.com/rss/798/latest-news.xml");

$atitle=NULL;
$content=NULL;
foreach($url as $feed)
{
    $f=$feed;
    $feed=simplexml_load_file($feed);
    //echo $feed;
    if($feed)
    {
        //$feed_title=$feed->channel->title;
        //echo "<br />".$feed_title."<br />";
        $items=$feed->channel->item;
        foreach($items as $item)
        {
            //foreach($keywords as $key)
            //{
            //if(strtolower($item->description)==$key || strtolower($item->title)==$key)
            //{

        $title=$item->title;
        //echo "<h1><b>".$title."</b></h1><br />";
        $link=$item->link;
        //echo "<a href='".$link."'>".$link."</a><br />";
        $des=$item->description;
        //echo "<br />".$des."<br />";


            if($f=="http://beta.thehindu.com/news/?service=rss")
            {
            $title_class=".detail-title";
            $content_class=".body";
            getcontent($link,$title_class,$content_class);

            }
            if($f=="http://in.news.yahoo.com/rss/national/")
            {
            $title_class=".headline";
            $content_class=".yom-art-content";
            getcontent($link,$title_class,$content_class);
            }


        if($f=="http://syndication.indianexpress.com/rss/798/latest-news.xml")
            {

            $link=$link."0";
            $title_class=".headstory";
            $content_class=".contentLeftbigstory";
            $strip='<div class="paginationNew">';
            filtercontent($strip,$link,$title_class,$content_class);

            }
            if($f=="http://www.indiatvnews.com/rssfeed/india_news.xml")
            {

            $title_class=".topstorytitsub";
            $content_class=".standard";
            foreach($link as $post)
            {
                $dom=file_get_html($link);
                $title=$dom->find($title_class);
                $content=$dom->find('div[style=min-height:350px]');
                foreach($title as $t)
                echo "<b>".$t."</b><br />";
                foreach($content as $c)
                {
                    echo $c;

                }

            }


            }
            if($f=="http://beta.thehindu.com/news/?service=rss")
            {
            $title_class=".detail-title";
            $content_class=".body";
            getcontent($link,$title_class,$content_class);

            }
            if($f=="http://www.deccanherald.com/rss/news.rss")
            {
            $title_class=".newsText";
            $content_class=".postedBy";
            $strip='<a href="#top" class="gototop">Go to Top</a>';
            filtercontent($strip,$link,$title_class,$content_class);            
            }


            }
    }
        }


?> 

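I use Simple HTML DOM Parser to parse the HTML. The filtercontent function takes one more argument than getcontent: a string. This string, named strip, is used to filter the content and return everything before the first occurrence of strip. It works perfectly with the syndication.indianexpress.com feed, but not with the deccanherald.com feed. I have left out the other feeds to keep this easy to follow; they use the getcontent function and work fine. For Deccan Herald, a sample of a post's source is: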

<h1>Crazy star Ravichandran takes potshots at TV channels</h1>

                                                            <div class="postedBy">Mysore, September 28, 2012, DHNS:
                                                                                            <p>Actor opens ‘Conflux 2012’ media fest at Mahajana’s college in city</p>
                                                        <a name="top"></a>

                                                        <p><p><strong>When actor, director and producer of Kannada filmdom V&#8200;Ravichandran was invited to inaugurate &lsquo;Conflux 2012&rsquo; a two-day inter-collegiate media and communication fest of&#8200;SBRR&#8200;Mahajana First&#8200;Grade College in the city on Friday, many would have thought it contrasting.</strong><br /><br />However, when Ravi as he is popular among his acolytes, took over the dais and addressed the gathering where youngsters topped others, the choice of selecting Ravichandran to open the fest seemed apt. <br /><br />Mincing no words, the actor nick named &lsquo;Crazy Star&rsquo; made a relevant remark taking potshots at the electronic media for opting negativism rather than positive aspects to up their television rating points (TRP). Taking the names of two channels in Kannada, the actor said they are indulging in taking the people for a ride with concocted facts.<br /><br /> More than that, almost all the channels are airing moribund programmes. Said&#8200;Ravichandran; &ldquo; Pen is mightier than sword and show your talent in reaching the people and guide them.&rdquo;<br /><br />On filmdom, Ravichandran said that the fans still want him to romance heroines like what he did in Premaloka and other flicks. &ldquo;&#8200;I have already turned 50&rdquo;, said&#8200;Ravichandran making it clear that he cannot redo what he did in the past.&#8200;Referring to &lsquo;Manjina Hani&rsquo; the most awaited movie from his banner from the past several years, the actor said &lsquo;he is discovering the man in him&rsquo;.  <br /><br />Earlier, it was a filmy welcome to the actor. No sooner he entered the hall, pat filled the air an all time hit song from Ranadheera; baa baaro ranadheera...  
<br /><br />Principal of the college&#8200;Prof K&#8200;V&#8200;Prabhakar said students from as many as 18 colleges from several parts of the State are participating in the fest.</p><p>To avoid chaos, the management had prohibited the entry of outsiders (especially students). <br /><br />Barring the participants, dignitaries and media, others were not allowed with students of the college keeping a tab on the visitors at the main gate of Vivekananda Hall of the college.<br /><br />Jayalakshmipuram police had to disperse the mad crowd who had dared to assemble in front of the hall.<br /><br />Chairman of&#8200;Mahajana Education Society R&#8200;Vasudevamurthy, HoD, mass communication and journalism Nivedita and others were present.<br /><br /><strong>Supports Cauvery stir</strong><br /><br />Actor&#8200;Ravichandran on&#8200;Friday extended support to ongoing agitation against the centre&rsquo;s directive to State to release 9,000 cusec of water to Tamil Nadu. On Karnataka bandh call given by various organisations on October 6 over the same issue, the actor said he too will support following Karnataka&#8200;Film&#8200;Chamber of Commerce&rsquo;s (KFCC) similar announcement. &ldquo;When the State itself is facing acute water shortage, how can we release water to them&rdquo;, the actor asserted. He also denied any interests to join politics saying; nange rajakeeya barolla (I don&rsquo;t know politics).</p></p>

                            <p class="gotoTop"><a href="#top" class="gototop">Go to Top</a></p>


                            <div class="socialNetworkingLinks">
                                 <a href="http://www.deccanherald.com/tell_a_friend.php?id=281782" style="margin-left:-5px;"><img src="http://www.deccanherald.com/images/email.jpg" alt="" border="0" /></a> 
                                <a href="#" onClick="javascript:window.print();"><img src="http://www.deccanherald.com/images/print.jpg" alt="" border="0" onClick="javascript:window.print();" /></a> 
                                <a href="javascript:addToFavorites()"><img src="http://www.deccanherald.com/images/bookmark.jpg" alt="" border="0" /></a>

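I have also tried $strip='<p class="gotoTop">', $strip='<div class="socialNetworkingLinks">', and $strip="Go to Top", but none of them works: the result still comes back with the "Go to Top" link and the social toolbar. Why is it not working? What is wrong with my code? It works for one feed but not for the other. Please help me solve this.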


I want to remove the content from "Go to Top" onward.

1 Answer:

Answer 0 (score: 0)

I think the problem is with $content_class=".postedBy";. The only content in that class is Mysore, September 28, 2012, DHNS:, which does not match $strip.
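To illustrate what that mismatch would mean for stristr, here is a minimal sketch. It assumes the string built from the .postedBy match really contains only the dateline; the $contents value is a hand-written stand-in, not output captured from the parser. When the needle is absent, stristr returns false even with the third argument set to true, and echoing false prints an empty string.

```php
<?php
// Stand-in for the imploded ".postedBy" match (an assumption for illustration).
$contents = '<div class="postedBy">Mysore, September 28, 2012, DHNS:</div>';
$strip = '<a href="#top" class="gototop">Go to Top</a>';

// The needle does not occur in $contents, so stristr() returns false.
$result = stristr($contents, $strip, true);

var_dump($result); // bool(false)
echo $result;      // false is cast to "", so nothing is printed
```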

Edit:

The postedBy DIV looks like:

<div class="postedBy">Mysore, September 28, 2012, DHNS:</div>

It does not include the body of the article.
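One way to make the filtering step tolerant of a missing marker is sketched below. The helper name cutBeforeMarker and the sample strings are hypothetical and not part of the original code; the sketch only uses the built-in stripos and substr, and returns the input unchanged when the marker is not found instead of returning false.

```php
<?php
// Hypothetical helper (not from the original post): return everything before
// the first case-insensitive occurrence of $strip, or the whole string
// unchanged when $strip is empty or does not occur.
function cutBeforeMarker(string $contents, string $strip): string
{
    if ($strip === '') {
        return $contents;
    }
    $pos = stripos($contents, $strip);
    if ($pos === false) {
        return $contents; // marker missing: keep the content as-is
    }
    return substr($contents, 0, $pos);
}

// Made-up sample strings for illustration only.
$withMarker = '<p>Body text</p><p class="gotoTop"><a href="#top" class="gototop">Go to Top</a></p>';
$withoutMarker = '<div class="postedBy">Mysore, September 28, 2012, DHNS:</div>';

echo cutBeforeMarker($withMarker, '<p class="gotoTop">'), "\n";    // <p>Body text</p>
echo cutBeforeMarker($withoutMarker, '<p class="gotoTop">'), "\n"; // the div, unchanged
```

This does not fix the selector itself. If .postedBy does not contain the article body, a selector that matches the body would still be needed before any cutting can help.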