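strip_tags crashes on large values

Date: 2014-01-05 13:04:29

Tags: php preg-match preg-match-all strip-tags

I am using strip_tags to strip the tags from an XML file. It works fine when the array is small, but it always crashes when the page is large. My script works for up to about 100 values and crashes for anything larger.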

        preg_match_all("/<image:caption>.*?<\/image:caption>|<image:loc>.*?<\/image:loc>|<loc>.*?<\/loc>/", $str, $results);
        $arr = array_chunk(array_map('strip_tags', $results[0]), 1000);

        for ($i = 0; $i < 1000; $i++) {
            for ($j = 0; $j < 1000; $j++) {
                $output = $arr[$i][$j] . '</br>';
                echo $output;
            }
        }

It strips the values fine, but it crashes for larger files.

      <urlset>

        <url><loc>/1366x768/citroen-ds-cabrio-auto-car-wallshark-com-228615.html</loc><image:image><image:loc>s/1366x768/citroen-ds/228615/citroen-ds-cabrio-auto-car-wallshark-com-228615.jpg</image:loc><image:caption>Citroen Ds Cabrio Auto Car Wallshark Com  Walpapers</image:caption></image:image></url>

          <url><loc>/1366x768/citroen-ds-cars-citro-n-cabrio-213157.html</loc><image:image><image:loc>s/1366x768/citroen-ds/213157/citroen-ds-cars-citro-n-cabrio-213157.jpg</image:loc><image:caption>Citroen Ds Cars Citro N Cabrio  Walpapers</image:caption></image:image></url>

          <url><loc>/1366x768/citroen-ds-citro-n-pictures-95569.html</loc><image:image><image:loc>s/1366x768/citroen-ds/95569/citroen-ds-citro-n-pictures-95569.jpg</image:loc><image:caption>Citroen Ds Citro N Pictures  Walpapers</image:caption></image:image></url>
        </urlset>
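One detail of the loop in the question may be worth checking: it always runs 1000 × 1000 iterations, no matter how many matches exist, so most iterations read array offsets that are not there. Whether that is the actual cause of the crash is not established here. The following is a minimal sketch that visits only the elements that exist. The sample `$str` and the `htmlspecialchars` call are illustrative assumptions, not part of the original script.

```php
<?php
// Hypothetical sample input; in the question, $str holds the whole sitemap.
$str = '<url><loc>/a.html</loc><image:image><image:loc>s/a.jpg</image:loc>'
     . '<image:caption>Caption A</image:caption></image:image></url>';

// Same alternatives as in the question, with ~ as delimiter to avoid escaping "/".
preg_match_all(
    '~<image:caption>.*?</image:caption>|<image:loc>.*?</image:loc>|<loc>.*?</loc>~',
    $str,
    $results
);

$values = array_map('strip_tags', $results[0]);

// foreach iterates over whatever is present, so no fixed bounds are needed.
foreach (array_chunk($values, 1000) as $chunk) {
    foreach ($chunk as $value) {
        echo htmlspecialchars($value), "<br>\n";
    }
}
```

With this shape the chunking is optional; a single `foreach ($values as $value)` would print the same lines.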

1 answer:

Answer 0 (score: 1)

You can try this:

<pre><?php

$dom = new DOMDocument();
@$dom->load('Remotefile.xml');

$urls = $dom->getElementsByTagName('url');

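$result = array();

foreach ($urls as $url) {
    $image = $url->getElementsByTagName('image')->item(0);
    // item(0) and item(1) below assume <image:loc> and <image:caption> are the
    // first two child nodes, i.e. there is no whitespace between the tags
    $imageChildren = $image->childNodes;

    $result[] = array( 'loc' => $url->getElementsByTagName('loc')->item(0)->textContent,
                       'imgloc' => $imageChildren->item(0)->textContent,
                       'imgcap' => $imageChildren->item(1)->textContent);
}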

// $dbh is not defined in this snippet; it is assumed to be an existing PDO connection
$stmt = $dbh->prepare("INSERT INTO urls (loc, imageloc, imagecap) VALUES (:loc, :imgloc, :imgcap)");

foreach ($result as $res) {
    $stmt->bindParam(':loc',    $res['loc']);
    $stmt->bindParam(':imgloc', $res['imgloc']);
    $stmt->bindParam(':imgcap', $res['imgcap']);
    $stmt->execute();
}
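The DOM loop above reads the image fields by position (`item(0)`, `item(1)`). If the file is pretty-printed, whitespace text nodes also count as child nodes and shift those positions. Here is a minimal sketch that looks the elements up by name with `DOMXPath` instead. It assumes the real sitemap declares the standard sitemap and image namespaces on `<urlset>` (the excerpt in the question does not show them). The inline sample document and the `sm` prefix are illustrative choices.

```php
<?php
// Hypothetical sample; assumes the real file declares these two namespaces.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>/a.html</loc>
    <image:image>
      <image:loc>s/a.jpg</image:loc>
      <image:caption>Caption A</image:caption>
    </image:image>
  </url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// XPath has no notion of a default namespace, so bind a prefix ("sm") to it.
$xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$xpath->registerNamespace('image', 'http://www.google.com/schemas/sitemap-image/1.1');

$result = [];
foreach ($xpath->query('//sm:url') as $url) {
    // string(...) returns the text of the first matching node, or '' if none.
    $result[] = [
        'loc'    => $xpath->evaluate('string(sm:loc)', $url),
        'imgloc' => $xpath->evaluate('string(image:image/image:loc)', $url),
        'imgcap' => $xpath->evaluate('string(image:image/image:caption)', $url),
    ];
}
print_r($result);
```

Looking elements up by name keeps the result independent of indentation and of the order of the children inside `<image:image>`.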

The regex way:

$pattern = <<<'LOD'
~
  <url>                                                \s*+
  <loc>           (?<loc>    [^<]++ ) </loc>           \s*+
  <image:image>                                        \s*+
  <image:loc>     (?<imgloc> [^<]++ ) </image:loc>     \s*+
  <image:caption> (?<imgcap> [^<]++ ) </image:caption> \s*+
  </image:image>                                       \s*+
  </url>
~x
LOD;

preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);

/* Optional cleanup: drop the numeric keys so only the named groups remain */
foreach ($matches as &$match) {
    foreach ($match as $k => $m) {
        if (is_numeric($k)) unset($match[$k]);
    }
}
unset($match); // break the reference left over from the by-reference foreach
print_r($matches);
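To see what the pattern produces, here is a small self-contained run on two entries shaped like the sitemap in the question. The shortened sample values, the error check, and the `array_filter` call (used here in place of the cleanup loop) are illustrative assumptions rather than part of the original answer.

```php
<?php
// Two entries in the same shape as the question's sitemap (values shortened).
$str = <<<'XML'
<urlset>
  <url><loc>/a.html</loc><image:image><image:loc>s/a.jpg</image:loc><image:caption>Caption A</image:caption></image:image></url>
  <url><loc>/b.html</loc><image:image><image:loc>s/b.jpg</image:loc><image:caption>Caption B</image:caption></image:image></url>
</urlset>
XML;

$pattern = <<<'LOD'
~
  <url>                                                \s*+
  <loc>           (?<loc>    [^<]++ ) </loc>           \s*+
  <image:image>                                        \s*+
  <image:loc>     (?<imgloc> [^<]++ ) </image:loc>     \s*+
  <image:caption> (?<imgcap> [^<]++ ) </image:caption> \s*+
  </image:image>                                       \s*+
  </url>
~x
LOD;

$count = preg_match_all($pattern, $str, $matches, PREG_SET_ORDER);
if ($count === false) {
    // preg_last_error() returns a PREG_*_ERROR code when matching fails.
    throw new RuntimeException('PCRE error code: ' . preg_last_error());
}

$rows = [];
foreach ($matches as $match) {
    // Keep only the named groups (string keys); drop the numeric duplicates.
    $rows[] = array_filter($match, 'is_string', ARRAY_FILTER_USE_KEY);
}
print_r($rows);
```

Each row ends up with just the `loc`, `imgloc`, and `imgcap` keys, which is the shape the insert loop in the DOM version expects.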