Extracting string data from an HTML view in PHP

Asked: 2014-02-12 12:11:54

Tags: php

I want to grab the JS and CSS files referenced in an HTML Mustache view.

View snippet:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>This is my beautiful page</title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta name="author" content="">

    <!-- Le styles -->
    <link href="{{template_assets}}css/bootstrap.css" rel="stylesheet">
    <link href="{{template_assets}}css/style.css" rel="stylesheet">
    <link href="{{template_assets}}css/bootstrap-responsive.css" rel="stylesheet">

    <script src="{{template_assets}}js/jquery-1.7.2.min.js"></script>
  </head>
  <body>
    Hello world <a href="http://thisshouldnotbeincluded.com">test</a> <img src="neitherthis.jpg">
  </body>
</html>

The content above is now held in the variable $file_scan, loaded with:

$file_scan = file_get_contents($view);

From here I want to store the CSS paths and the JS paths in two separate arrays, so that the final result is:

$css_files = 
array (size=3)
  0 => string '{{template_assets}}css/bootstrap.css' (length=36)
  1 => string '{{template_assets}}css/style.css' (length=32)
  2 => string '{{template_assets}}css/bootstrap-responsive.css' (length=47)

$js_files = 
array (size=1)
  0 => string '{{template_assets}}js/jquery-1.7.2.min.js' (length=41)

How can I scan the file to get the CSS and JS files? I assume I should loop over each line, but how?

I tried str_replace and explode, but with no luck.

Thanks.

3 answers:

Answer 0: (score: 2)

Here is an implementation using DOMDocument and DOMXPath, which is the proper way to do this:

<?php
    $file_scan = file_get_contents($view);

    $css_files = array();
    $js_files = array();

    $doc = new DOMDocument();
    $doc->loadHTML($file_scan);

    $xpath = new DOMXPath($doc);

    $links = $xpath->query('/html/head/link[@rel = "stylesheet"]');
    foreach ($links as $link) {
        $css_files[] = $link->getAttribute('href');
    }

    $scripts = $xpath->query('/html/head/script[@src]');
    foreach ($scripts as $script) {
        $js_files[] = $script->getAttribute('src');
    }

    var_dump($css_files);
    var_dump($js_files);
?>
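The same idea can be wrapped in a reusable function. The following is only a minimal sketch under a few assumptions: the helper name extract_assets and the sample markup are hypothetical, the XPath is loosened to match the tags anywhere in the document rather than only under /html/head, and libxml_use_internal_errors is used so that parser warnings about unfamiliar markup are collected instead of printed.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the
// stylesheet and script URLs found in an HTML string.
function extract_assets($html)
{
    // Collect libxml parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $assets = array('css' => array(), 'js' => array());

    // Only <link> elements that are stylesheets and carry an href.
    foreach ($xpath->query('//link[@rel="stylesheet"][@href]') as $link) {
        $assets['css'][] = $link->getAttribute('href');
    }
    // Only <script> elements with a src, so inline scripts are skipped.
    foreach ($xpath->query('//script[@src]') as $script) {
        $assets['js'][] = $script->getAttribute('src');
    }

    return $assets;
}

// Made-up sample markup in the style of the view from the question.
$sample = '<!DOCTYPE html><html><head>'
    . '<link href="{{template_assets}}css/style.css" rel="stylesheet">'
    . '<script src="{{template_assets}}js/app.js"></script>'
    . '</head><body><a href="http://example.com">x</a><img src="pic.jpg"></body></html>';

$assets = extract_assets($sample);
print_r($assets);
```

The anchor and image in the body are ignored because only link and script elements are queried.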

If you feel you must use a regular expression, the following works, but it is more fragile than the DOMDocument approach:

<?php
    $file_scan = file_get_contents($view);

    $css_files = array();
    $js_files = array();

    // Match quoted values that start with "{{" and end in .css or .js.
    // The dot is escaped, and [^"]* keeps each match inside one attribute value.
    if (preg_match_all('/"(\{\{[^"]*\.(?:css|js))"/', $file_scan, $matches) > 0) {
        foreach ($matches[1] as $match) {
            if (substr($match, -4) === '.css') {
                $css_files[] = $match;
            } else {
                $js_files[] = $match;
            }
        }
    }

    var_dump($css_files);
    var_dump($js_files);
?>

Answer 1: (score: 1)

Using DOMDocument:

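$dom = new DOMDocument;
$dom->preserveWhiteSpace = false; // parser options must be set before loading
$dom->loadHTML($file_scan);

// Print the src of every external script; inline scripts have no src.
$scripts = $dom->getElementsByTagName('script');
foreach ($scripts as $script) {
  if ($script->hasAttribute('src')) {
    echo $script->getAttribute('src'), PHP_EOL;
  }
}

// Print the href of every <link> that is a stylesheet.
$links = $dom->getElementsByTagName('link');
foreach ($links as $link) {
  if ($link->getAttribute('rel') == 'stylesheet') {
    echo $link->getAttribute('href'), PHP_EOL;
  }
}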

Answer 2: (score: 0)

Another regular-expression solution, which is more flexible:

// Pattern template: the first %s is the tag name, the second the attribute name.
$rx_tpl = '/<%s\s+[^<>]*%s\s*=\s*"(\{\{[^"{}]+\}\}[^"{}]+)"/is';
$a_types = array(
    'css' => array('link', 'href'),
    'js' => array('script', 'src')
);
$a_results_by_type = array();
foreach ($a_types as $key => $a) {
    // Start with an empty list so every type has an entry even without matches.
    $a_results_by_type[$key] = array();
    if (preg_match_all(sprintf($rx_tpl, $a[0], $a[1]), $file_scan, $a_matches, PREG_PATTERN_ORDER)) {
        $a_results_by_type[$key] = $a_matches[1];
    }
}

print_r($a_results_by_type);
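
To see what the pattern template does and does not match, it may help to expand it for one tag/attribute pair and try it on a few lines. This is only an illustrative sketch: the sample lines and the variable names $rx_css, $m, $plain_hits and $icon_hits are made up for the demonstration.

```php
<?php
// Same template as in the answer above.
$rx_tpl = '/<%s\s+[^<>]*%s\s*=\s*"(\{\{[^"{}]+\}\}[^"{}]+)"/is';

// Expand the template for <link ... href="...">.
$rx_css = sprintf($rx_tpl, 'link', 'href');
echo $rx_css, PHP_EOL;

// A templated stylesheet path is captured in group 1.
$line = '<link href="{{template_assets}}css/style.css" rel="stylesheet">';
preg_match($rx_css, $line, $m);
echo $m[1], PHP_EOL; // {{template_assets}}css/style.css

// A path without the {{...}} prefix is not matched at all.
$plain_hits = preg_match($rx_css, '<link href="plain.css" rel="stylesheet">');

// The pattern never inspects rel, so a non-stylesheet <link> still matches.
$icon_hits = preg_match($rx_css, '<link rel="icon" href="{{template_assets}}favicon.ico">');

var_dump($plain_hits, $icon_hits);
```

In other words, this pattern selects by tag name, attribute name and the {{...}} prefix only. If the view may contain other templated link elements, filtering on rel (as the DOMDocument answers do) is the safer choice.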