How to match text with a regular expression

Time: 2018-05-25 14:40:27

Tags: php regex

I have the following text:

<div class="ti"><div class="pic">
        <a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
    </div></div><div class="ti"><div class="pic">
        <a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
    </div></div>

How can I use preg_match_all in PHP to get

  1. /categories/rr/1.html

  2. http://www.erty.com/images/440f2d2a.jpg

  3. Ind

  4. 98

for all entries?

I tried:

    preg_match_all('|[^<div class="ti"><div class="pic">].*?[^<\/div><\/div>]+|',
    $test_html,
    $out, PREG_PATTERN_ORDER);
    

but it does not work.

3 Answers:

Answer 0: (score: 0)

Never try to parse HTML with regular expressions.

Since your HTML may also be well-formed XML, try this:

// Single quotes keep the double-quoted HTML attributes from ending the PHP string early.
$html = '<div class="ti"><div class="pic"><a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a></div></div><div class="ti"><div class="pic"><a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a></div></div>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$sxml = simplexml_import_dom($doc);
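
The snippet above stops once `$sxml` is built. As a minimal sketch of how the four values might then be read out, assuming the markup from the question (with the counts in parentheses kept), one could query the tree with XPath. The `$entries` array and its keys are names made up for this illustration:

```php
<?php
// Sample markup copied from the question, including the "(98)" style counts.
$html = '<div class="ti"><div class="pic">
        <a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
    </div></div><div class="ti"><div class="pic">
        <a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
    </div></div>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
$sxml = simplexml_import_dom($doc);

$entries = []; // hypothetical result container
foreach ($sxml->xpath('//div[@class="pic"]') as $pic) {
    // The count is plain text inside the div, so read the full text content
    // through the DOM node and pick out the digits between parentheses.
    $text = dom_import_simplexml($pic)->textContent;
    $count = preg_match('/\((\d+)\)/', $text, $m) ? (int) $m[1] : null;

    $entries[] = [
        'href'  => (string) $pic->a['href'],
        'src'   => (string) $pic->a->img['src'],
        'alt'   => (string) $pic->a->img['alt'],
        'count' => $count,
    ];
}

print_r($entries);
```

A small regex is still used here, but only on a short text node to pull out the number, not to walk the HTML structure.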

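Alternatively, if you are scraping a website, you may be better off using jQuery selectors in a node.js application.

Answer 1: (score: 0)

This is not a job for regular expressions. PHP has built-in classes for parsing HTML that let you query nodes through the DOM.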
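
As a rough illustration of that approach (not necessarily the code this answer originally used), the sketch below loads the question's markup with `DOMDocument` and queries it with `DOMXPath`. The `$rows` array and its keys are names chosen for this example only:

```php
<?php
// Sample markup copied from the question.
$html = '<div class="ti"><div class="pic">
        <a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
    </div></div><div class="ti"><div class="pic">
        <a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
    </div></div>';

libxml_use_internal_errors(true); // real-world HTML is often not perfectly valid
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = []; // hypothetical result container
foreach ($xpath->query('//div[@class="ti"]/div[@class="pic"]') as $pic) {
    // evaluate() with string(...) returns the attribute value as a PHP string,
    // resolved relative to the current div.pic node.
    $href = $xpath->evaluate('string(a/@href)', $pic);
    $src  = $xpath->evaluate('string(a/img/@src)', $pic);
    $alt  = $xpath->evaluate('string(a/img/@alt)', $pic);

    // The count is loose text after the link, e.g. " (98)".
    $count = preg_match('/\((\d+)\)/', $pic->textContent, $m) ? (int) $m[1] : null;

    $rows[] = compact('href', 'src', 'alt', 'count');
}

print_r($rows);
```

Each element of `$rows` then holds one entry's link, image URL, alt text, and count, so adding a third `div.ti` block to the input needs no change to the code.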


Answer 2: (score: 0)

$regex = '/href="(.*?)".*src="(.*?)".*alt="(.*?)".*\((\d+)\)/ms';

$string = '
<div class="ti"><div class="pic">
        <a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
    </div></div><div class="ti"><div class="pic">
        <a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
    </div></div>
';

preg_match_all($regex, $string, $matches);

print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] => href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
    </div></div><div class="ti"><div class="pic">
        <a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
        )

    [1] => Array
        (
            [0] => /categories/rr/1.html
        )

    [2] => Array
        (
            [0] => http://www.erty.com/images/4123d2b.jpg
        )

    [3] => Array
        (
            [0] => Wes
        )

    [4] => Array
        (
            [0] => 6044
        )

)
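
Note that the output above contains a single match whose groups mix the two entries: the href comes from the first link, while the src, alt, and count come from the second. With the `s` modifier, each greedy `.*` runs to the end of the string and backtracks to the last `src=`, `alt=`, and parenthesised number. A possible adjustment, sketched here as a variation on the same pattern rather than as the answerer's own code, is to make those gaps lazy (`.*?`) and use `PREG_SET_ORDER` so each entry comes back as one row:

```php
<?php
// Same pattern as above, but every ".*" between the pieces is now lazy.
$regex = '/href="(.*?)".*?src="(.*?)".*?alt="(.*?)".*?\((\d+)\)/s';

$string = '
<div class="ti"><div class="pic">
        <a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
    </div></div><div class="ti"><div class="pic">
        <a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
    </div></div>
';

// PREG_SET_ORDER groups the results per match: $matches[0] is the first entry,
// $matches[1] the second, each as [full match, href, src, alt, count].
preg_match_all($regex, $string, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' | ', $m[2], ' | ', $m[3], ' | ', $m[4], PHP_EOL;
}
```

This still depends on the attributes appearing in exactly this order with double quotes, so it is only suitable when the markup is known and stable.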