I have this text:
<div class="ti"><div class="pic">
<a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
</div></div><div class="ti"><div class="pic">
<a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
</div></div>
How can I use preg_match_all in PHP to get
/categories/rr/1.html
Ind
98
for every entry?
I tried
preg_match_all('|[^<div class="ti"><div class="pic">].*?[^<\/div><\/div>]+|',
$test_html,
$out, PREG_PATTERN_ORDER);
but it does not work.
Answer 0 (score: 0)
Never try to parse HTML with a regular expression.
Since your HTML may also be well-formed XML, try this:
// Single quotes around the literal so the double-quoted HTML attributes do not end the string.
$html = '<div class="ti"><div class="pic"><a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a></div></div><div class="ti"><div class="pic"><a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a></div></div>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$sxml = simplexml_import_dom($doc);
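The snippet above stops once the document is loaded. As a minimal sketch of one possible next step, the links could be pulled out of the SimpleXML object with an XPath query. The variable `$links` and the array keys `href` and `label` are illustrative names, not part of the original answer, and the sample string is the one from the answer (so it carries no counts).

```php
<?php
// Same sample markup as in the answer, in a single-quoted string.
$html = '<div class="ti"><div class="pic"><a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a></div></div><div class="ti"><div class="pic"><a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a></div></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$sxml = simplexml_import_dom($doc);

// Collect one row per <a> that sits directly inside <div class="pic">.
$links = [];
foreach ($sxml->xpath('//div[@class="pic"]/a') as $a) {
    $links[] = [
        'href'  => (string) $a['href'],  // attribute access on a SimpleXMLElement
        'label' => (string) $a->span,    // text of the child <span>
    ];
}

print_r($links);
```

The XPath expression assumes the class attribute is exactly `pic`; markup with several classes on the same element would need a different predicate.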
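Alternatively, if you are scraping a website, you may be better off using jQuery selectors in a node.js application.
Answer 1 (score: 0)
This is not a job for regular expressions. PHP has built-in classes for parsing HTML files that let you query nodes through the DOM.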
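As a rough sketch of that approach, assuming the markup from the question, `DOMDocument` and `DOMXPath` can locate each `div.pic`, read the link, and pick the number in parentheses out of the surrounding text. The names `$entries`, `href`, `label`, and `count` are illustrative choices, not something the answer prescribes.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<div class="ti"><div class="pic">
<a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
</div></div><div class="ti"><div class="pic">
<a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
</div></div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$entries = [];
foreach ($xpath->query('//div[@class="pic"]') as $pic) {
    // First <a> that is a direct child of this div.
    $link = $xpath->query('./a', $pic)->item(0);
    if ($link === null) {
        continue; // no link in this block, skip it
    }

    // The count is plain text after the link, e.g. " (98)".
    $count = null;
    if (preg_match('/\((\d+)\)/', $pic->textContent, $m)) {
        $count = (int) $m[1];
    }

    $entries[] = [
        'href'  => $link->getAttribute('href'),
        'label' => trim($link->textContent),
        'count' => $count,
    ];
}

print_r($entries);
```

A small regular expression is still used here, but only on the already-isolated text of one node, where the structure is simple enough for it.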
Answer 2 (score: 0)
// Every ".*" is made lazy (".*?") so that each entry yields its own match;
// with greedy ".*" and the s flag, the pattern swallows both entries as one match.
$regex = '/href="(.*?)".*?src="(.*?)".*?alt="(.*?)".*?\((\d+)\)/ms';
$string = '
<div class="ti"><div class="pic">
<a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
</div></div><div class="ti"><div class="pic">
<a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
</div></div>
';
preg_match_all($regex, $string, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
[1] => href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
)
[1] => Array
(
[0] => /categories/rr/1.html
[1] => /categories/ert/1.html
)
[2] => Array
(
[0] => http://www.erty.com/images/440f2d2a.jpg
[1] => http://www.erty.com/images/4123d2b.jpg
)
[3] => Array
(
[0] => Ind
[1] => Wes
)
[4] => Array
(
[0] => 98
[1] => 6044
)
)
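If one row per entry is more convenient than one array per capture group, `preg_match_all` also accepts the `PREG_SET_ORDER` flag. The following is a sketch under the assumption that every `.*` in the pattern is lazy (`.*?`), so each entry matches separately; `$rows` and its keys are illustrative names.

```php
<?php
// Lazy quantifiers keep each match inside a single entry.
$regex = '/href="(.*?)".*?src="(.*?)".*?alt="(.*?)".*?\((\d+)\)/s';
$string = '
<div class="ti"><div class="pic">
<a href="/categories/rr/1.html"><img src="http://www.erty.com/images/440f2d2a.jpg" alt="Ind"> <span>Ind</span></a> (98)
</div></div><div class="ti"><div class="pic">
<a href="/categories/ert/1.html"><img src="http://www.erty.com/images/4123d2b.jpg" alt="Wes"> <span>Wes</span></a> (6044)
</div></div>
';

// PREG_SET_ORDER: $matches[0] holds all groups of the first match,
// $matches[1] all groups of the second match, and so on.
preg_match_all($regex, $string, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    $rows[] = [
        'href'  => $m[1],
        'src'   => $m[2],
        'alt'   => $m[3],
        'count' => (int) $m[4],
    ];
}

print_r($rows);
```

This still depends on the attributes appearing in the order href, src, alt, so it remains more fragile than a DOM-based approach.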