Parsing and extracting information from a DIV tag inside JSON

Date: 2013-12-22 05:33:25

Tags: html json api

Here is my code:

<?php
$ch = curl_init("http://gothere.sg/a/search?q=527201");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$raw = curl_exec($ch);
curl_close($ch);

$data = json_decode($raw);
echo htmlentities($data->where->html);
?>  

Here's the output:

<div class=place><img class=marker src="/static/img/2/icon/panel/a.png?v=c2354"/><div class=locf><strong>201E Tampines Street 23</strong><br> Singapore 527201</div><p><a id="tooldt" href="">directions to</a> <a id="tooldf" href="">directions from</a> <a id="toolsn" href="">search nearby</a></p><div id="minibar"><p></p><form class=msf><input type=text><input type=submit value=""><input type=hidden value="527201"></form></div></div><div id=bah><div class=bar><h4>Some businesses around here:</h4></div><p><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="5314c3d8-9775-4a4c-bbed-c28a04126993">United Employment Services</a>, #02-102</p><p><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="05aa7169-4fad-4577-95b5-e79ef411c6f1">Cleverland Educational Services</a>, #04-106</p><p><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="b323d00e-5e4a-45a0-a35f-f196e33c51f3">Tampines Women's Clinic</a>, #01-112</p><p><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="cf5b1145-334d-472e-a965-a2f8ab31da4b">Ming Shing Pawnshop Pte Ltd</a>, #01-96</p><p><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="7cbe2217-b763-4e1d-81de-7f0f8d1be0bb">Froggies</a>, #04-96</p><p><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="43798461-d418-4ac1-a2b5-9f7359f538f5">Tampines St 23 (POSB)</a>, #01-100</p><p><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="2703952e-bfc0-46cd-a981-479ae751b1e4">Arrow Communication</a>, #01-76</p><p><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="8bad697d-42e0-48bb-8cd3-57601e42a39f">Efficient Tuition Centre</a>, #03-102</p><p style="display:none"><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="3e6f08ed-3917-47fa-a434-19e2d90a7682">Guardian - Tampines St 23 Blk 201E</a>, #01-94</p><p style="display:none"><span 
style="background-color:##95cf29"></span><a class=bizlink href="" uid="3c49c4e8-dc25-49ab-8481-0a8369ba20d7">Yes Boss Food Centre</a></p><p style="display:none"><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="a8d53b86-4b39-4d09-b8d7-67223228e3dd">Universal Medical Clinic</a>, #01-104</p><p style="display:none"><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="4baed1a5-8b5f-4f53-b1b6-056a60ce2a4c">Tampines Pawnshop Pte Ltd</a>, #01-86</p><p style="display:none"><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="9258c2ba-9e4f-48b0-aeab-d98c312e5328">Afghanistan Family Restaurant</a>, #01-56</p><p style="display:none"><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="cb8cdfd6-1203-49e3-a64a-b1d7a64384cc">7 Eleven</a>, #01-100</p><p style="display:none"><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="6c3c8595-b3dd-4fe0-9592-c053392a5036">Hairsolutions (Unisex)</a>, #01-118</p><p style="display:none"><span style="background-color:##95cf29"></span><a class=bizlink href="" uid="307b258e-ce57-46f0-924b-9ea1a01b49f0">Phase Hairdressing - North Bridge</a>, #01-64</p><p id=baha><a href="">+ show all</a></p></div><div id=aah><div class=bar><h4>Browse amenities around here: </h4></div><img class=marker src="/static/img/2/icon/panel/amenities.png?v=ce268"/><p><a value=4 href=>ATMs</a><a value=5 href=>Banks</a><a value=1 href=>Clinics</a><a value=6 href=>Petrol Kiosks</a><br><a value=2 href=>Post Offices</a><a value=3 href=>Schools</a><a value=0 href=>Supermarkets</a></p></div>

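So how do I extract the data from <div class=locf><strong>201E Tampines Street 23</strong><br> Singapore 527201</div>? That is the only information I want. Also, is there any way to remove the <strong> and <br> tags after extracting it?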

1 answer:

Answer 0 (score: 0)

If your HTML is very stable, you may be able to get away with just a regular expression, something like this (untested and not very robust):

$match = array();
// PCRE patterns need delimiters; '#' avoids having to escape the '/' in </div>.
if( preg_match('#<div class=locf>(.*?)</div>#',$data->where->html,$match) ) {
    $locf = $match[1];
} else {
    $locf = '';
}

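Note that this particular regex will fail if there is a nested div, and it is also sensitive to whitespace and case. Making it more robust with respect to whitespace and case is relatively simple, but the nested-div problem is trickier and probably needs more than a simple regular expression.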
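As a rough sketch of both points, the block below shows a regex that tolerates case, extra whitespace and optional quotes, followed by a DOM-based variant that does not depend on matching the closing tag by hand. The function names `extract_locf_regex` and `extract_locf_dom` and the sample `$html` string are made up for illustration; in the question's code the input would be `$data->where->html`. It assumes PHP's bundled DOM extension is available.

```php
<?php
// Hypothetical sample standing in for $data->where->html.
$html = '<div class=place><DIV class="locf"><strong>201E Tampines Street 23</strong>'
      . '<br> Singapore 527201</DIV><p>directions</p></div>';

// Variant 1: regex that ignores case, allows extra whitespace and optional quotes.
// It still stops at the first </div>, so a nested div would break it.
function extract_locf_regex(string $html): string
{
    $pattern = '#<div\s+class\s*=\s*["\']?locf["\']?\s*>(.*?)</div>#is';
    return preg_match($pattern, $html, $m) ? $m[1] : '';
}

// Variant 2: let an HTML parser find the element, which copes with nested divs.
// Returns the element's text only, with whitespace collapsed.
function extract_locf_dom(string $html): string
{
    if ($html === '') {
        return '';
    }
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@class="locf"]');
    if ($nodes === false || $nodes->length === 0) {
        return '';
    }
    return trim(preg_replace('/\s+/', ' ', $nodes->item(0)->textContent));
}

echo extract_locf_regex($html), "\n";
echo extract_locf_dom($html), "\n";
```

The regex variant returns the inner HTML, tags included, while the DOM variant returns plain text, so the second one also takes care of the <strong> and <br> tags in the same step.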

Once you have $locf, you can use preg_replace to replace any or all of the HTML tags inside it.
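A minimal sketch of that cleanup step might look like the following. The helper name `clean_locf` and the choice to turn <br> into a comma are assumptions made for illustration; the sample string is the inner content of the <div class=locf> element from the question.

```php
<?php
// Hypothetical helper: strips tags from the captured fragment.
function clean_locf(string $locf): string
{
    // Turn <br> into a separator first so the two address parts don't run together.
    $text = preg_replace('#<br\s*/?>#i', ', ', $locf);
    // Remove whatever tags remain (<strong>, </strong>, ...).
    $text = preg_replace('#<[^>]+>#', '', $text);
    // Collapse runs of whitespace and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Sample value, as it would be captured from the question's HTML.
$locf = '<strong>201E Tampines Street 23</strong><br> Singapore 527201';

echo clean_locf($locf), "\n";   // 201E Tampines Street 23, Singapore 527201
```

If no separator is needed, PHP's built-in strip_tags($locf) removes the tags in a single call.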