I need to extract data from an HTML document to produce JSON

Time: 2015-12-25 20:10:07

Tags: php html json domdocument

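I'm trying to put my company's phone list (image, name and phone number) into a JSON file. I tried looping through all the <a> elements to find the img src and the div.employee-desc text, but without success. I also tried DOMDocument(), but that failed too.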

<section>
        <a href="tel:+471234567890">
        <article class="clearfix">
          <div class="employee-image">       
            <img src"image_1.jpg">
          </div>
          <div class="employee-desc">
            Emma doe <br>
            +471234567890
          </div>
        </article>
        </a>
        <a href="tel:+471234567890">
        <article class="clearfix">
          <div class="employee-image">       
            <img src"image_2.jpg">
          </div>
          <div class="employee-desc">
            Frank doe <br>
            +471234567890
          </div>
        </article>
        </a>
        <a href="tel:+xxxxxxxx">
        <article class="clearfix">
          <div class="employee-image">       
            <img src"image_3.jpg">
          </div>
          <div class="employee-desc">
            John doe <br>
            +471234567890
          </div>
        </article></a>
    </section>

Ideally, the JSON file would look like this:

[  
   {  
      "image":"image_1.jpg",
      "name":"Emma doe",
      "phone":"+47 1234567890"
   },
   {  
      "image":"image_2.jpg",
      "name":"Frank doe",
      "phone":"+47 1234567890"
   },
   {  
      "image":"image_3.jpg",
      "name":"John doe",
      "phone":"+47 1234567890"
   }
]
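On the PHP side, that target is just a list of associative arrays. The following minimal sketch assumes the rows have already been extracted, so they are hard-coded here, and writes to a hypothetical output path in the system temp directory. It shows json_encode with JSON_PRETTY_PRINT producing that shape and saving it to a file:

```php
<?php
// Sample rows standing in for the parsed data (values copied from the question).
$employees = [
    ['image' => 'image_1.jpg', 'name' => 'Emma doe',  'phone' => '+47 1234567890'],
    ['image' => 'image_2.jpg', 'name' => 'Frank doe', 'phone' => '+47 1234567890'],
    ['image' => 'image_3.jpg', 'name' => 'John doe',  'phone' => '+47 1234567890'],
];

// A list with sequential 0-based keys encodes as a JSON array,
// and each associative array encodes as a JSON object.
$json = json_encode($employees, JSON_PRETTY_PRINT);

// Hypothetical output location; adjust to wherever the file should live.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'employees.json';
file_put_contents($path, $json);
```

If the keys were not sequential (for example after filtering with array_filter), json_encode would emit an object instead, so array_values() is worth applying first in that case.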

Does anyone know how to do this in PHP?

2 answers:

Answer 0 (score: 1)

You can find the code below. Note that the img tags in your example are incorrect: they should be 'img src=""', not 'img src""'.

I assume your HTML is in the $html variable.
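If the real page really contains src"..." without the equals sign and cannot be fixed at the source, one option is to repair it with a regular expression before extracting anything. This is a minimal sketch, assuming the only defect is the missing = directly after src. fix_img_src is a hypothetical helper name:

```php
<?php
// Hypothetical helper: turns <img src"x.jpg"> into <img src="x.jpg">.
// Assumes the only defect is the missing "=" right after "src";
// correctly written src="..." attributes are left untouched.
function fix_img_src(string $html): string
{
    // Capture "<img ... src", then replace the optional whitespace and quote with ="
    return preg_replace('/(<img\b[^>]*?\bsrc)\s*"/i', '$1="', $html);
}
```

Usage: $html = fix_img_src($html); before running the extraction.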

$json_arr = array();

// Keep only the markup between <section> and </section>
$html = substr($html, strpos($html, '<section>') + strlen('<section>'));
$html = substr($html, 0, strpos($html, '</section>'));

// Every employee entry starts with <a href="
$arr = explode('<a href="', $html);
foreach ($arr as $k => $line) {
    if ($k == 0) continue; // the piece before the first <a> holds no entry

    // Phone: the href value without the "tel:" prefix
    $phone = substr($line, 0, strpos($line, '"'));
    $phone = str_replace('tel:', '', $phone);
    $phone = trim($phone);

    // Image: the value of the src attribute (requires src="...")
    $marker = '<img src="';
    $image = substr($line, strpos($line, $marker) + strlen($marker));
    $image = substr($image, 0, strpos($image, '"'));

    // Name: the employee-desc text before the <br>, trimmed on both sides
    $marker = '<div class="employee-desc">';
    $name = substr($line, strpos($line, $marker) + strlen($marker));
    $name = substr($name, 0, strpos($name, '</div>'));
    $name = trim(substr($name, 0, strpos($name, '<br')));

    $json_arr[] = array('image' => $image, 'name' => $name, 'phone' => $phone);
}

$json = json_encode($json_arr);
echo $json . "\n";
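The code above takes the phone number from the href, so it yields "+471234567890" rather than the "+47 1234567890" shown in the question (and "+xxxxxxxx" for the third entry). This is a minimal sketch of inserting the space, assuming every number starts with a two-digit country code such as +47. Other country codes have different lengths, so this is an assumption. format_phone is a hypothetical helper name:

```php
<?php
// Hypothetical helper: "+471234567890" -> "+47 1234567890".
// Assumes a two-digit country code; values that do not match are returned unchanged.
function format_phone(string $phone): string
{
    if (preg_match('/^(\+\d{2})(\d+)$/', $phone, $m)) {
        return $m[1] . ' ' . $m[2];
    }
    return $phone;
}

// Example row shaped like the answer's $json_arr entries.
$json_arr = [
    ['image' => 'image_1.jpg', 'name' => 'Emma doe', 'phone' => '+471234567890'],
];

// Replace each raw href value with the formatted number.
$json_arr = array_map(function (array $row): array {
    $row['phone'] = format_phone($row['phone']);
    return $row;
}, $json_arr);

// JSON_PRETTY_PRINT produces the indented layout shown in the question.
$json = json_encode($json_arr, JSON_PRETTY_PRINT);
```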

Answer 1 (score: 1)

A shorter approach with the help of PHP Simple HTML DOM Parser:

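// Assumes the sunra/php-simple-html-dom-parser Composer package
use Sunra\PhpSimple\HtmlDomParser;
require 'vendor/autoload.php';

// $data holds the HTML from the question
$html = HtmlDomParser::str_get_html($data);

$result = array();
foreach($html->find('a') as $element) {
    // <a> > <article> > <div class="employee-image"> > <img>
    $image=$element->children(0)->children(0)->children(0)->src;
    // <a> > <article> > <div class="employee-desc">, whose text is "name <br> phone"
    list($name,$phone)=array_map('trim', explode('<br>',$element->children(0)->children(1)->innertext));
    $row = (object)compact('image','name','phone');
    $result[]=$row;
}

$output=json_encode($result,JSON_PRETTY_PRINT);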
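The question mentions DOMDocument. For comparison, here is a minimal sketch of the same extraction with PHP's built-in DOMDocument and DOMXPath, which needs no third-party library. It assumes the corrected markup (src="...") and the exact class names from the question, and like the answer above it reads the phone number from the description text rather than the href. extract_employees is a hypothetical function name:

```php
<?php
// Hypothetical helper: parses the employee list into an array of rows.
function extract_employees(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. about HTML5 tags like <section>) instead of printing them.
    $prev = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $xpath = new DOMXPath($doc);
    $rows = [];
    // One entry per tel: link; class matches below assume a single class value.
    foreach ($xpath->query('//a[starts-with(@href, "tel:")]') as $a) {
        $img  = $xpath->query('.//div[@class="employee-image"]/img', $a)->item(0);
        $desc = $xpath->query('.//div[@class="employee-desc"]', $a)->item(0);
        if ($img === null || $desc === null) {
            continue; // skip links that do not follow the expected layout
        }
        // Non-empty text nodes: the one before <br> is the name, the one after is the number.
        $texts = [];
        foreach ($desc->childNodes as $node) {
            if ($node->nodeType === XML_TEXT_NODE && trim($node->nodeValue) !== '') {
                $texts[] = trim($node->nodeValue);
            }
        }
        $rows[] = [
            'image' => $img->getAttribute('src'),
            'name'  => $texts[0] ?? '',
            'phone' => $texts[1] ?? '',
        ];
    }
    return $rows;
}
```

Usage: echo json_encode(extract_employees($html), JSON_PRETTY_PRINT); Because it selects elements by class and by the tel: link instead of counting child positions, an extra wrapper element inside the <a> would not break it.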