Scraping data from a JS block in PHP

Time: 2018-12-16 17:53:58

Tags: php

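I want to scrape data from a web page that stores its data in a JavaScript block and then uses that data to render the page. How can I get this kind of data in PHP?
I have already tried DOMXPath and DOMDocument, but still no luck!
Below is a sample of the target page.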

<html>
...
<script type="text/javascript">
var showHeader = true;
var Data = {
  "packageId": "120",
  "packageTitle": "West Bengal",
  "Type": "Customizable",
  "Components": [
    {
      "destination": "Darjeeling",
      "dayNum": {
        "1": {
          "sightseeings": [
            "No Sightseeing"
          ],
          "itineraries": {
            "title": "Bagdogra / New Jalpaiguri - Darjeeling",
            "description": "<p>Welcome to darjeeling.</p>"
          }
        }
      }
    }
  ]
};
</script>
<body>
...
</body>
</html>

I want to retrieve all of this data into an associative array using PHP, so that I can use $data['showHeader'] and $data['data']['packageId'].

1 Answer:

Answer 0: (score: 0)

How about using a regular expression to extract the necessary data and converting it to an array, like this:

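// m: ^ matches at the start of each line; s: . also matches newlines.
// The lazy .*? stops at the first "};" after "var Data = ".
if (preg_match('#^var showHeader = (?P<showHeader>\w+);\s*^var Data = (?P<json>\{.*?\});#ms', $html, $m)) {
    $data = [
        // Convert the JS literal to a PHP boolean
        'showHeader' => ($m['showHeader'] === 'true' || $m['showHeader'] === '1'),
        // Decode the object literal into an associative array
        'data' => json_decode($m['json'], true)
    ];

    echo $data['data']['packageId'];
} else {
    echo 'js data not found';
}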
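
To try the snippet above in isolation, here is a minimal, self-contained sketch. The `$html` nowdoc is a trimmed copy of the sample page from the question, and `extractJsData()` is a hypothetical helper name used only for illustration. It additionally checks `json_last_error()`, since `json_decode()` returns `null` when the captured text is not valid JSON.

```php
<?php
// Trimmed copy of the sample page from the question (assumption: the real
// page uses the same "var showHeader" / "var Data" layout).
$html = <<<'HTML'
<html>
<script type="text/javascript">
var showHeader = true;
var Data = {
  "packageId": "120",
  "packageTitle": "West Bengal",
  "Type": "Customizable"
};
</script>
<body></body>
</html>
HTML;

// Hypothetical helper: returns the extracted data, or null when nothing
// matches or the captured text is not valid JSON.
function extractJsData(string $source): ?array
{
    $pattern = '#^var showHeader = (?P<showHeader>\w+);\s*^var Data = (?P<json>\{.*?\});#ms';
    if (!preg_match($pattern, $source, $m)) {
        return null;
    }

    $decoded = json_decode($m['json'], true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return null;
    }

    return [
        'showHeader' => ($m['showHeader'] === 'true' || $m['showHeader'] === '1'),
        'data' => $decoded,
    ];
}

$data = extractJsData($html);
echo $data === null ? 'js data not found' : $data['data']['packageId'];
```

Note that this only works while the script block contains strict JSON (double-quoted keys, no trailing commas), and that the lazy match would end too early if a string value inside the object happened to contain `};`.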

If there are multiple scripts, you may want to analyze the contents of all of them:

// Suppress libxml warnings caused by invalid real-world HTML
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$data = [];
foreach ($dom->getElementsByTagName('script') as $script) {
    if (preg_match('#^var showHeader = (?P<showHeader>\w+);\s*^var Data = (?P<json>{.*?});#ms', $script->nodeValue, $m)) {
        $data['showHeader'] = ($m['showHeader'] === 'true' || $m['showHeader'] === '1');
        $data['data'] = json_decode($m['json'], true);
        break;
    }
}

if ($data) {
    echo $data['data']['packageId'];
} else {
    echo 'js data not found';
}
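
The pattern used above requires the two assignments to be adjacent and to start at the beginning of a line. If the real page indents them or puts other statements in between (an assumption, since only the sample is known), one possible variation is to match each variable separately. The sketch below does that; `findPageData()` and the sample `$html` are hypothetical and only illustrate the idea.

```php
<?php
// Sample input (assumption): indented assignments with an unrelated
// statement between them, which the combined pattern would not match.
$html = <<<'HTML'
<html>
<head>
<script type="text/javascript">var unrelated = 1;</script>
<script type="text/javascript">
    var showHeader = true;
    var somethingElse = "x";
    var Data = {"packageId": "120", "Type": "Customizable"};
</script>
</head>
<body></body>
</html>
HTML;

// Hypothetical helper: scans every <script> element and matches the two
// variables independently of each other.
function findPageData(string $html): array
{
    libxml_use_internal_errors(true); // tolerate invalid real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $data = [];
    foreach ($dom->getElementsByTagName('script') as $script) {
        $js = $script->nodeValue;
        if (preg_match('#\bvar\s+showHeader\s*=\s*(\w+)\s*;#', $js, $m)) {
            $data['showHeader'] = ($m[1] === 'true' || $m[1] === '1');
        }
        if (preg_match('#\bvar\s+Data\s*=\s*(\{.*?\})\s*;#s', $js, $m)) {
            $decoded = json_decode($m[1], true);
            if (is_array($decoded)) {
                $data['data'] = $decoded; // keep only successfully decoded JSON
            }
        }
    }
    return $data;
}

$data = findPageData($html);
echo isset($data['data']) ? $data['data']['packageId'] : 'js data not found';
```

Matching the variables separately trades strictness for tolerance: it no longer guarantees that both values come from the same script block, which may or may not matter for the page being scraped.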