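I want to scrape data from a web page that stores its data in a JavaScript block and then uses that data to render the page. How can I get this kind of data in PHP?
I have already tried DOMXPath and DOMDocument, but still no luck!
Below is a sample of the target page.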
<html>
...
<script type="text/javascript">
var showHeader = true;
var Data = {
"packageId": "120",
"packageTitle": "West Bengal",
"Type": "Customizable",
"Components": [
{
"destination": "Darjeeling",
"dayNum": {
"1": {
"sightseeings": [
"No Sightseeing"
],
"itineraries": {
"title": "Bagdogra / New Jalpaiguri - Darjeeling",
"description": "<p>Welcome to darjeeling.</p>"
}
}
}
}
]
};
</script>
<body>
...
</body>
</html>
I want to retrieve all of this data into an associative array in PHP, so that I can access it as $data['showHeader'] or $data['data']['packageId'].
Answer 0: (score: 0)
How about using a regular expression to extract the necessary data and converting it into an array, like this:
if (preg_match('#^var showHeader = (?P<showHeader>\w+);\s*^var Data = (?P<json>{.*?});#ms', $html, $m)) {
$data = [
'showHeader' => ($m['showHeader'] === 'true' || $m['showHeader'] === '1'),
'data' => json_decode($m['json'], true)
];
echo $data['data']['packageId'];
} else {
echo 'js data not found';
}
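Note that json_decode() returns null when the captured text is not valid JSON, which the snippet above does not check for. The following is a minimal sketch of the same regex approach wrapped in a function that also reports that case. The function name parseJsData is hypothetical, the optional leading whitespace in the pattern is my own assumption for indented source, and the nullable return type assumes PHP 7.1 or later. As before, the lazy match stops at the first "};", so it assumes that sequence does not occur inside the JSON itself.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Returns ['showHeader' => bool, 'data' => array], or null when the
// variables are not found or the captured text is not valid JSON.
function parseJsData(string $source): ?array
{
    // Same idea as the pattern above; [ \t]* additionally tolerates indentation.
    $pattern = '#^[ \t]*var showHeader = (?P<showHeader>\w+);\s*'
             . '^[ \t]*var Data = (?P<json>\{.*?\});#ms';

    if (!preg_match($pattern, $source, $m)) {
        return null; // the two variable declarations were not found
    }

    $decoded = json_decode($m['json'], true);
    if (json_last_error() !== JSON_ERROR_NONE || !is_array($decoded)) {
        return null; // captured text was not a valid JSON object
    }

    return [
        'showHeader' => ($m['showHeader'] === 'true' || $m['showHeader'] === '1'),
        'data'       => $decoded,
    ];
}
```

With the sample page above in $html, calling $result = parseJsData($html); and then reading $result['data']['packageId'] should give "120", while a page without the two variables yields null.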
If the page contains multiple scripts, you may want to check the content of each one:
libxml_use_internal_errors(true);
$dom = new DomDocument();
$dom->loadHTML($html);
$data = [];
foreach ($dom->getElementsByTagName('script') as $script) {
if (preg_match('#^var showHeader = (?P<showHeader>\w+);\s*^var Data = (?P<json>{.*?});#ms', $script->nodeValue, $m)) {
$data['showHeader'] = ($m['showHeader'] === 'true' || $m['showHeader'] === '1');
$data['data'] = json_decode($m['json'], true);
break;
}
}
if ($data) {
echo $data['data']['packageId'];
} else {
echo 'js data not found';
}