Question

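I've been playing around with cURL and XPath to do some web scraping. I finally got my code working the way I wanted, but it stopped working when I tried it on another page. The only things I changed were the XPath and the URL. I'm brand new to this and have only been at it for a week, so please bear with me if the mistake is obvious.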

My code is:


<?php
/*----Connection to Database----*/
include('wp-config.php');
mysql_connect(DB_HOST, DB_USER, DB_PASSWORD);
mysql_select_db("db");

/*----US Dollar Index----*/
$url = "http://www.wsj.com/mdc/public/page/2_3023-fut_index-futures.html";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

// Make the cURL request
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
if (!$html) {
	echo "<br />cURL error number:" .curl_errno($ch);
	echo "<br />cURL error:" . curl_error($ch);
	exit;
}

// Parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

// Grab all the MONTH on the page
$xpath = new DOMXPath($dom);

$data = $xpath->query("/html/body/div[6]/div[3]/div/table[9]/tbody/tr[position() >= 3 and position() <=6]");

//[position() >= 1 and position() <=13]

// Searching for data
$values = array();
foreach($data as $row) {
	$values[] = $row->nodeValue;
}

print_r($values);

?>
</body>
</html>


Answer 1

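A few things come to mind. Have you checked what the incoming HTML looks like, and whether it contains anything that shouldn't be there? Are you querying the correct XPath? At least according to this older answer, it seems the XPath range should be written in the form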

[position() >= 100 and not(position() > 200)]

https://stackoverflow.com/a/3355022/5526468
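As a rough illustration of that range form, here is a minimal sketch that runs it against a small inline table. The sample markup and the row values are made up for the example and only stand in for the fetched page; the real page's structure will differ.

```php
<?php
// Hypothetical sample markup standing in for the HTML returned by cURL.
$html = '<html><body><table>'
      . '<tr><td>row 1</td></tr>'
      . '<tr><td>row 2</td></tr>'
      . '<tr><td>row 3</td></tr>'
      . '<tr><td>row 4</td></tr>'
      . '<tr><td>row 5</td></tr>'
      . '</table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select rows 2 through 4 using the "not(position() > N)" form.
// "//table//tr" matches the rows whether or not a tbody element is present.
$rows = $xpath->query('//table//tr[position() >= 2 and not(position() > 4)]');

$values = array();
foreach ($rows as $row) {
    $values[] = trim($row->nodeValue);
}

print_r($values);
```

With this sample the query should select rows 2, 3 and 4. Writing `position() <= 4` instead of `not(position() > 4)` selects the same rows, so the form of the range by itself is unlikely to be what breaks the query.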

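Edit: Now that I think about it, if the actual HTML contains fewer items than the range requires, perhaps XPath evaluates the range expression to false, so the query finds no items?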
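One way to check this guess is to run a range that is larger than the number of rows. The sketch below uses a made-up three-row table (an assumption for the example, not the real page) and compares a range that partly overlaps the rows with one that lies entirely past them.

```php
<?php
// Hypothetical three-row table used only to test the range behaviour.
$html = '<html><body><table>'
      . '<tr><td>a</td></tr>'
      . '<tr><td>b</td></tr>'
      . '<tr><td>c</td></tr>'
      . '</table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The range asks for rows 2..10, but only 3 rows exist.
$partial = $xpath->query('//tr[position() >= 2 and position() <= 10]');

// The range asks for rows 100..200, none of which exist.
$none = $xpath->query('//tr[position() >= 100 and position() <= 200]');

echo $partial->length, "\n"; // 2 (rows "b" and "c")
echo $none->length, "\n";    // 0
```

The predicate is evaluated per node, so a range that runs past the end still returns the rows that do exist. Only a range that starts beyond the last row comes back empty.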

Answer 2

I solved my problem: it was the path. The path Firebug gave me was not the correct path for that site. Why, I don't know.
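A likely explanation, though not confirmed for this particular site, is that Firebug shows the browser's live DOM, which can differ from the raw HTML that cURL receives. Browsers insert a `tbody` element into tables that lack one, and scripts may add or move elements, while `DOMDocument::loadHTML` parses the markup as sent. The sketch below uses a made-up table without `tbody` to show the effect; the markup and cell values are assumptions for illustration.

```php
<?php
// Hypothetical raw HTML as a server might send it: no tbody in the table.
$html = '<html><body><table>'
      . '<tr><td>Dec 2015</td></tr>'
      . '<tr><td>Mar 2016</td></tr>'
      . '</table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Path as a browser tool would report it, including the inserted tbody.
$withTbody = $xpath->query('/html/body/table/tbody/tr');

// Same path with the tbody step removed.
$withoutTbody = $xpath->query('/html/body/table/tr');

// Descendant axis: matches the rows with or without a tbody.
$either = $xpath->query('/html/body/table//tr');

echo $withTbody->length, "\n";    // 0 when the parser adds no tbody
echo $withoutTbody->length, "\n"; // 2
echo $either->length, "\n";       // 2
```

If this is the cause, dropping `tbody` from the copied path, or using `//tr` under the table, is usually enough. Long absolute paths such as `/html/body/div[6]/div[3]/...` are also fragile, so anchoring on an `id` or `class` attribute, where the page provides one, tends to hold up better.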

XPath query not working
