I have a script that combines cURL and DOM. My code:
<?php
// Create temp file to store cookies
$ckfile = tempnam ("/tmp", "CURLCOOKIE");
// URL to login page
$url = "https://www.investagrams.com/login";
// Get Login page and its cookies and save cookies in the temp file
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
#$output = curl_exec($ch);
$fields = array(
'ctl00$WelcomePageMainContent$ctl00$Username' => '********',
'ctl00$WelcomePageMainContent$ctl00$Password' => '********',
);
$fields_string = '';
foreach($fields as $key=>$value) {
$fields_string .= $key . '=' . $value . '&';
}
rtrim($fields_string, '&');
// Post login form and follow redirects
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, count($fields));
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
#$output = curl_exec($ch);
$url = "https://www.investagrams.com/Stock/RealTimeMonitoring";
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
#echo $output;
$dom = new DomDocument;
$dom->loadHtmlFile($output);
$xpath = new DomXPath($dom);
// collect header names
$headerNames = array();
foreach ($xpath->query('//table[@id="StockQuoteTable"]//th') as $node) {
$headerNames[] = $node->nodeValue;
}
// collect data
$data = array();
foreach ($xpath->query('//tbody[@id="StockQuoteTable:tbody_element"]/tr') as $node) {
$rowData = array();
foreach ($xpath->query('td', $node) as $cell) {
$rowData[] = $cell->nodeValue;
}
$data[] = array_combine($headerNames, $rowData);
}
print_r($data);
?>
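This only prints "Array()". I don't know which part is wrong. The cURL part works 100%; the error is in the DOM part. Thanks. Here is the markup of the table I want to extract: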
<div class="dataTables_scrollBody" style="overflow: auto; height: 300px; width: 100%;">
<table id="StockQuoteTable" class="table dataTable no-footer" role="grid" aria-describedby="StockQuoteTable_info" style="width: 1166px;">
<thead></thead>
<tbody>
<tr id="num1" class="odd" role="row"
Answer 0 (score: 0)
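I was able to find part of the problem in your code, but the HTML returned by the curl request also seems to contain errors that prevent DOMXPath::query from returning valid matches.
The problem I fixed is that you used DOMDocument::loadHTMLFile, which expects a file path or URL, instead of DOMDocument::loadHTML, which parses a string of markup such as the HTML retrieved from the curl request. So the corrected script should be: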
<?php
// Create temp file to store cookies
$ckfile = tempnam ("/tmp", "CURLCOOKIE");
// URL to login page
$url = "https://www.investagrams.com/login";
// Get Login page and its cookies and save cookies in the temp file
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
#$output = curl_exec($ch);
$fields = array(
'ctl00$WelcomePageMainContent$ctl00$Username' => '********',
'ctl00$WelcomePageMainContent$ctl00$Password' => '********',
);
$fields_string = '';
foreach ($fields as $key => $value) {
    $fields_string .= $key . '=' . $value . '&';
}
// rtrim() returns the trimmed string, so the result must be assigned back
$fields_string = rtrim($fields_string, '&');
// Post login form and follow redirects
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true); // CURLOPT_POST is a boolean switch, not a field count
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
#$output = curl_exec($ch);
$url = "https://www.investagrams.com/Stock/RealTimeMonitoring";
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // Accepts all CAs
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);
#echo $output;
#print_r($output);
$dom = new DomDocument;
@$dom->loadHtml($output);
$xpath = new DomXPath($dom);
// collect header names
$headerNames = array();
foreach ($xpath->query('//table[@id="StockQuoteTable"]//th') as $node) {
    $headerNames[] = $node->nodeValue;
}
// collect data
$data = array();
foreach ($xpath->query('//tbody[@id="StockQuoteTable:tbody_element"]/tr') as $node) {
    $rowData = array();
    foreach ($xpath->query('td', $node) as $cell) {
        $rowData[] = $cell->nodeValue;
    }
    $data[] = array_combine($headerNames, $rowData);
}
print_r($data);
?>
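To check the DOM part separately from the login, here is a minimal sketch that runs the same kind of extraction on a hard-coded string. The sample markup, header names, and cell values are invented for illustration; only the table id comes from the question. Note that the snippet in the question shows a bare `<tbody>` with no id and an empty `<thead>`, so this sketch selects rows through the table id rather than through `StockQuoteTable:tbody_element`, and it skips rows whose cell count does not match the headers.

```php
<?php
// Hypothetical sample markup modelled on the table in the question;
// the header names and cell values are invented for illustration.
$html = <<<HTML
<table id="StockQuoteTable">
  <thead><tr><th>Symbol</th><th>Last</th></tr></thead>
  <tbody>
    <tr id="num1"><td>AAA</td><td>1.23</td></tr>
    <tr id="num2"><td>BBB</td><td>4.56</td></tr>
  </tbody>
</table>
HTML;

$dom = new DOMDocument();
// loadHTML() parses a string of markup; loadHTMLFile() would treat its
// argument as a file path or URL and fail to open it.
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Collect header names from the <th> cells of the table.
$headerNames = [];
foreach ($xpath->query('//table[@id="StockQuoteTable"]//th') as $th) {
    $headerNames[] = trim($th->nodeValue);
}

// Collect one associative array per body row.
$data = [];
foreach ($xpath->query('//table[@id="StockQuoteTable"]/tbody/tr') as $tr) {
    $rowData = [];
    foreach ($xpath->query('td', $tr) as $td) {
        $rowData[] = trim($td->nodeValue);
    }
    // array_combine() needs both arrays to have the same length,
    // so skip rows that do not line up with the headers.
    if (count($rowData) === count($headerNames)) {
        $data[] = array_combine($headerNames, $rowData);
    }
}

print_r($data);
```

If this works on the sample but the real page still yields an empty array, the next thing to check is whether the HTML in `$output` actually contains the header cells and rows that the queries look for.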
I also added the @ operator before the loadHTML call to suppress the warnings it raises for malformed HTML.
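As an alternative to `@`, which hides the warnings completely, the parse errors can be collected and inspected with `libxml_use_internal_errors()`. The following is only a sketch of that approach; the malformed sample string is made up for illustration, and which messages (if any) are reported depends on the libxml version in use.

```php
<?php
// Deliberately malformed sample markup (hypothetical, for illustration).
$html = '<table id="StockQuoteTable"><tr><td>AAA</bogus></td></tr></table>';

// Collect libxml's parse errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Inspect what would otherwise have been hidden by "@".
foreach (libxml_get_errors() as $error) {
    echo "line {$error->line}: " . trim($error->message), PHP_EOL;
}
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

// The document is still usable even if the parser reported problems.
$xpath = new DOMXPath($dom);
$cells = $xpath->query('//table[@id="StockQuoteTable"]//td');
echo $cells->length, PHP_EOL;
```

Seeing the collected messages can help tell whether the XPath queries fail because of broken markup or because the expected elements are simply not in the response.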