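So, I'm using PHP's loadHTMLFile() to find the links on my web pages. However, it always collects the links as they appear when I'm logged out, so it misses many of the login-only links that I actually need to find. These pages already have database session information that can be called. Is there a way to send the required session information to the page being crawled so that it finds the dynamic links? If necessary I can create fake login data; I just need to be able to find the login-only links.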
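One possible approach, offered as a sketch under assumptions: when loadHTMLFile() is given an http:// URL, libxml fetches it through PHP's stream layer, and libxml_set_streams_context() attaches a stream context to subsequent libxml loads. The sketch builds a context whose HTTP request carries a session cookie, so the crawled page can see the visitor as logged in. It assumes the site uses PHP's native sessions (cookie name from session_name()). The helper sessionCookieContext() and the URL in the comments are hypothetical. If the crawling script itself holds that session open, call session_write_close() first: PHP's default file-based session handler locks the session, and the crawled page would otherwise wait on that lock.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build a stream context that sends a session cookie
 * with every HTTP request made through it.
 */
function sessionCookieContext(string $sessionName, string $sessionId)
{
    return stream_context_create([
        'http' => [
            // Present the session cookie so the crawled page treats the request as logged in.
            'header' => 'Cookie: ' . $sessionName . '=' . $sessionId,
        ],
    ]);
}

// Example usage (assumes an admin session is already active in this script):
// session_write_close(); // release the session lock so the crawled page can open it
// libxml_set_streams_context(sessionCookieContext(session_name(), session_id()));
// $doc = new DOMDocument();
// $doc->loadHTMLFile('https://example.com/members-only.php'); // hypothetical URL
```

Reusing the admin's own session is the simplest option on a site you fully control; a dedicated "crawler" account whose session ID is stored in the database would work the same way with a different ID passed in.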
Below is the code I use to find the links:
set_error_handler(function ($errno, $errstr, $errfile, $errline) { return true; }); // Swallow the unavoidable warnings loadHTMLFile emits on malformed HTML
$valid = $this->doc->loadHTMLFile($path); // Returns false if $path could not be loaded or parsed
restore_error_handler(); // Restore normal error handling
if ($valid !== false)
{
    $xpath = new DOMXPath($this->doc); // Create a DOMXPath instance (PHP core DOM extension)
    $elements = $xpath->query("//a[not(@rel='nofollow')]/@href"); // Pull href attributes from links; rel="nofollow" links are ignored
    $this_page = array(); // Ensure $this_page is set, even if 0 links are found
    if ($elements !== false) // query() returns false (not null) on an invalid expression
    {
        foreach ($elements as $element)
        {
            $this_page[] = $element->nodeValue; // Collect each located link into an array
        }
    }
}
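For comparison, here is a minimal self-contained sketch of the same link extraction working on an HTML string rather than a file path, so it can be paired with any fetch method that is able to send the session cookie. The function name extractFollowLinks() is hypothetical. Two deliberate changes from the code above: parse warnings are collected with libxml_use_internal_errors() instead of a custom error handler, and the XPath tests rel as a space-separated token list, so rel="noopener nofollow" is also skipped (the original exact-match test would keep it).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return every href on <a> elements whose rel
 * attribute does not contain the "nofollow" token, in document order.
 */
function extractFollowLinks(string $html): array
{
    $doc = new DOMDocument();
    // Buffer libxml parse warnings instead of letting them surface as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($loaded === false) {
        return [];
    }

    $xpath = new DOMXPath($doc);
    // Token match: pad rel with spaces and look for " nofollow ".
    $nodes = $xpath->query(
        "//a[not(contains(concat(' ', normalize-space(@rel), ' '), ' nofollow '))]/@href"
    );

    $links = [];
    if ($nodes !== false) {
        foreach ($nodes as $node) {
            $links[] = $node->nodeValue; // attribute value of href
        }
    }
    return $links;
}
```

Separating fetching from parsing also makes it easier to check whether a missing link is a session problem (the fetched HTML is the logged-out version) or a parsing problem.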
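Does anyone have any ideas? I have full control over the site and the database, and this is an admin-only script, so user restrictions can be kept to a minimum. Thanks!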