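This is my first attempt at using cURL in PHP. I want to scrape the results from this page: http://www.lldj.com/pastresult.php. The site has published weekly lottery results since 2002 and has a simple submit form (date).
Submit button: name = button / value = Submit. Select dropdown: Name = Draw, with options 1 to 1097 (each option is a draw number).
I could check the results by hand, but I thought I might as well write a simple script to make it easier, since I am also interested in learning how to submit data and retrieve the results with PHP/cURL.
I use a PHP DOM library for scraping and I am comfortable with its syntax. What I want to know is whether I should use cURL and DOM together, or whether this can be done with cURL alone.
What I have so far: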
include 'dom.php';
$post_data['draw'] = '1097';
$post_data['button'] = 'Submit';
//traverse array and prepare data for posting (key1=value1)
foreach ($post_data as $key => $value) {
    $post_items[] = $key . '=' . $value;
}
//create the final string to be posted using implode()
$post_string = implode('&', $post_items);
//create cURL connection
$curl_connection = curl_init('http://www.lldj.com/pastresult.php');
//set options
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
// curl_setopt($curl_connection, CURLOPT_USERAGENT, ...); // user-agent value is missing in the original post
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, 1);
//set data to be posted
curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_string);
//perform our request
$result = curl_exec($curl_connection);
//show information regarding the request
print_r(curl_getinfo($curl_connection));
echo curl_errno($curl_connection) . '-' . curl_error($curl_connection);
After submitting the data, the scraping part:
$t = $curl_connection->find('table',0); // ?? usually refers to the file_get_contents variable
$data = $t->find('tr');
foreach ($data as $n) {
    $tds = $n->find('td');
    $dataRows = array();
    $dataRows['num'] = $tds[0]->find('img',0)->href;
    var_dump($dataRows);
}
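Could someone point out whether this is correct? And how can I set it up to increment the submitted value automatically and repeat the process (e.g. submit draw = 1, then draw = 2, and so on)? Thanks.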
Answer 0: (score: 1)
<?php
include 'dom.php'; // assumed to be the PHP Simple HTML DOM Parser used in the question

for ($i = 1; $i < 5000; $i++) {
    $post_data = array();
    $post_data['draw'] = $i; // changes on every iteration: 1, 2, 3, 4, ...
    $post_data['button'] = 'Submit';

    // traverse array and prepare data for posting (key1=value1)
    $post_items = array(); // reset so pairs from earlier iterations are not sent again
    foreach ($post_data as $key => $value) {
        $post_items[] = $key . '=' . $value;
    }

    // create the final string to be posted using implode()
    $post_string = implode('&', $post_items);

    // create cURL connection
    $curl_connection = curl_init('http://www.lldj.com/pastresult.php');

    // set options
    curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
    // curl_setopt($curl_connection, CURLOPT_USERAGENT, ...); // user-agent value is missing in the original post
    curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($curl_connection, CURLOPT_FOLLOWLOCATION, 1);

    // set data to be posted
    curl_setopt($curl_connection, CURLOPT_POSTFIELDS, $post_string);

    // perform our request
    $result = curl_exec($curl_connection);

    // show information regarding the request
    print_r(curl_getinfo($curl_connection));
    echo curl_errno($curl_connection) . '-' . curl_error($curl_connection);

    if ($result === false) {
        continue; // request failed, nothing to parse
    }

    // start scraping: parse the returned HTML string, not the cURL handle
    $html = str_get_html($result); // assumes dom.php provides Simple HTML DOM's str_get_html()
    $t = $html->find('table', 0);
    $data = $t->find('tr');
    foreach ($data as $n) {
        $tds = $n->find('td');
        $dataRows = array();
        $dataRows['num'] = $tds[0]->find('img', 0)->src; // <img> carries src, not href
        var_dump($dataRows);
    }
} // for loop ends here
?>
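This is only a skeleton for calling cURL repeatedly while changing the id; you can adapt it to your own needs.
Make sure to clear the variables after you have fetched the data.
Use: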
...
curl_close($curl_connection);
unset($post_items, $post_string);
...
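As a rough sketch of the same loop idea, the POST body can be built with http_build_query() instead of a manual foreach/implode, and the request can be wrapped in a small function so that nothing leaks from one iteration to the next. The helper names build_post_body() and fetch_draw() are hypothetical; the field names draw and button are taken from the question's description of the form, and this has not been checked against the live site.

```php
<?php
// Hypothetical helper: builds the URL-encoded POST body for one draw number.
// The field names 'draw' and 'button' come from the question's form description.
function build_post_body(int $draw): string
{
    return http_build_query(['draw' => $draw, 'button' => 'Submit']);
}

// Hypothetical helper: POSTs one draw number and returns the HTML, or null on failure.
function fetch_draw(string $url, int $draw): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, build_post_body($draw));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
    $html = curl_exec($ch);
    curl_close($ch);

    return $html === false ? null : $html;
}

echo build_post_body(1), "\n"; // prints draw=1&button=Submit
```

To walk through every draw, one could call fetch_draw('http://www.lldj.com/pastresult.php', $i) inside a for loop running from 1 to 1097 (the highest option mentioned in the question), ideally with a short sleep() between requests so the site is not hammered.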
Answer 1: (score: 0)
Loading the page
The preferred way to fetch remote content is file_get_contents(). Use:
$html = file_get_contents('http://www.lldj.com/pastresult.php');
That's it.
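Note that a plain file_get_contents() call issues a GET request, so the form values from the question would not be submitted. A minimal sketch of one way to send them, assuming allow_url_fopen is enabled, is to pass a stream context carrying a POST body. The helper name post_context_options() is hypothetical, and the field names draw and button come from the question.

```php
<?php
// Hypothetical helper: builds stream-context options for a URL-encoded form POST.
function post_context_options(array $fields): array
{
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => http_build_query($fields),
            'timeout' => 30,
        ],
    ];
}

$options = post_context_options(['draw' => 1097, 'button' => 'Submit']);
$context = stream_context_create($options);

// Uncomment to perform the request (requires allow_url_fopen to be enabled):
// $html = file_get_contents('http://www.lldj.com/pastresult.php', false, $context);
```

Whether this or cURL is the better fit depends on the setup; cURL offers more control over redirects and error reporting, while the stream context needs no extra extension.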
Getting content from the page
To get content from the page, you would typically use DOMDocument and DOMXPath:
$doc = new DOMDocument();
@$doc->loadHTML($html);
$selector = new DOMXPath($doc);
// xpath query
$result = $selector->query('YOUR QUERY');
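To make the 'YOUR QUERY' placeholder more concrete, here is a small sketch that runs an XPath query against a made-up HTML fragment. The sample markup and the image file names are assumptions for illustration only; the real page's structure has to be inspected before writing the actual query.

```php
<?php
// Made-up sample markup; the real page's structure is an assumption.
$html = '<html><body><table>'
      . '<tr><td><img src="balls/07.gif"></td><td><img src="balls/21.gif"></td></tr>'
      . '</table></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$sources = [];

// Select the src attribute of every <img> that sits directly inside a table cell.
foreach ($xpath->query('//table//tr/td/img/@src') as $attr) {
    $sources[] = $attr->nodeValue;
}

print_r($sources);
```

Using libxml_use_internal_errors(true) is an alternative to the @ operator for silencing warnings about malformed HTML, and it keeps the errors available through libxml_get_errors() if they are needed later.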