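I have a simple cURL script that searches Google for "batman" and saves the results to a file.
Can anyone suggest a good way to go through that file and find the title and URL of each search result?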
Here is my code:

```php
<?php
function get_remote_file_to_cache()
{
    // The query must be in the query string: anything after '#' is a URL
    // fragment and is never sent to the server.
    $the_site = "https://www.google.se/search?q=batman";

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $the_site);
    // Return the response body as a string so it can be checked before saving.
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $contents = curl_exec($curl); // run the request only once
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);

    if ($contents === false || $httpCode == 404) {
        touch('cache/404_err.txt');
        return false;
    }

    // Save the page so it can be parsed later.
    file_put_contents("temp_file.txt", $contents);
    return true;
}

echo get_remote_file_to_cache() ? "Saved\n" : "Failed\n";
```
Answer 0 (score: 1)
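You can search the HTML with DOMDocument and DOMXPath:

```php
<?php
// Temp: sample HTML; replace it with the contents of your cached file.
$sPageHTML = '<html><head></head><body><div class="test">Text here</div></body></html>';
$oDomDocument = new DOMDocument();
// Real pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$oDomDocument->loadHTML($sPageHTML);
// Now, search the DOM structure for all divs whose class attribute is exactly "test".
$oXPath = new DOMXPath($oDomDocument);
$results = $oXPath->query('//div[@class="test"]');
// Loop through the results.
foreach ($results as $result) {
    echo 'Innertext: ' . $result->nodeValue . "\n";
}
```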
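Applied to the cached search page, the same DOMDocument/DOMXPath approach might look like the sketch below. It assumes each result is an `<a href="...">` element that wraps an `<h3>` title; Google changes its markup often, so the XPath `//a[h3]` and the helper name `extract_results` are illustrative assumptions, not a documented structure. Check the markup in `temp_file.txt` and adjust the query to match.

```php
<?php
// Hypothetical helper: returns a list of ['title' => ..., 'url' => ...].
// ASSUMPTION: each search result is an <a href="..."> that contains an <h3> title.
function extract_results(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];
    // Select every <a> element that has an <h3> child element.
    foreach ($xpath->query('//a[h3]') as $link) {
        // Query relative to the current link to get its <h3> title.
        $title = $xpath->query('./h3', $link)->item(0);
        $results[] = [
            'title' => trim($title->textContent),
            'url'   => $link->getAttribute('href'),
        ];
    }
    return $results;
}

// Usage with a small inline sample; for the cached page, call
// extract_results(file_get_contents('temp_file.txt')) instead.
$sample = '<html><body>'
    . '<div><a href="https://example.com/a"><h3>First result</h3></a></div>'
    . '<div><a href="https://example.com/b"><h3>Second result</h3></a></div>'
    . '<a href="https://example.com/nav">Not a result</a>'
    . '</body></html>';

foreach (extract_results($sample) as $r) {
    echo $r['title'] . ' => ' . $r['url'] . "\n";
}
```

The third link is skipped because it has no `<h3>` child, which is how the query separates result links from navigation links under this assumption.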
Good luck!
Answer 1 (score: 0)
If you are still looking, there is an open-source PHP Google scraper here: http://scraping.compunect.com/?scrape-google-search (scroll to the bottom for the code).
You can copy its DOM-parsing routines; they work well.
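Whichever parser you use, the extracted `href` may not be the final target: in some versions of Google's plain-HTML results page, links are relative redirects such as `/url?q=https://example.com/&sa=U` (an assumption to verify against your saved page). A hypothetical helper, `unwrap_google_url`, that pulls out the `q` parameter with PHP's built-in `parse_url` and `parse_str` could look like this:

```php
<?php
// Hypothetical helper.
// ASSUMPTION: result links may be relative redirects of the form "/url?q=<target>&...".
function unwrap_google_url(string $href): string
{
    $parts = parse_url($href);
    // Only rewrite relative "/url?..." links; anything else is returned untouched.
    if (isset($parts['path'], $parts['query']) && !isset($parts['host'])
        && $parts['path'] === '/url') {
        parse_str($parts['query'], $params);
        if (!empty($params['q']) && is_string($params['q'])) {
            return $params['q']; // parse_str has already URL-decoded the value
        }
    }
    return $href;
}

echo unwrap_google_url('/url?q=https://example.com/page%3Fid%3D7&sa=U') . "\n"; // https://example.com/page?id=7
echo unwrap_google_url('https://example.com/direct') . "\n";                    // https://example.com/direct
```

Links that are not `/url?...` redirects come back unchanged, so the helper can safely be applied to every `href` you extract.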