Question

Update: Yahoo error

OK, so I got it all working, but preg_match_all doesn't work for Yahoo. If you look at http://se.search.yahoo.com/search?p=random&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t you can see in their HTML that they have <span class="url" id="something random"> the actual link </span>, but when I try preg_match_all I don't get any results.

preg_match_all('#<span class="url" id="(.*)">(.+?)</span>#si', $urlContents[2], $yahoo);

Does anyone have an idea?

End of update

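I'm trying to run preg_match_all on the results I get from Google, fetched with cURL's curl_multi_getcontent.

I've successfully fetched the pages, but when I try to extract the result links, the pattern matches far too much.

I'm currently using: preg_match_all('#<cite>(.+)</cite>#si', $urlContents[0], $links);

The match starts where it should, but it doesn't stop; it just keeps going. For example, check the HTML at www.google.com/search?q=random and you'll see that every link starts with <cite> and ends with </cite>.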

Can someone help me extract this information? I only need the actual link address of each result.

Update: the entire PHP script

public function multiSearch($question)
{
    $sites['google'] = "http://www.google.com/search?q={$question}&gl=sv";
    $sites['bing'] = "http://www.bing.com/search?q={$question}";
    $sites['yahoo'] = "http://se.search.yahoo.com/search?p={$question}";

    $urlHandler = array();

    foreach($sites as $site)
    {
        $handler = curl_init();
        curl_setopt($handler, CURLOPT_URL, $site);
        curl_setopt($handler, CURLOPT_HEADER, 0);
        curl_setopt($handler, CURLOPT_RETURNTRANSFER, 1);

        array_push($urlHandler, $handler);
    }

    $multiHandler = curl_multi_init();
    foreach($urlHandler as $key => $url)
    {
        curl_multi_add_handle($multiHandler, $url);
    }

    $running = null;
    do
    {
        curl_multi_exec($multiHandler, $running);
    }
    while($running > 0);

    $urlContents = array();
    foreach($urlHandler as $key => $url)
    {
        $urlContents[$key] = curl_multi_getcontent($url);
    }

    foreach($urlHandler as $key => $url)
    {
        curl_multi_remove_handle($multiHandler, $url);
    }

    foreach($urlContents as $urlContent)
    {
        preg_match_all('/<li class="g">(.*?)<\/li>/si', $urlContent, $matches);
        //$this->view_data['results'][] = "Random";
    }
    preg_match_all('#<div id="search"(.*)</ol></div>#i', $urlContents[0], $match);
    preg_match_all('#<cite>(.+)</cite>#si', $urlContents[0], $links);
    var_dump($links);

}

Answer 1

Run the regular expression in U (ungreedy) mode:

preg_match_all('#<cite>(.+)</cite>#siU', $urlContents[0], $links);
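
To see the difference, here is a minimal sketch that runs both patterns on a made-up HTML fragment. The $html string and the example.com/example.org addresses are assumptions for illustration only, not Google's actual markup.

```php
<?php
// Hypothetical sample input; real search-result markup will differ.
$html = '<li><cite>www.example.com/a</cite></li>'
      . '<li><cite>www.example.org/b</cite></li>';

// Default (greedy): (.+) runs on to the LAST </cite> in the string,
// so everything between the first <cite> and the last </cite> is one match.
preg_match_all('#<cite>(.+)</cite>#si', $html, $greedy);

// U modifier: quantifiers become ungreedy by default,
// so each <cite>...</cite> pair is matched on its own.
preg_match_all('#<cite>(.+)</cite>#siU', $html, $ungreedy);

print_r($greedy[1]);   // one oversized capture
print_r($ungreedy[1]); // one capture per <cite> element
```

With the greedy pattern the single capture swallows the markup between the two results, which matches the "it just keeps going" behaviour described in the question.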

Answer 2

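As in Darhazer's answer, you can turn on ungreedy mode for the whole regex with the U pattern modifier, or make just the quantifier itself ungreedy (or lazy) by following it with ?: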

preg_match_all('#<cite>(.+?)</cite>#si', ...
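
A small sketch of the lazy form, again on an invented fragment ($html and the addresses in it are assumptions, not real search output). It also shows one caveat: the U modifier inverts greediness, so combining U with +? makes the quantifier greedy again. Pick one of the two approaches rather than both.

```php
<?php
// Hypothetical sample input; real search-result markup will differ.
$html = '<cite>www.example.com/a</cite> text <cite>www.example.org/b</cite>';

// Lazy quantifier: stop at the first </cite> after each <cite>.
preg_match_all('#<cite>(.+?)</cite>#si', $html, $lazy);

// U together with +? flips the quantifier back to greedy,
// giving a single match that spans both <cite> elements.
preg_match_all('#<cite>(.+?)</cite>#siU', $html, $flipped);

// strip_tags removes any inline markup (such as <b>) left inside a capture.
$links = array_map('strip_tags', $lazy[1]);

print_r($links);
```

Using +? keeps the change local to one quantifier, which is easier to reason about when a pattern contains several of them.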

preg_match_all not stopping where it should
