Question

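I'm writing a crawler that will go through a specific set of websites and scrape all the mp3 links into a database. I don't want to download the files, just grab the links, index them, and be able to search them. How can I do this in PHP for sites like guruji.com?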
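The storage-and-search half of this question can be sketched with PDO. This is a minimal sketch under stated assumptions: the table `mp3_links`, its columns, and the helpers `open_index`, `store_links` and `search_links` are hypothetical names. SQLite (the `pdo_sqlite` extension) stands in for whatever database you actually use, and the search is a plain `LIKE` substring match rather than a real full-text index.

```php
<?php
// Hypothetical schema: one row per unique mp3 URL plus the page it was found on.
// 'sqlite::memory:' is for demonstration; use e.g. 'sqlite:/path/to/links.db' to persist.
function open_index(string $dsn = 'sqlite::memory:'): PDO
{
    $db = new PDO($dsn);
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE IF NOT EXISTS mp3_links (
        id INTEGER PRIMARY KEY,
        url TEXT NOT NULL UNIQUE,
        source_page TEXT NOT NULL
    )');
    return $db;
}

// Store links; INSERT OR IGNORE (SQLite syntax) plus the UNIQUE constraint skips duplicates.
function store_links(PDO $db, array $urls, string $sourcePage): void
{
    $stmt = $db->prepare('INSERT OR IGNORE INTO mp3_links (url, source_page) VALUES (?, ?)');
    foreach ($urls as $url) {
        $stmt->execute(array($url, $sourcePage));
    }
}

// Substring search on the URL. LIKE wildcards in the user's term are escaped
// so that "%" or "_" are matched literally.
function search_links(PDO $db, string $term): array
{
    $escaped = str_replace(array('\\', '%', '_'), array('\\\\', '\\%', '\\_'), $term);
    $stmt = $db->prepare("SELECT url FROM mp3_links WHERE url LIKE ? ESCAPE '\\' ORDER BY url");
    $stmt->execute(array('%' . $escaped . '%'));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$db = open_index();
store_links($db, array(
    'http://www.example.com/songs/track1.mp3',
    'http://www.example.com/songs/other.mp3',
    'http://www.example.com/songs/track1.mp3', // duplicate, ignored
), 'http://www.example.com/music/page.html');
print_r(search_links($db, 'track'));
```

SQLite's `LIKE` is case-insensitive for ASCII by default. For large indexes, a full-text index (for example SQLite FTS5 or MySQL `FULLTEXT`) would likely scale better than a `LIKE '%…%'` scan.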

Answer 1

You might want to look into regular expressions. After fetching the page, do something like this:

function crawl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // 30-second timeout
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    $result = curl_exec($ch);
    curl_close($ch);

    $links = array();
    if ($result) {
        // Capture the href value of every <a> tag, whether double- or single-quoted
        preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']*)["\'][^>]*>/i', $result, $output, PREG_SET_ORDER);

        foreach ($output as $item) {
            // $item[0] is the whole matched tag; $item[1] is the href value
            $links[] = $item[1];
            // do your magic here
        }
    }
    return $links;
}

This only finds every link on the page, so you will have to adjust the pattern to your use case or write a filter.
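A minimal sketch of such a filter, assuming you collect the captured href values (`$item[1]` in the loop) into an array together with the URL of the page they came from. The function names `is_mp3_link`, `absolutize` and `extract_mp3_links` are hypothetical, and the relative-URL handling is deliberately simplified: it does not collapse `../` segments or handle query-only links.

```php
<?php
// Hypothetical helpers: keep only .mp3 links and make them absolute.

// True when the URL's path ends in ".mp3" (case-insensitive); query strings
// and fragments are ignored because parse_url() excludes them from the path.
function is_mp3_link(string $href): bool
{
    $path = parse_url($href, PHP_URL_PATH);
    if (!is_string($path)) {
        return false;
    }
    return strtolower(pathinfo($path, PATHINFO_EXTENSION)) === 'mp3';
}

// Resolve $href against the page it was found on. Simplified: handles absolute,
// protocol-relative, root-relative and plain relative links only.
function absolutize(string $href, string $pageUrl): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $href)) {
        return $href;                                   // already absolute
    }
    $base = parse_url($pageUrl);
    $scheme = isset($base['scheme']) ? $base['scheme'] : 'http';
    if (substr($href, 0, 2) === '//') {
        return $scheme . ':' . $href;                   // protocol-relative
    }
    $origin = $scheme . '://' . (isset($base['host']) ? $base['host'] : '')
        . (isset($base['port']) ? ':' . $base['port'] : '');
    if (substr($href, 0, 1) === '/') {
        return $origin . $href;                         // root-relative
    }
    // Plain relative link: keep the page path up to and including its last "/".
    $path = isset($base['path']) ? $base['path'] : '/';
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $href;
}

// Filter a list of raw href values down to unique, absolute mp3 URLs.
function extract_mp3_links(array $hrefs, string $pageUrl): array
{
    $links = array();
    foreach ($hrefs as $href) {
        $href = html_entity_decode(trim($href), ENT_QUOTES); // e.g. "&amp;" -> "&"
        if ($href !== '' && is_mp3_link($href)) {
            $links[] = absolutize($href, $pageUrl);
        }
    }
    return array_values(array_unique($links));
}

$hrefs = array('/songs/track1.mp3', 'album/track2.MP3?dl=1', 'index.html', '/songs/track1.mp3');
print_r(extract_mp3_links($hrefs, 'http://www.example.com/music/page.html'));
```

Checking the file extension is cheap but has a limit: links served through a script (a `download.php?id=` style URL, for instance) will not pass it. Catching those would mean requesting each link and inspecting its `Content-Type` header, which costs an extra request per link.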

Mp3 link crawler for dynamic links
