Question

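I'm using a library that parses HTML for specific data. It also provides a convenient fetch function. However, it has one odd line that I don't understand. Here is the code: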

function fetch($url, &$curlInfo=null) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    $response = curl_exec($ch);
    $info = $curlInfo = curl_getinfo($ch);
    curl_close($ch);

    if (strpos(strtolower($info['content_type']), 'html') === false) {
        // The content was not delivered as HTML, do not attempt to parse it.
        return null;
    }

    $html = mb_substr($response, $info['header_size']);
    return parse($html, $url);
}

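The second-to-last line currently ends up chopping off the first n bytes of the actual HTML. Has cURL changed its behavior since this was first written?

What is the correct way to fetch a site's HTML using cURL?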

Answer 1

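CURLOPT_HEADER determines whether the HTTP headers are included in the output of curl_exec.

Since you have set it to 0, they are not included, yet the code still cuts $info['header_size'] characters off the front of the response. With no headers in the output, that removes the beginning of the HTML itself.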
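As a rough sketch of the two consistent options (not the library's actual fix): either leave CURLOPT_HEADER off and return the response untouched, or turn it on and split at header_size. The function names fetchHtml and splitHeadersAndBody are hypothetical, and the call to the library's parse() is left out so the example stands alone. The split uses substr rather than mb_substr, on the assumption that header_size is a byte count.

```php
<?php
// Option 1: CURLOPT_HEADER is off, so curl_exec returns only the body.
// Nothing needs to be stripped from the response.
function fetchHtml(string $url, ?array &$curlInfo = null): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    $response = curl_exec($ch);
    $curlInfo = curl_getinfo($ch);
    curl_close($ch);

    if ($response === false) {
        // The transfer itself failed.
        return null;
    }

    // content_type may be missing, so cast before searching it.
    $contentType = (string) ($curlInfo['content_type'] ?? '');
    if (stripos($contentType, 'html') === false) {
        // The content was not delivered as HTML.
        return null;
    }

    return $response;
}

// Option 2: only when CURLOPT_HEADER was on, split the raw output at
// header_size. substr works on bytes, which matches a byte offset.
function splitHeadersAndBody(string $raw, int $headerSize): array
{
    return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
}
```

With the first option, the original function's last two lines would reduce to passing $response straight to parse(). The second helper is only relevant if the headers are actually wanted in the output.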
