Question

I have been using SwiftSoup in Swift to scrape body text from many websites, but for some sites, such as CNN or The Hill (for example https://www.cnn.com/2019/07/25/us/colorado-missing-girl-remains-found-after-34-years/index.html or https://thehill.com/homenews/media/454838-cnn-announces-climate-town-hall-with-2020-democrats), the text is scraped incorrectly.

So far, I have tried scraping these sites with SwiftSoup.

Here is the SwiftSoup code I am using so far:

import Foundation
import SwiftSoup

// Extracts the text of every <p> element, one paragraph per line.
func htmlToText(html: String) -> String
{
    var text = ""
    do
    {
        let els: Elements = try SwiftSoup.parse(html).select("p")
        // Drop <time> elements nested inside the paragraphs.
        _ = try els.select("time").remove()
        // text() collapses whitespace, so mark the start of each paragraph with
        // a "/n" placeholder and turn it into a real newline afterwards.
        let pTag: Elements = try els.prepend("/n")

        text = try pTag.text()
    }
    catch Exception.Error(_, let message)
    {
        print(message)
    }
    catch
    {
        print("error")
    }

    text = text.replacingOccurrences(of: "/n", with: "\n")
    text = text.replacingOccurrences(of: "Advertisement", with: "")

    return text
}

However, the final result contains only a small part of the article.

How can I use SwiftSoup to scrape the full article text from sites like these?
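One possible cause, offered as a guess and not a confirmed diagnosis, is that these pages keep most of the body text in elements other than `<p>`, or load it with JavaScript after the initial HTML arrives; SwiftSoup only parses the HTML string it is given. The sketch below makes the selector a parameter and joins each element's text directly, so no placeholder string is needed. The function name `paragraphText(fromHTML:selector:)` and the selector in the usage note are hypothetical; the right selector has to be found by inspecting each site's markup.

```swift
import Foundation
import SwiftSoup

/// Returns the text of every element matching `selector`, one per line.
/// `selector` defaults to "p"; pass a site-specific CSS query when the
/// article body lives in other elements (an assumption to verify per site).
func paragraphText(fromHTML html: String, selector: String = "p") throws -> String
{
    let document = try SwiftSoup.parse(html)

    // Remove elements whose text should not appear in the article body.
    _ = try document.select("time, script, style").remove()

    // Read each matching element separately so paragraph boundaries survive
    // the whitespace normalization done by text().
    let lines = try document.select(selector).array()
        .map { try $0.text().trimmingCharacters(in: .whitespacesAndNewlines) }
        .filter { !$0.isEmpty }

    return lines.joined(separator: "\n")
}
```

A call such as `try paragraphText(fromHTML: html, selector: "div.article-body p")` would then target a specific container, where `div.article-body` is only a placeholder class name. If the text is absent from the downloaded HTML altogether, no selector will recover it.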

0 answers: