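I'm trying to scrape a website in Swift using SwiftSoup. However, some links, such as https://apple.news/AQZXxg8mUQfKrEaM9MRBpxw, redirect automatically via JavaScript, so SwiftSoup scrapes the interstitial page that opens first rather than the actual article I want. How can I scrape this link so that I get the actual article instead of the redirecting cover page?
I tried checking the status code, but this site doesn't return 301 or 302; it returns 200. I also tried scraping the JavaScript portion of the link's HTML, but I wasn't sure how to proceed from there.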
Answer 0 (score: 1)
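import Foundation
import SwiftSoup
func redirectUrl() {
    let url = URL(string: "https://apple.news/AQZXxg8mUQfKrEaM9MRBpxw")!
    URLSession.shared.dataTask(with: url) { (data, response, error) in
        // Bail out on a network error or an undecodable body instead of force-unwrapping.
        guard let data = data, let html = String(data: data, encoding: .utf8) else {
            print(error?.localizedDescription ?? "No decodable data received")
            return
        }
        self.parse(html: html)
    }.resume()
}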
func parse(html: String) {
    do {
        let doc = try SwiftSoup.parse(html)
        // The interstitial page links to the real article; take the first anchor.
        guard let link = try doc.select("a").first() else {
            print("No link found in the page")
            return
        }
        let linkHref = try link.attr("href")
        print(linkHref)
    } catch {
        print(error.localizedDescription)
    }
}
This will print:
https://www.npr.org/2019/06/18/733401736/npr-identifies-fourth-attacker-in-civil-rights-era-cold-case
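The same extraction can be sketched as a function that returns the link instead of printing it, which makes it easier to reuse and to check against a known input. The name `firstLinkHref(in:)` and the sample HTML are assumptions made for illustration; the real markup served by apple.news may differ.

```swift
import Foundation
import SwiftSoup

/// Hypothetical helper (not part of SwiftSoup): returns the `href` of the
/// first `<a>` element that carries one, or nil if none is found or the
/// HTML cannot be parsed.
func firstLinkHref(in html: String) -> String? {
    do {
        let doc = try SwiftSoup.parse(html)
        // "a[href]" skips anchors that have no href attribute at all.
        guard let link = try doc.select("a[href]").first() else { return nil }
        let href = try link.attr("href")
        return href.isEmpty ? nil : href
    } catch {
        return nil
    }
}

// Made-up sample standing in for the interstitial page.
let sampleHTML = "<html><body><a href=\"https://example.com/article\">Read</a></body></html>"
```

Returning an optional lets the caller decide what to do when the page contains no link, rather than crashing on a force unwrap.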
For URLs that use a regular HTTP redirect, this will work:
func redirectLink(url: URL, completion: @escaping (URL?) -> Void) {
    var request = URLRequest(url: url, cachePolicy: .reloadIgnoringLocalCacheData, timeoutInterval: 15.0)
    // HEAD fetches only the headers; URLSession follows HTTP redirects by default.
    request.httpMethod = "HEAD"
    URLSession.shared.dataTask(with: request) { (data, response, error) in
        // response.url is the final URL; pass nil on failure so the caller is always notified.
        completion(response?.url)
    }.resume()
}
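One way to tell whether an HTTP-level redirect actually happened is to compare the final URL with the one requested. The sketch below is only an illustration: `wasRedirected(from:to:)` and `checkRedirect(for:completion:)` are hypothetical names introduced here. For a link like the apple.news one in the question, which answers with status 200 and redirects in JavaScript, the final URL would be expected to match the original, and parsing the HTML for the link is then the fallback.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on Linux
#endif

/// Hypothetical helper: true when the final URL differs from the requested
/// one, i.e. URLSession followed at least one HTTP redirect.
func wasRedirected(from original: URL, to resolved: URL?) -> Bool {
    guard let resolved = resolved else { return false }
    return resolved.absoluteString != original.absoluteString
}

/// Sends a HEAD request and reports the final URL plus whether it changed.
func checkRedirect(for url: URL, completion: @escaping (URL?, Bool) -> Void) {
    var request = URLRequest(url: url)
    request.httpMethod = "HEAD"
    URLSession.shared.dataTask(with: request) { _, response, _ in
        let finalURL = response?.url
        completion(finalURL, wasRedirected(from: url, to: finalURL))
    }.resume()
}
```

If the flag comes back false, the page most likely redirects on the client side, so the caller can fetch the body and extract the link with SwiftSoup instead.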