How can I get list elements through web scraping?

Date: 2020-10-01 11:50:37

Tags: swift swiftsoup

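I need to extract list elements by web scraping. I can't reach the elements one by one; I can only get all of them as a single string. How can I get the individual list elements using SwiftSoup or any other option?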

Here is my function:

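 self.webView.evaluateJavaScript("document.getElementsByTagName('html')[0].innerHTML") { (value, error) in
            if let error = error {
                print("Err: \(error)")
            } else if let pageHTML = value as? String {
                
                //print(pageHTML)
                
                // Conditional cast instead of `as!`, so an unexpected result cannot crash the app.
                self.innerDetail = pageHTML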
                
                do {
                    let html = self.innerDetail
                    let doc: Document = try SwiftSoup.parse(html)
                    
                    // Collect the image URLs for the detail pages.
                    let imageLink = try doc.getElementsByClass("img-container")
                    let src: Elements = try imageLink.select("img[src]")
                    let imageUrlStringArray: [String] = src.array().compactMap { try? $0.attr("src") }
                    
                    print(imageUrlStringArray)  // All detail image URLs are stored here.
                    
                    
                    // The car's make, model, year, mileage (km), and fuel type need to be extracted here.
                    // But first the price, of course.
                    
                    let priceMainClass = try doc.getElementsByClass("price")
                    print(try priceMainClass.text())  // This is the price.
                    
                    
                    // A lot of data arrives here, in the form of a list.
                    let detailClass = try doc.getElementsByClass("classified-info-list").first()
                    
                    print(try detailClass?.html())
                    
                    print(try detailClass?.text())
                    
                    
                    
                    let detailFeatures = try detailClass?.text()
                    //print(detailFeatures)
                    //self.detailFeaturesArr = detailFeatures?.components(separatedBy: " ") as! [String]
                    
                    
                    
                } catch {
                    print("Parsing error: \(error)")
                }
                
                
            }
        }

With detailClass?.text() I can get the data, but it comes back as one single string. detailClass?.html() contains a list, and that list is where I want to pull the data from.

The list data from detailClass?.html():

Optional("<li> <strong>Fiyat</strong> <span class=\"price\"> 77.500 TL<input id=\"priceHistoryFlag\" type=\"hidden\" value=\"\" autocomplete=\"off\"> \n  <!-- ngIf: hasPriceHistory --> \n  <!-- ngIf: hasPriceHistory --> </span> </li> \n<li> <strong> İlan Tarihi</strong>&nbsp; <span> 01 Ekim 2020</span> </li> \n<li> <strong>İlan No</strong>&nbsp; <span class=\"classifiedId\" id=\"classifiedId\">865620915</span> </li> \n<li> <strong>Marka</strong>&nbsp; <span>Volvo&nbsp;</span> </li> \n<li> <strong>Seri</strong>&nbsp; <span>S40&nbsp;</span> </li> \n<li> <strong>Model</strong>&nbsp; <span>2.0 T&nbsp;</span> </li> \n<li> <strong>Yıl</strong>&nbsp; <span class=\"\"> 1999</span> </li> \n<li> <strong>Yakıt</strong>&nbsp; <span class=\"\"> Benzin &amp; LPG</span> </li> \n<li> <strong>Vites</strong>&nbsp; <span class=\"\"> Otomatik</span> </li> \n<li> <strong>KM</strong>&nbsp; <span class=\"\"> 178.000</span> </li> \n<li> <strong>Kasa Tipi</strong>&nbsp; <span class=\"\"> Sedan</span> </li> \n<li> <strong>Motor Gücü</strong>&nbsp; <span class=\"\"> 160 hp</span> </li> \n<li> <strong>Motor Hacmi</strong>&nbsp; <span class=\"\"> 1948 cc</span> </li> \n<li> <strong>Çekiş</strong>&nbsp; <span class=\"\"> Önden Çekiş</span> </li> \n<li> <strong>Renk</strong>&nbsp; <span class=\"\"> Gümüş Gri</span> </li> \n<li> <strong>Garanti</strong>&nbsp; <span class=\"\"> Hayır</span> </li> \n<li> <strong>Plaka / Uyruk</strong>&nbsp; <span class=\"\"> Türkiye (TR) Plakalı</span> </li> \n<li> <strong>Kimden</strong>&nbsp; <span class=\"fromOwner\"> Sahibinden</span> </li> \n<li> <strong>Görüntülü Arama İle Görülebilir</strong>&nbsp; <span class=\"\"> Evet</span> </li> \n<li> <strong>Takas</strong>&nbsp; <span> Hayır&nbsp; </span> </li> \n<li> <strong>Durumu</strong>&nbsp; <span> İkinci El&nbsp; </span> </li> \n<li class=\"hiddenAttributes\"> <input type=\"hidden\" autocomplete=\"off\" class=\"classifiedAttr\" id=\"attrClassifiedId\" value=\"865620915\"> <input type=\"hidden\" 
autocomplete=\"off\" class=\"classifiedAttr\" id=\"attrIsShipping\" value=\"false\"> </li>")

Sorry for my English. I hope this is understandable.

1 answer:

Answer 0 (score: 0)

I solved the problem by adding the code below. I found the answer in this Python question: How to get a list of the <li> elements in an <ul> with Selenium using Python?

Here is my code:

                    // A lot of data arrives here, in the form of a list.
                    let detailClass = try doc.getElementsByClass("classified-info-list").first()
                    
                    
                    // Iterate over the <li> children instead of reading one flattened string.
                    if let listItems = try detailClass?.getElementsByTag("li") {
                        for item in listItems {
                            let text = try item.text()
                            print(text)
                        }
                    }
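
Since each <li> in the output above pairs a <strong> label with a <span> value, the same loop can be taken one step further to split them apart. The following is only a sketch under that assumption: the helper name parseDetailList(from:) is hypothetical, and it relies on the markup keeping the shape shown in the question. Items without both a <strong> and a <span> (such as the hiddenAttributes entry) are skipped.

```swift
import Foundation
import SwiftSoup

/// Hypothetical helper (not part of SwiftSoup): returns one (label, value)
/// pair per <li> found inside an element with class "classified-info-list".
func parseDetailList(from html: String) throws -> [(label: String, value: String)] {
    let doc = try SwiftSoup.parse(html)
    let items = try doc.select(".classified-info-list li")

    var pairs: [(label: String, value: String)] = []
    for item in items.array() {
        // Assumption: the label sits in the first <strong>, the value in the first <span>.
        guard let labelElement = try item.select("strong").first(),
              let valueElement = try item.select("span").first() else {
            continue  // e.g. the hidden <li> that only holds <input> elements
        }
        // Trim ordinary and non-breaking spaces left over from "&nbsp;".
        let label = try labelElement.text().trimmingCharacters(in: .whitespacesAndNewlines)
        let value = try valueElement.text().trimmingCharacters(in: .whitespacesAndNewlines)
        pairs.append((label: label, value: value))
    }
    return pairs
}
```

Inside the existing do block this could be called as `let details = try parseDetailList(from: html)`, after which a single field can be looked up with something like `details.first { $0.label == "Marka" }?.value`. An array of pairs is used rather than a dictionary so that the order of the list on the page is preserved.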