Looping over pages with Beautiful Soup

Time: 2017-09-25 09:50:47

Tags: python beautifulsoup

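This is my first time working with Python. I am trying to extract data from several pages of accommodation listings, but I can't figure out how to loop over the data on the following pages using the same URL. Thanks in advance for any help. Here is my code: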

     from bs4 import BeautifulSoup as soup
     from urllib.request import urlopen as uReq
     import re

     my_url = 'https://www.hipflat.co.th/en/search/sale/condo_y/any_r1/any_r2/any_p/any_b/any_a/any_w/any_o/any_i/100.62442610451406,13.77183154691727_c/12_z/list_v'

     # grabbing the page
     uClient = uReq(my_url)
     page_html = uClient.read()
     uClient.close()

     # html parsing
     page_soup = soup(page_html, "html.parser")

     # Grab Data: one <li class="listing"> per condo
     Condo = page_soup.findAll("li", {"class": "listing"})
     for Con in Condo:
         Name = Con.findAll("div", {"class": "listing-project"})[0].text.strip()
         Description = Con.p.text
         Price = Con.findAll("div", {"class": "listing-price"})[0].text.strip()
         Type = Con.findAll("ul", {"class": "listing-detail"})[0].text.replace("\n", "")
         # split on the unit words; the capturing group keeps them in the result
         Type = re.split(r'(bed|bath|m2)', Type)
         Data = {"Name": Name, "Description": Description, "Price": Price, "Bed": Type[0], "Bath": Type[2], "Area(m2)": Type[4]}
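
One way the page loop could be structured, as a minimal sketch only. It assumes the site selects a results page through a `page` query parameter, which is not confirmed anywhere in this question; the real "next page" link should be checked in a browser first. The names `page_url`, `fetch`, `scrape_all_pages`, and `max_pages` are hypothetical helpers introduced for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def page_url(base_url, page):
    """Build the URL for one results page.

    ASSUMPTION: the site paginates with a `page` query parameter.
    Adapt this to whatever the real "next page" link looks like.
    """
    return base_url + "?" + urlencode({"page": page})


def fetch(url):
    """Download one page and return its raw HTML bytes."""
    with urlopen(url) as client:
        return client.read()


def scrape_all_pages(base_url, parse, fetch=fetch, max_pages=50):
    """Collect rows from page 1, 2, ... until a page yields no listings."""
    rows = []
    for page in range(1, max_pages + 1):
        found = parse(fetch(page_url(base_url, page)))
        if not found:  # an empty page is taken to mean we are past the last one
            break
        rows.extend(found)
    return rows
```

Here `parse` would be a function that takes the page HTML, runs the Beautiful Soup code from the question, and returns a list of `Data` dictionaries (appending each one to a list rather than overwriting it). `max_pages` is a safety cap so the loop cannot run forever if the site never returns an empty page.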

Also, if I edit the Data dictionary as follows, I get IndexError: list index out of range:

     Data = {"Name": Name, "Description": Description, "Price": Price, "Bed": Type[0], "Bath": Type[2], "Area(m2)": Type[4], "Floor": Type[6]}
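
A possible explanation for the IndexError, offered as a sketch rather than a confirmed diagnosis: `re.split` with a capturing group returns values and separators alternately, so the length of the list depends on how many of `bed`, `bath`, and `m2` appear in a given listing. A listing that shows only two of them yields five elements, so `Type[6]` does not exist. The pattern also has no `floor` alternative, so index 6 would not hold a floor value in any case. The hypothetical helper `parse_details` below looks values up by unit name instead of by position; the sample strings are invented for illustration and the real listing text may differ.

```python
import re

# Splitting on a capturing group keeps the separators in the result,
# so the list length changes with the number of units in the text.
full = re.split(r'(bed|bath|m2)', "2 bed1 bath35 m2")
short = re.split(r'(bed|bath|m2)', "1 bed35 m2")
print(len(full), len(short))


def parse_details(text):
    """Map each unit word (bed, bath, m2) to the number that precedes it."""
    pairs = re.findall(r'([\d.,]+)\s*(bed|bath|m2)', text)
    return {unit: value for value, unit in pairs}


details = parse_details("1 bed35 m2")
# .get returns None for a missing unit instead of raising an error
row = {"Bed": details.get("bed"), "Bath": details.get("bath"), "Area(m2)": details.get("m2")}
print(row)
```

Looking fields up by name means a listing with a missing detail produces `None` for that field rather than stopping the whole scrape.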

0 answers:

No answers