Problem when looping over a list for web scraping

Time: 2020-05-11 16:42:43

Tags: python function selenium

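The code works, but it stops as soon as I add a site whose domain is not registered, for example sincoban.com.br. The idea is to fill in some placeholder values for unregistered domains. Is there a way to solve this?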

# Script that collects all the information for ".br" domains
from bs4 import BeautifulSoup
from selenium import webdriver

sites = []
site = {}

domains = ['terra.com.br','oi.com.br','unidas.com.br','sincoban.com.br']

#scrape elements
ff = webdriver.Firefox(executable_path="D:/Programas/gecko/geckodriver.exe")

for domain in domains:

    site = {}

    ff.get('https://www.whois.com/whois/'+ domain)
    html = ff.page_source
    soup = BeautifulSoup(html,'html.parser')

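    # Tags of interest. find() returns None when no matching div exists,
    # which appears to be the case for an unregistered domain.
    list_ = soup.find('div', {'class':'df-block'})
    h = soup.find('div', {'class':'df-block'})

    # names web sites
    registro = []
    try:
        for name in list_:
            registro.append(name.text.split()[51])
    except (TypeError, AttributeError, IndexError):
        # list_ is None, or a child has no text or fewer than 52 tokens:
        # keep an empty list as the placeholder value
        registro = []
    site['DomainInformation'] = registro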


    # DNS hosting
    status = []
    try:
        element = h.text.split().index('published')  # position of the search token
        for register in list_:
            status.append(register.text.split()[element])  # token at that position
    except (TypeError, AttributeError, ValueError, IndexError):
        # h or list_ is None, 'published' is absent, or a child is too short:
        # keep an empty list as the placeholder value
        status = []
    site['status'] = status


    #List web sites
    sites.append(site)
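
The idea of filling in placeholder values for unregistered domains could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not a tested fix for whois.com: `parse_block`, `INFO_INDEX` and `STATUS_MARKER` are hypothetical names introduced here, the empty list is an arbitrary choice of placeholder, and the lookup is simplified to work on the whole text of the block rather than on each child element as in the code above.

```python
from typing import Dict, List, Optional

INFO_INDEX = 51              # token position used in the question's code
STATUS_MARKER = "published"  # search token used in the question's code


def parse_block(block_text: Optional[str]) -> Dict[str, List[str]]:
    """Build one result row; anything that cannot be found stays an empty list.

    block_text is assumed to be the text of the 'df-block' div,
    or None when the page has no such div.
    """
    # Start from placeholder values so every domain yields the same keys.
    record: Dict[str, List[str]] = {"DomainInformation": [], "status": []}
    if block_text is None:
        return record  # nothing to parse, keep the placeholders

    tokens = block_text.split()
    if len(tokens) > INFO_INDEX:
        record["DomainInformation"] = [tokens[INFO_INDEX]]
    if STATUS_MARKER in tokens:
        # Mirrors the question's lookup: take the token at the marker's position.
        record["status"] = [tokens[tokens.index(STATUS_MARKER)]]
    return record
```

Inside the loop, the text could be obtained with something like `block.get_text() if block is not None else None`, where `block` is the result of `soup.find(...)`, and the returned dictionary appended to `sites`. Checking for `None` before parsing keeps the loop running for every domain in the list.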


0 answers:

No answers