Python (Beautiful Soup) returns "None" for existing HTML when crawling

Time: 2018-11-04 14:49:31

Tags: python-3.x selenium-webdriver beautifulsoup web-crawler ssl-certificate

I am just trying to get the HTML of the search bar of the https://www.daraz.com.pk website. I wrote some code and tried it on "https://www.amazon.com", "https://www.alibaba.com", "https://www.goto.com.pk", etc., where it works fine, but it does not work on https://www.daraz.com.pk:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import ssl
    import requests

    # disable SSL certificate verification
    ssl._create_default_https_context = ssl._create_unverified_context

    html = urlopen("https://www.daraz.com.pk")
    bsObj = BeautifulSoup(html, features="lxml")
    nameList = bsObj.find("input", {"type": "search"})
    print(nameList)

It returns None, when instead it should return:

    <input type="search" id="q" name="q" placeholder="Search in Daraz" class="search-box__input--O34g" tabindex="1" value="" data-spm-anchor-id="a2a0e.home.search.i0.35e34937eWCmbI">
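To rule out a parsing problem, one thing worth checking is whether that tag is present in the raw server response at all, or whether it is injected later by JavaScript. A minimal diagnostic along these lines (same urlopen call as above):

    from urllib.request import urlopen
    import ssl

    ssl._create_default_https_context = ssl._create_unverified_context

    # is the search input even in the raw server response?
    raw = urlopen("https://www.daraz.com.pk").read().decode("utf-8", errors="replace")
    print(len(raw))                # size of the document the server actually sent
    print('type="search"' in raw)  # False would mean the input is injected client-side by JavaScript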

I have also tried similar code on Amazon, Alibaba, and a few other sites, and it successfully returned their HTML:

    html = urlopen("https://www.amazon.com")
    bsObj = BeautifulSoup(html, features="lxml")
    nameList = bsObj.find("input", {"type": "text"})
    print(nameList)

I also tried it this way:

    bsObj = BeautifulSoup(requests.get("https://www.daraz.com.pk").content,
                          "html.parser")

    nameList = bsObj.find("input", {"type": "search"})
    print(nameList)
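Some sites also serve different markup when the client does not look like a browser, so a variant that sends a browser-like User-Agent header might behave differently. This is only a guess, and the header string below is an arbitrary example:

    import requests
    from bs4 import BeautifulSoup

    # assumption: the default requests User-Agent might be treated differently by the server
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}  # example value
    resp = requests.get("https://www.daraz.com.pk", headers=headers)
    bsObj = BeautifulSoup(resp.content, "html.parser")
    print(bsObj.find("input", {"type": "search"}))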

I also used Selenium, like this:

    from selenium import webdriver
    from bs4 import BeautifulSoup
    import time

    driver = webdriver.Firefox()
    driver.get("https://www.daraz.com.pk")

    time.sleep(2)
    content = driver.page_source.encode('utf-8').strip()
    soup = BeautifulSoup(content, "html.parser")
    time.sleep(2)
    officials = soup.find("input", {"type": "search"})
    print(str(officials))

But it failed.
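If the search box is injected by JavaScript after the initial page load, the fixed time.sleep(2) may simply be too short. An explicit wait should be more reliable; a sketch of what I mean, using Selenium's standard WebDriverWait (the 10-second timeout is an arbitrary choice):

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Firefox()
    driver.get("https://www.daraz.com.pk")

    # block until the search input shows up in the DOM (or raise TimeoutException after 10 s)
    search_box = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, 'input[type="search"]'))
    )
    print(search_box.get_attribute("outerHTML"))
    driver.quit()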

0 Answers:

No answers yet.