I just want to get the HTML of the search bar on https://www.daraz.com.pk. I wrote some code and tried it on https://www.amazon.com, https://www.alibaba.com, https://www.goto.com.pk, and others, where it works fine, but it does not work on https://www.daraz.com.pk.
from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl

# Skip certificate verification so urlopen does not fail on SSL errors
ssl._create_default_https_context = ssl._create_unverified_context

html = urlopen("https://www.daraz.com.pk")
bsObj = BeautifulSoup(html, features="lxml")
nameList = bsObj.find("input", {"type": "search"})
print(nameList)
It returns None instead of the tag I expect:
<input type="search" id="q" name="q" placeholder="Search in Daraz" class="search-box__input--O34g" tabindex="1" value="" data-spm-anchor-id="a2a0e.home.search.i0.35e34937eWCmbI">
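As a sanity check (my own addition, not part of the original attempts), running the same find call against that expected tag pasted in as a static string shows the selector itself matches, so the problem must be that the page source returned by urlopen simply does not contain the tag:

```python
from bs4 import BeautifulSoup

# The tag the search is expected to match, pasted in as static HTML.
expected_html = '''
<input type="search" id="q" name="q" placeholder="Search in Daraz"
       class="search-box__input--O34g" tabindex="1" value=""
       data-spm-anchor-id="a2a0e.home.search.i0.35e34937eWCmbI">
'''

soup = BeautifulSoup(expected_html, "html.parser")
tag = soup.find("input", {"type": "search"})
print(tag is not None)  # the selector does match this markup
print(tag["id"])
```

This suggests the search box is injected by JavaScript after the page loads, so a plain HTTP fetch never sees it.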
I have also tried similar code on Amazon, Alibaba, and some other sites, and it successfully returned their HTML:
html = urlopen("https://www.amazon.com")
bsObj = BeautifulSoup(html, features="lxml")
nameList = bsObj.find("input", {"type": "text"})
print(nameList)
I also tried it this way:
import requests

bsObj = BeautifulSoup(requests.get("https://www.daraz.com.pk").content, "html.parser")
nameList = bsObj.find("input", {"type": "search"})
print(nameList)
And with Selenium, like this:
from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Firefox()
driver.get("https://www.daraz.com.pk")
time.sleep(2)  # give the page a moment to load before grabbing the source
content = driver.page_source.encode('utf-8').strip()
soup = BeautifulSoup(content, "html.parser")
officials = soup.find("input", {"type": "search"})
print(str(officials))
But that failed too.