I want to read the following page:
https://www1.hkexnews.hk/listedco/listconews/index/lci.html?lang=zh
This is my code:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
url = 'https://www1.hkexnews.hk/listedco/listconews/index/lci.html?lang=zh'
res = requests.get(url, headers=headers)
res.encoding = 'utf-8-sig'
soup = BeautifulSoup(res.text, 'lxml')
However, res.text does not contain the page data.
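A minimal way to check what the raw response actually contains, using only the standard library's `html.parser`. `requests` returns the HTML exactly as the server sends it and does not run JavaScript, so anything the page fills in with scripts after loading will not be in `res.text`. The names `IdCollector` and `find_ids` are hypothetical helpers made up for this sketch.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the value of every id attribute found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)
    # Self-closing tags such as <input ... /> also reach handle_starttag,
    # because HTMLParser.handle_startendtag calls it by default.


def find_ids(html: str) -> list[str]:
    """Return all element ids present in the given (unrendered) HTML."""
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids
```

For example, `"Callable Bull/Bear Contracts" in find_ids(res.text)` tells you whether that element is in the static HTML. If the ids of the elements holding the data you want are missing, they are most likely inserted by JavaScript, which `requests` alone cannot execute.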
I also tried:
from requests_html import HTMLSession
session = HTMLSession()
r = session.get(url)
r.html.render()
It fails with: pyppeteer.errors.NetworkError: Protocol error Target.closeTarget: Target closed.
What should I do?
Answer 1 (score: 0)
Your code is correct. Try loading a different page. I ran the script and it works fine.
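import requests
from bs4 import BeautifulSoup  # note: 'BeautifulSoup' must be spelled exactly (the answerer flagged a missing 'l')
url = "https://www1.hkexnews.hk/listedco/listconews/index/lci.html?lang=zh"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
response = requests.get(url, headers=headers)
response.encoding = 'utf-8-sig'
if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'lxml')
    # The id contains spaces and a slash, so they are backslash-escaped in the CSS selector;
    # the raw string keeps the backslashes without Python's invalid-escape warning.
    els = soup.select(r"#Callable\ Bull\/Bear\ Contracts")
    print(els[0])
else:
    print("Request failed with status", response.status_code)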
I got:
<input checked="" class="filterCheckBox strcProdCheckBox" data-value="Callable Bull/Bear Contracts" id="Callable Bull/Bear Contracts" name="Property" tabindex="-1" type="checkbox"/>
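The backslashes in that selector are CSS escapes: the id contains spaces and a slash, which cannot appear unescaped in a `#id` selector. A simplified sketch of turning a raw id into such a selector; the function name `css_escape_id` is hypothetical, and it only handles printable ASCII plus a leading digit, not every edge case in the CSS specification. With BeautifulSoup you can also sidestep escaping entirely with `soup.find(id="Callable Bull/Bear Contracts")`.

```python
def css_escape_id(raw_id: str) -> str:
    """Build a '#id' CSS selector, backslash-escaping characters CSS would misread.

    Simplified: handles printable ASCII and a leading digit only; it does not
    cover every edge case (e.g. a leading '-' followed by a digit).
    """
    out = []
    for i, ch in enumerate(raw_id):
        if not ch.isascii():
            # Non-ASCII characters are allowed in CSS identifiers as-is.
            out.append(ch)
        elif ch.isalnum() or ch in "-_":
            if i == 0 and ch.isdigit():
                # An identifier cannot start with a digit: use a hex escape
                # followed by a space that terminates the escape.
                out.append(f"\\{ord(ch):x} ")
            else:
                out.append(ch)
        else:
            # Spaces, slashes, dots, etc. get a plain backslash escape.
            out.append("\\" + ch)
    return "#" + "".join(out)


print(css_escape_id("Callable Bull/Bear Contracts"))  # prints #Callable\ Bull\/Bear\ Contracts
```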
You can also try the same request with curl:
curl --header "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36" "https://www1.hkexnews.hk/listedco/listconews/index/lci.html?lang=zh"
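The same request can be made with Python's standard library, a quick way to compare the raw server response against what `requests` returns without any third-party packages. A minimal sketch; the helper names `build_request` and `fetch_raw_html` are hypothetical, and the User-Agent string is the one from the question.

```python
import urllib.request

URL = "https://www1.hkexnews.hk/listedco/listconews/index/lci.html?lang=zh"
USER_AGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36")


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Create a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_raw_html(url: str, user_agent: str, timeout: float = 30.0) -> str:
    """Download the page as the server sends it (no JavaScript is executed)."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        # Use the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch_raw_html(URL, USER_AGENT)` and saving the result to a file lets you search it directly; if the data is missing there too, it is not part of the HTML the server sends.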