import requests
from bs4 import BeautifulSoup
URL = "https://kissanime.to"
page = requests.get(URL)
Code = BeautifulSoup(page.content, "html.parser")
print(Code.title)
This is the output:
<title>Please wait 5 seconds...</title>
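This is the only thing I get every time I request this site. Is there a way to get past this and fetch the HTML of the actual website?
I want to get: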
<title>KissAnime - Watch anime online in high quality</title>
Answer 0 (score: 1)
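This particular website is quite dynamic and needs a real browser to load. Let's drive the PhantomJS headless browser through selenium WebDriver, load the page, and wait until the title is no longer equal to "Please wait 5 seconds...":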
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
driver = webdriver.PhantomJS()
driver.get("https://kissanime.to")
# wait until the title is no longer "Please wait 5 seconds..."
wait = WebDriverWait(driver, 10)
wait.until(lambda driver: driver.title != "Please wait 5 seconds...")
print(driver.title)
Prints:
KissAnime - Watch anime online in high quality
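To make the waiting step above less of a black box, here is a minimal sketch of the polling idea behind it: call a condition repeatedly until it returns something truthy or a timeout expires. This is only an illustration, not selenium's actual implementation. The names wait_until, FakeDriver, and PLACEHOLDER are hypothetical, and FakeDriver just replays a list of titles so the example runs without a browser.

```python
import time

PLACEHOLDER = "Please wait 5 seconds..."


class FakeDriver:
    """Hypothetical stand-in for a browser driver; not part of selenium."""

    def __init__(self, titles):
        self._titles = list(titles)

    @property
    def title(self):
        # Hand out the queued titles one by one, then keep repeating the last.
        if len(self._titles) > 1:
            return self._titles.pop(0)
        return self._titles[0]


def wait_until(driver, condition, timeout=10.0, poll_interval=0.5):
    """Call condition(driver) until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


# Simulate a page that shows the interstitial title twice before the real one.
driver = FakeDriver([
    PLACEHOLDER,
    PLACEHOLDER,
    "KissAnime - Watch anime online in high quality",
])
wait_until(driver, lambda d: d.title != PLACEHOLDER, timeout=2.0, poll_interval=0.01)
final_title = driver.title
print(final_title)
```

With a real driver you would pass the same kind of lambda; the timeout decides how long you are willing to wait for the interstitial page to go away before giving up.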