How to parse a website that returns different HTML: <title>Please wait 5 seconds...</title>

Date: 2016-02-04 21:11:15

Tags: python python-2.7 web-scraping request

import requests
from bs4 import BeautifulSoup

URL = "https://kissanime.to"
page = requests.get(URL)

# Parse the response body and print the page's <title> tag
Code = BeautifulSoup(page.content, "html.parser")
print(Code.title)

This is the output:

<title>Please wait 5 seconds...</title>

This is the only thing I get every time I request this site. Is there a way to get past this and fetch the HTML of the actual site?

I want to get:

<title>KissAnime - Watch anime online in high quality</title>

1 answer:

Answer 0 (score: 1)

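This particular site is quite dynamic and needs a real browser to load it. Let's drive the PhantomJS headless browser through selenium WebDriver, load the page, and wait until the title is no longer equal to "Please wait 5 seconds...":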

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.PhantomJS()
driver.get("https://kissanime.to")

# wait (up to 10 seconds) until the title is no longer "Please wait 5 seconds..."
wait = WebDriverWait(driver, 10)
wait.until(lambda driver: driver.title != "Please wait 5 seconds...")

print(driver.title)

Prints:

KissAnime - Watch anime online in high quality
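
The explicit wait used above comes down to polling a condition until it returns something truthy or a timeout runs out. Here is a minimal sketch of that idea using only the standard library. It is not selenium's actual implementation: `wait_until`, `WaitTimeout`, and `FakeDriver` are hypothetical names made up for illustration, and `FakeDriver` only simulates a page whose title changes after a few reads, so no browser or network is needed to try it.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition does not become truthy in time."""


def wait_until(condition, subject, timeout=10.0, poll_interval=0.5):
    """Call condition(subject) repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition(subject)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %s seconds" % timeout)
        time.sleep(poll_interval)


class FakeDriver:
    """Hypothetical stand-in for a browser: the title changes after a few reads."""

    def __init__(self, reads_before_ready=3):
        self._reads_left = reads_before_ready

    @property
    def title(self):
        # Serve the interstitial title until the simulated page is "ready".
        if self._reads_left > 0:
            self._reads_left -= 1
            return "Please wait 5 seconds..."
        return "KissAnime - Watch anime online in high quality"


demo_driver = FakeDriver()
wait_until(lambda d: d.title != "Please wait 5 seconds...", demo_driver,
           timeout=2.0, poll_interval=0.01)
print(demo_driver.title)
```

With a real browser, the same lambda would be passed to `WebDriverWait(driver, 10).until(...)` as in the answer; the sketch only shows why the script blocks until the interstitial page is gone instead of reading the title immediately.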