from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("http://www.bbc.co.uk/iplayer/live/bbcone?area=london")
bsObj = BeautifulSoup(html, "html.parser")
version = bsObj.find(string = re.compile('DOCTYPE html'))
if version in bsObj:
    print("Yes")
else:
    print("No")
I know that the doctype declaration of "http://www.bbc.co.uk/iplayer/live/bbcone?area=london" is HTML5 (<!DOCTYPE html>), but when I run this script the output is "No". What am I doing wrong?
Answer 0 (score: 0)
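The doctype is an instruction to the browser, not an HTML tag, so tag-based searches with find and find_all will not return it.
Besides that, your regular expression cannot match, because the string value that BS stores for the doctype is only html and not DOCTYPE html. As a result, find returns None, and None is never in bsObj.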
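To make the point about the string value concrete, here is a minimal offline sketch. It assumes a small inline page (the hypothetical SAMPLE_HTML) in place of the BBC URL and uses the same "html.parser" backend as the question; other parsers may behave differently.

```python
import re
from bs4 import BeautifulSoup, Doctype

# Hypothetical inline page standing in for the BBC URL (assumption, not from the question).
SAMPLE_HTML = "<!DOCTYPE html><html><head><title>t</title></head><body><p>hi</p></body></html>"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The doctype is kept as a Doctype node among the top-level contents.
doctype_node = next(node for node in soup.contents if isinstance(node, Doctype))

# Its string value omits the "DOCTYPE" keyword.
doctype_text = str(doctype_node)

# The pattern from the question therefore has nothing to match against.
regex_hit = soup.find(string=re.compile("DOCTYPE html"))

print(repr(doctype_text), regex_hit)
```

Here doctype_text holds only the part after the DOCTYPE keyword, which is why a pattern containing "DOCTYPE" comes back empty.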
You can use the link mentioned by user kindall, or do it this way:
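import requests
from bs4 import BeautifulSoup, Doctype
html = requests.get("http://www.bbc.co.uk/iplayer/live/bbcone?area=london")
soup = BeautifulSoup(html.content, "html.parser")
# The Doctype node's string value is just "html", so search for that exact string.
version = soup.find_all(string="html")
# Other text nodes may also equal "html"; keep only the Doctype node.
# Note: next() raises StopIteration if the page has no matching doctype.
DOCTYPE = next(item for item in version if isinstance(item, Doctype))
print(DOCTYPE)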
This prints:
html
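If what you want is the Yes/No answer from the question, the same idea can be wrapped in a small check. This is only a sketch: find_doctype and is_html5 are hypothetical helper names, it assumes the "html.parser" backend, and it treats a doctype whose value is exactly "html" (case-insensitive) as HTML5. It works on a markup string, so you can pass it html.content from the request above.

```python
from bs4 import BeautifulSoup, Doctype


def find_doctype(markup):
    """Return the first top-level Doctype node, or None if the markup has none."""
    soup = BeautifulSoup(markup, "html.parser")
    return next((node for node in soup.contents if isinstance(node, Doctype)), None)


def is_html5(markup):
    """Assumption: HTML5 means the doctype value is exactly 'html', ignoring case."""
    doctype = find_doctype(markup)
    return doctype is not None and doctype.strip().lower() == "html"


# Hypothetical sample documents used only for illustration.
HTML5_PAGE = "<!DOCTYPE html><html><body></body></html>"
HTML4_PAGE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd"><html><body></body></html>'
)

print("Yes" if is_html5(HTML5_PAGE) else "No")
print("Yes" if is_html5(HTML4_PAGE) else "No")
```

Passing a default of None to next() avoids the StopIteration you would otherwise get on a page without a doctype.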