How do I extract the contents of a <span> tag with Beautiful Soup?

Date: 2017-07-25 14:00:27

Tags: python python-3.x web-scraping beautifulsoup python-requests

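I am trying to extract the contents of a span tag from the Google Translate website. The content is the translated result, and the span has id="result_box". When I try to print the content, I get None.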

Please see the image here.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://translate.google.co.in/?rlz=1C1CHZL_enIN729IN729&um=1&ie=UTF-8&hl=en&client=tw-ob#en/fr/good%20morning")

soup = BeautifulSoup(r.content, "lxml")
spanner = soup.find(id = "result_box")

result = spanner.text

1 Answer:

Answer 0: (score: 2)

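Requests does not execute JavaScript, so the span that the page fills in with a script is not in the HTML it downloads. You can use selenium with PhantomJS for headless browsing, like this: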

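from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://translate.google.co.in/?rlz=1C1CHZL_enIN729IN729&um=1&ie=UTF-8&hl=en&client=tw-ob#en/fr/good%20morning"
# Note: webdriver.PhantomJS is available in Selenium releases from 2017;
# Selenium 4 removed it, so a newer setup needs a different headless browser.
browser = webdriver.PhantomJS()
browser.get(url)            # load the page and let its scripts run
html = browser.page_source  # HTML after JavaScript has modified the DOM
browser.quit()              # shut down the headless browser process

soup = BeautifulSoup(html, 'lxml')
spanner = soup.find(id="result_box")
result = spanner.text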

This gives the expected result:

>>> result
'Bonjour'
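
To make the difference concrete, here is a minimal sketch that runs without a browser or a network connection. The two HTML strings are hypothetical stand-ins, not real Google Translate markup: one imitates what requests downloads before any script has run, the other imitates browser.page_source afterwards. The helper name text_by_id is also made up for this illustration, and it uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup


def text_by_id(html, element_id):
    """Return the stripped text of the element with the given id, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    # soup.find returns None when nothing matches, so check before reading the text
    if element is None:
        return None
    return element.get_text(strip=True)


# Hypothetical stand-in for the response body requests receives:
# the span has not been created yet because no script has run.
static_html = '<html><body><div id="app"></div></body></html>'

# Hypothetical stand-in for browser.page_source after the scripts have run.
rendered_html = (
    '<html><body><div id="app">'
    '<span id="result_box">Bonjour</span>'
    '</div></body></html>'
)

print(text_by_id(static_html, "result_box"))
print(text_by_id(rendered_html, "result_box"))
```

The first call prints None and the second prints Bonjour. Checking for None before reading the text also avoids an AttributeError when the element is missing.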