This is my first time using Beautiful Soup, and I am trying to get the value of a specific element on a web page.
For example, in this snippet:
<div class="otg-vendor-name"><a class="otg-vendor-name-link" href="http://www.3brotherskitchen.com" target="_blank">3 Brothers Kitchen</a></div>
I would like to get "3 Brothers Kitchen" from the tag.
So far I have tried something that does not seem to work:
import urllib2
from bs4 import BeautifulSoup

url = "http://someurl"

def get_all_vendors():
    try:
        web_page = urllib2.urlopen(url).read()
        soup = BeautifulSoup(web_page)
        c = []
        c.append(soup.findAll("div", {"class":'otg-vendor-name'}).contents)
        print c
    except urllib2.HTTPError:
        print("HTTPERROR!")
    except urllib2.URLError:
        print("URLERROR!")
    return c
Answer 0 (score: 0)
You can do it with a CSS selector:
soup.select('div.otg-vendor-name > a.otg-vendor-name-link')[0].text
Or with find():
soup.find('div', class_='otg-vendor-name').find('a', class_='otg-vendor-name-link').text
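Both one-liners above return only the first match. Since the function in the question is called get_all_vendors, here is a minimal sketch of collecting every vendor name instead. The sample markup is modeled on the snippet in the question, and the second vendor entry is made up for illustration. It also hints at why the original attempt fails: findAll() returns a list-like ResultSet, which has no .contents attribute, so each matched element has to be handled individually.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the snippet in the question.
# The second vendor is hypothetical and only there to show multiple matches.
SAMPLE_HTML = """
<div class="otg-vendor-name"><a class="otg-vendor-name-link" href="http://www.3brotherskitchen.com" target="_blank">3 Brothers Kitchen</a></div>
<div class="otg-vendor-name"><a class="otg-vendor-name-link" href="http://example.com" target="_blank">Example Vendor</a></div>
"""


def get_vendor_names(html):
    """Return the text of every vendor link found in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # select() returns a list of matching tags, so iterate over it
    # rather than reading an attribute from the list itself.
    links = soup.select("div.otg-vendor-name > a.otg-vendor-name-link")
    return [link.get_text(strip=True) for link in links]


names = get_vendor_names(SAMPLE_HTML)
print(names)
```

Passing "html.parser" explicitly is a choice made here so the result does not depend on which optional parsers happen to be installed.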
Update (using requests and providing a User-Agent header):
from bs4 import BeautifulSoup
import requests

url = 'http://offthegridsf.com/vendors#food'
with requests.Session() as session:
    session.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36'}
    session.get(url)

    response = session.get(url)
    soup = BeautifulSoup(response.content)
    print soup.select('div.otg-vendor-name > a.otg-vendor-name-link')[0].text
    print soup.find('div', class_='otg-vendor-name').find('a', class_='otg-vendor-name-link').text
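For anyone who would rather stay with the standard library, as the question does, the same User-Agent idea can be sketched with Python 3's urllib.request, the successor of urllib2. This is only a sketch under assumptions: the function names, the shortened User-Agent value and the 10-second timeout are illustrative choices and do not come from the answer, and it has not been checked against the live site.

```python
import urllib.error
import urllib.request

from bs4 import BeautifulSoup

URL = "http://offthegridsf.com/vendors#food"  # URL taken from the answer
# Illustrative header value (an assumption); any browser-like string could be used.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def parse_vendors(html):
    """Extract vendor names from already-downloaded HTML."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.select("div.otg-vendor-name > a.otg-vendor-name-link")
    return [link.get_text(strip=True) for link in links]


def get_all_vendors(url=URL):
    """Download the page and return the vendor names, or [] on a network error."""
    request = urllib.request.Request(url, headers=HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            html = response.read()
    except urllib.error.HTTPError as exc:
        # HTTPError is a subclass of URLError, so it must be caught first.
        print("HTTP error:", exc.code)
        return []
    except urllib.error.URLError as exc:
        print("URL error:", exc.reason)
        return []
    return parse_vendors(html)
```

Calling get_all_vendors() performs the download. Keeping parse_vendors() separate makes the parsing testable on a saved HTML string, and returning an empty list from the except branches avoids the unbound variable that the question's version would hit at return c after a failed request.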