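I'm new to BS4 and web scraping, so apologies in advance for such a basic question.
I'm scraping the Beer Advocate website (https://www.beeradvocate.com/beer/?view=recent), but I can't work out how to get the ABV content, mainly because I'm not sure which tag I should use. According to the HTML inspector, the value is marked as #text, but I'm not sure how to handle that.
Does anyone know how to extract this information?
Thanks.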
Answer 0 (score: 0)
To get the alcohol content and the beer name, you can use this example:
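import re
import requests
from bs4 import BeautifulSoup
url = 'https://www.beeradvocate.com/beer/?view=recent'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# The ABV sits in a bare text node (#text in the inspector), so match on the string itself.
r = re.compile(r'([\d.]+)% ABV$')
# string= is the current name of the older text= argument (bs4 4.4.0 and later).
for t in soup.find_all(string=r):
    name = t.find_previous('h6').text  # beer name is in the nearest preceding <h6>
    amount = r.search(t).group(1)      # the captured number, e.g. '6.8'
    print('{:<50} {}%'.format(name, amount))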
Prints:
Granát (BrouCzech Dark) 5%
HopTime Harvest Ale 6%
Direct Current 6.8%
Hopzilla Double IPA 8.7%
Dankful 7.4%
Cancun Commie 11.5%
Welcome Young One 8.2%
Lick the Spoon 12%
Split Open and Melt 8%
Speedway Stout 12%
What Mask? 8.4%
Switch Lanes 7%
Hella Juice Bag 8.2%
Down By The River 4.9%
Road Town 7.5%
Manhattan Social Club 12.5%
Flash Kick 8.2%
Naked Brunch 8.5%
Tiki Breeze 7%
Oberon - Mango 5.8%
Eldest Brother 11%
Bliss 8%
Watou Tripel 7.5%
Respect Your Elders 7.25%
Braxton Labs Smoothie Sour: Tropical 4.8%
Heaven Scent 5.5%
Oktoberfest 6.5%
Phaser 6.5%
Mark It Zero! 12%
Lake George IPA 6.8%
Triangled IPA (⟁) 8%
Broo Doo 7%
Porter 6.5%
Imperial Porter - Rum Barrel Aged w/ Coconut 7.2%
Willow 7.1%
State of the Art - Orange DIPA 8.7%
Fest-Beer 5.9%
Boskeun 10%
Smuttlabs Baja Hoodie 8.4%
Trappist Achel 8° Bruin 8%
Double Dry Hopped Double Mosaic Dream 8.5%
Falcon Smash 7.4%
Hazy Wonder 6%
Mango Wango 7.5%
North Park 5%
The Tomb 10.2%
Cashmere Hammer 6.5%
Chonk Sundae Sour (Peanut Butter and Jelly) 4.3%
The Tearing Of Flesh From Bone 8.2%
Oktoberfest 6.1%
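For anyone wondering why this works: the #text entry in the inspector is a plain text node, which bs4 exposes as a NavigableString, so you search for the string rather than for a tag. Here is a minimal offline sketch of the same lookup. It assumes markup where each beer name sits in an <h6> ahead of the ABV text; the sample HTML and the extract_abv helper are made up for illustration and are not taken from the site.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup, only loosely modelled on the listing page.
SAMPLE_HTML = """
<div>
  <h6><a href="/beer/1">Direct Current</a></h6>
  <span>American IPA | 6.8% ABV</span>
</div>
<div>
  <h6><a href="/beer/2">Lick the Spoon</a></h6>
  <span>Imperial Stout | 12% ABV</span>
</div>
"""

# One or more digits, optionally followed by a decimal part, then "% ABV" at the end.
ABV_RE = re.compile(r'(\d+(?:\.\d+)?)% ABV$')


def extract_abv(html):
    """Return (name, abv) pairs found in the given HTML string."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    # find_all(string=...) yields the matching text nodes, not tags.
    for text_node in soup.find_all(string=ABV_RE):
        heading = text_node.find_previous('h6')  # nearest <h6> before the text
        name = heading.get_text(strip=True)
        abv = float(ABV_RE.search(text_node).group(1))
        results.append((name, abv))
    return results


for name, abv in extract_abv(SAMPLE_HTML):
    print('{:<20} {}%'.format(name, abv))
```

If the real page lays things out differently, only the find_previous('h6') step should need adjusting.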
Answer 1 (score: 0)
Here you can use bs4 to get the page text, then use a regular expression to extract all the matching ABV strings.
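import re
from bs4 import BeautifulSoup
webpage = "YOUR_WEBPAGE_STRING"  # the page's HTML source as a string
soup = BeautifulSoup(webpage, features="html.parser")
txt = soup.text  # all text in the document, with the tags stripped
# Raw string so that \d is not treated as a string escape.
# Note: the "^|" alternative also matches the empty string at the start of the text,
# and \d+ matches whole numbers only, so decimal values such as 6.8% are not captured.
x = re.findall(r"^| \d+% ABV", txt)
print(x)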
For the given link, you will get output like this:
['', ' 5% ABV', ' 6% ABV', ' 12% ABV', ' 8% ABV', ' 12% ABV', ' 7% ABV', ' 7% ABV', ' 11% ABV', ' 8% ABV', ' 12% ABV', ' 8% ABV', ' 7% ABV', ' 10% ABV', ' 8% ABV', ' 6% ABV', ' 5% ABV']
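Note that the list above starts with an empty string and contains whole-number values only. If you also want decimals such as 6.8%, a pattern along these lines might work better. This is only a sketch: find_abv_values and the sample text are made up for illustration, and the real page text may need a different pattern.

```python
import re

# Digits with an optional decimal part, followed by "% ABV".
ABV_PATTERN = re.compile(r"(\d+(?:\.\d+)?)% ABV")


def find_abv_values(text):
    """Return every ABV figure in the text as a float."""
    # With a single capture group, findall returns just the captured numbers.
    return [float(value) for value in ABV_PATTERN.findall(text)]


sample = "American IPA | 6.8% ABV\nImperial Stout | 12% ABV\nNo strength listed"
print(find_abv_values(sample))  # [6.8, 12.0]
```

You would pass soup.text from the snippet above in place of sample.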