Question

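I'm currently writing a price tracker for several websites, but I've run into a problem. I'm trying to scrape the contents of an h1 tag with BeautifulSoup4, but I can't work out how. I tried passing a dictionary of attributes, as in https://stackoverflow.com/a/40716482/14003061, but it returned None. Can anyone help? It would be much appreciated!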

Here is the code:

from termcolor import colored
import requests
from bs4 import BeautifulSoup
import smtplib

def choice_bwfo():
    print(colored("You have selected Buy Whole Foods Online [BWFO]", "blue"))
    url = input(colored("\n[ 2 ] Paste a product link from BWFO.\n", "magenta"))
    url_verify = requests.get(url, headers=headers)
    soup = BeautifulSoup(url_verify.content, 'html5lib')

    item_block = BeautifulSoup.find('h1', {'itemprop' : 'name'})
    print(item_block)

choice_bwfo()

Here is an example URL you can use:

https://www.buywholefoodsonline.co.uk/organic-spanish-bee-pollen-250g.html

Thanks :)

Answer 1

This script prints the contents of the <h1> tag:

import requests
from bs4 import BeautifulSoup


url = 'https://www.buywholefoodsonline.co.uk/organic-spanish-bee-pollen-250g.html'

# create `soup` variable from the URL:
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# print text of first `<h1>` tag:
print(soup.h1.get_text())

Prints:

Organic Spanish Bee Pollen 250g

Or you can do this:

print(soup.find('h1', {'itemprop' : 'name'}).get_text())
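
Note that the code in the question calls find on the BeautifulSoup class itself rather than on the parsed soup object, so it never searches the downloaded page. Below is a minimal offline sketch of the same lookup with a guard for a missing tag. The SAMPLE_HTML string and the extract_title helper are hypothetical names made up for illustration; the markup is a stand-in and is not copied from the real product page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<html><body>
  <h1 itemprop="name">Organic Spanish Bee Pollen 250g</h1>
</body></html>
"""


def extract_title(html):
    """Return the text of the <h1 itemprop="name"> tag, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Search the parsed `soup` instance, not the BeautifulSoup class.
    tag = soup.find("h1", {"itemprop": "name"})
    # find() returns None when nothing matches, so guard before get_text().
    return tag.get_text(strip=True) if tag is not None else None


print(extract_title(SAMPLE_HTML))
print(extract_title("<p>no heading here</p>"))
```

In the price tracker, the same helper could presumably be given url_verify.content in place of the sample string; checking for None first avoids an AttributeError on pages that have no matching tag.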
