This is text that I want to scrape
soup = BeautifulSoup(html_text, 'html.parser')
p_tags = soup.find_all('p')[15:24]
for p_tag in p_tags:
    for b in p_tags.find_all('b'):
        data = b.string
        print(data)
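The code above returns nothing, but it does not raise an error either. What changes are needed?
Answer 0 (score: 2)
To get the desired list, you can use the following example: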
import requests
from bs4 import BeautifulSoup
url = "https://www.the-future-of-commerce.com/2020/03/20/brands-with-the-best-customer-service/"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0"
}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
h2 = soup.find("h2", text="Top 10 brands with the best customer service")
for row in h2.find_next_siblings(
    lambda tag: tag.name == "p"
    and [t.name for t in tag.find_all()] == ["b", "span"]
):
    b = row.b.get_text(strip=True)
    span = row.span.get_text(strip=True)
    print("{:<30} {}".format(b, span))
Prints:
1. Disney Cruise Line: Service Score –– 9.59 out of 10
2. See’s Candies: Service Score –– 9.38 out of 10
3. Justice: Service Score –– 9.24 out of 10
4. Lands’ End: Service Score –– 9.18 out of 10
5. Chick-fil-a: Service Score –– 9.11 out of 10
6. Publix: Service Score –– 9.07 out of 10
7. Vitacost: Service Score –– 9.04 out of 10
8. Avon: Service Score –– 9.02 out of 10
9. Morton’s The Steakhouse: Service Score –– 9.02 out of 10
10. Cracker Barrel: Service Score –– 9.01 out of 10
Or:
for span in soup.select("b + span"):
    if "Service Score" not in span.text:
        continue
    print(
        span.find_previous("b").text, span.text.replace("Service Score –– ", "")
    )
Prints:
1. Disney Cruise Line: 9.59 out of 10
2. See’s Candies: 9.38 out of 10
3. Justice: 9.24 out of 10
4. Lands’ End: 9.18 out of 10
5. Chick-fil-a: 9.11 out of 10
6. Publix: 9.07 out of 10
7. Vitacost: 9.04 out of 10
8. Avon: 9.02 out of 10
9. Morton’s The Steakhouse: 9.02 out of 10
10. Cracker Barrel: 9.01 out of 10
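The `b + span` selector above can also be tried offline. The following is a minimal sketch, not part of the original answer: the inline HTML is a hypothetical stand-in for the real page, and the regular expression and the `scores` dictionary are assumptions added to show one way of turning the matched text into numbers.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure of the target page.
html = (
    "<div>"
    "<p><b>1. Disney Cruise Line:</b><span> Service Score –– 9.59 out of 10</span></p>"
    "<p><b>2. See’s Candies: </b><span>Service Score –– 9.38 out of 10</span></p>"
    "<p><b>Note:</b><span> unrelated text</span></p>"
    "</div>"
)

soup = BeautifulSoup(html, "html.parser")

scores = {}
# "b + span" matches every <span> that directly follows a <b> sibling.
for span in soup.select("b + span"):
    # Skip <span> tags that do not carry a score.
    match = re.search(r"Service Score\D*(\d+\.\d+) out of 10", span.get_text())
    if match is None:
        continue
    # The brand name sits in the <b> immediately before the <span>.
    name = span.find_previous_sibling("b").get_text(strip=True).rstrip(":")
    scores[name] = float(match.group(1))

for name, score in scores.items():
    print(name, score)
```

Storing the values as floats makes it straightforward to sort or filter the brands afterwards, which plain printing does not allow.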
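Answer 1 (score: 0)
The second loop extracts the b tags unnecessarily. You have a list of p tags, each of which contains only one b tag and one span tag. You only need a single loop over all the p tags, and can then use p.find('b') and p.find('span') to extract the b and span tags.
Here is an example with a small payload.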
from bs4 import BeautifulSoup
soup = BeautifulSoup('<div class="post-single-content selectionShareable"> <p><b>1. Disney Cruise Line:</b><span style="font-weight:400;"> Service Score –– 9.59 out of 10</span></p><p><b>2. See’s Candies: </b><span style="font-weight:400;">Service Score –– 9.38 out of 10</span></p><p><b>3. Justice:</b><span style="font-weight:400;"> Service Score –– 9.24 out of 10</span></p></div>', "html.parser")
p_tags = list(soup.find_all('p'))
for p in p_tags:
    b_tags = p.find('b')
    span_tags = p.find('span')
    b_text = b_tags.getText() if b_tags else ""
    span_text = span_tags.getText() if span_tags else ""
    print(b_text + span_text)
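For comparison, here is a sketch of the smallest change to the nested loop from the question. In that code the inner loop calls `find_all` on `p_tags` (the whole result list) instead of on `p_tag` (the current tag). Separately, a slice such as `[15:24]` yields an empty list when the page has fewer paragraphs than expected, so the loop body never runs and nothing is printed; this may be why no error appeared, though that is an assumption about the asker's input. The inline HTML and the `bold_strings` helper are hypothetical and used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup standing in for the real page.
html_text = (
    "<div>"
    "<p><b>1. Disney Cruise Line:</b><span> Service Score –– 9.59 out of 10</span></p>"
    "<p><b>2. See’s Candies: </b><span>Service Score –– 9.38 out of 10</span></p>"
    "<p>No bold text here</p>"
    "</div>"
)


def bold_strings(html):
    """Return the stripped text of every <b> found inside a <p>."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for p_tag in soup.find_all("p"):
        # Search inside the current tag (p_tag), not the list (p_tags).
        for b in p_tag.find_all("b"):
            results.append(b.get_text(strip=True))
    return results


names = bold_strings(html_text)
print(names)

# Slicing past the end of a list silently returns an empty list.
empty_slice = BeautifulSoup(html_text, "html.parser").find_all("p")[15:24]
print(len(empty_slice))
```

Checking `len(p_tags)` before looping is a quick way to tell an empty slice apart from a selector that matches nothing.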
Answer 2 (score: 0)
This prints each matching paragraph:
for p_tag in (p_tags := soup.find_all(lambda tag: tag.name == "p" and "Service Score" in tag.text)):
    print(p_tag.text.replace(" Service Score ––", ""))