我的网页包含类似(下方)
的数据<span class="results_summary"><span class="label">Thesis Note: </span>For the degree of Executive Master in Consulting and Coaching for Change, XXXX, February 2018</span>
<span class="results_summary"><span class="label">Bibliography/Index: </span>Includes bibliographical references</span><span class="results_summary"><span class="label">Abstract: </span>In today’s “attention economy”, self-awareness, ability to regulate one’s emotions, having the
negative capability, improved focus and clarity of mind for better decision making stand out
as crucial traits for effective leadership.
Despite the scientific findings re-affirming the positive impact of the regular practice of
mindfulness meditation on effectiveness, take-up rate of the concept for formal leadership &
talent development programs has been slow. What’s novel in this study is to experiment and
explore possible underlying reasons for that and articulate on the viability of mindfulness rollout
programs in leadership development context.
</span>
问题是所有标签都包含span class =&#34; results_summary&#34;和span class =&#34; label&#34;反复。我需要在&#34; Abstract&#34;下提取大段。我只是尝试了以下但无法超越。
t=soup.findAll('span',{'class':'label'})
输出:
<span class="label">Thesis Note: </span>
<span class="label">Bibliography/Index: </span>
<span class="label">Abstract: </span>
答案 0 :(得分:1)
您可以使用.next_sibling
<强>实施例强>
html = """<span class="results_summary"><span class="label">Thesis Note: </span>For the degree of Executive Master in Consulting and Coaching for Change, XXXX, February 2018</span>
<span class="results_summary"><span class="label">Bibliography/Index: </span>Includes bibliographical references</span><span class="results_summary"><span class="label">Abstract: </span>In today’s “attention economy”, self-awareness, ability to regulate one’s emotions, having the
negative capability, improved focus and clarity of mind for better decision making stand out
as crucial traits for effective leadership.
Despite the scientific findings re-affirming the positive impact of the regular practice of
mindfulness meditation on effectiveness, take-up rate of the concept for formal leadership &
talent development programs has been slow. What’s novel in this study is to experiment and
explore possible underlying reasons for that and articulate on the viability of mindfulness rollout
programs in leadership development context.
</span>"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
for span in soup.findAll('span',{'class':'label'}):
if "Abstract:" in span.text:
print(span.next_sibling )
答案 1 :(得分:1)
您可以使用正则表达式而不是使用美丽的汤。
import re
result = re.findall(r'<span class="label">Abstract: </span>(.[\s\S]*)</span>',html_text)
假设<span class="label">Abstract: </span>
html_text
在A connection was successfully established with the server, but then an error
occurred during the pre-login handshake. (provider: TCP Provider,
error: 0 - An existing connection was forcibly closed by the remote host.)
中是唯一的,如果不是这样,请找一个检索所需数据的唯一模式。
答案 2 :(得分:1)
val = """<span class="results_summary"><span class="label">Thesis Note: </span>For the degree of Executive Master in Consulting and Coaching for Change, XXXX, February 2018</span> \n<span class="results_summary"><span class="label">Bibliography/Index: </span>Includes bibliographical references</span><span class="results_summary"><span class="label">Abstract: </span>In todays attention economy, self-awareness, ability to regulate ones emotions, having the\n negative capability, improved focus and clarity of mind for better decision making stand out\n as crucial traits for effective leadership.\n Despite the scientific findings re-affirming the positive impact of the regular practice of\n mindfulness meditation on effectiveness, take-up rate of the concept for formal leadership &\n talent development programs has been slow. Whats novel in this study is to experiment and\n explore possible underlying reasons for that and articulate on the viability of mindfulness rollout\n programs in leadership development context.\n</span>"""
soup = BeautifulSoup(val, "html5lib").findAll('span', {'class': 'results_summary'})[2]
for span in soup.findAll('span'):
span.unwrap()
print(soup.decode_contents())
将从span中删除额外的标记,然后将内容编码为Python字符串
"""Abstract: In todays attention economy, self-awareness, ability to regulate ones emotions, having the
negative capability, improved focus and clarity of mind for better decision making stand out
as crucial traits for effective leadership.
Despite the scientific findings re-affirming the positive impact of the regular practice of
mindfulness meditation on effectiveness, take-up rate of the concept for formal leadership &
talent development programs has been slow. Whats novel in this study is to experiment and
explore possible underlying reasons for that and articulate on the viability of mindfulness rollout
programs in leadership development context."""