I have the following raw HTML:
<h3>Job Description</h3>
<p>We are recruiting part time or full time cashier, to be based at our restaurant at Fraser Place, Jalan Perak.</p>
<p>Work Day: Monday to Friday<br> Work Hour: 7am-5pm (Full time), 9am-2pm (Part Time) or 10am-2pm (Part Time)</p>
<p>Full time rate at RM1600-RM1800 per month depends on experience, part time rate RM7-RM8/ hour depends on experience.</p>
<hr>
<h3>Working Location </h3>
I am trying to extract all of the text under "Job Description", up to the <hr> tag.
I have tried:
for header in soup.find_all('h3'):
    para = header.find_next_sibling('p')
But this only manages to get the first <p> after "Job Description", and it does not handle the <p> tag that contains a <br> tag.
Answer 0 (score: 0)
You can iterate over the header's siblings until you reach the hr.
Example:
from bs4 import BeautifulSoup

example = """<h3>Job Description</h3>
<p>We are recruiting part time or full time cashier, to be based at our
restaurant at Fraser Place, Jalan Perak.</p>
<p>Work Day: Monday to Friday<br> Work Hour: 7am-5pm (Full time), 9am-2pm
(Part Time) or 10am-2pm (Part Time)</p>
<p>Full time rate at RM1600-RM1800 per month depends on experience, part time
rate RM7-RM8/ hour depends on experience.</p>
<hr>
<h3>Working Location </h3>"""

soup = BeautifulSoup(example, 'html.parser')
for header in soup.find_all('h3'):
    next_node = header
    while True:
        next_node = next_node.next_sibling
        # Stop when there are no more siblings.
        if next_node is None:
            break
        # Bare strings (such as newlines) have no tag name; skip them.
        if next_node.name is not None:
            # Stop at the <hr> that closes the section.
            if next_node.name == "hr":
                break
            # The " " separator keeps the text on either side of <br> apart.
            print(next_node.get_text(" ", strip=True))
Output:
We are recruiting part time or full time cashier, to be based at our
restaurant at Fraser Place, Jalan Perak.
Work Day: Monday to Friday Work Hour: 7am-5pm (Full time), 9am-2pm
(Part Time) or 10am-2pm (Part Time)
Full time rate at RM1600-RM1800 per month depends on experience, part time
rate RM7-RM8/ hour depends on experience.
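The same sibling walk can also be written with the `next_siblings` generator and `itertools.takewhile`, which returns the text as a list instead of printing it. This is only a minimal sketch, assuming `bs4` is installed; the helper name `text_until_hr` is made up for illustration and is not part of BeautifulSoup.

```python
from itertools import takewhile

from bs4 import BeautifulSoup, Tag

html = """<h3>Job Description</h3>
<p>We are recruiting part time or full time cashier, to be based at our restaurant at Fraser Place, Jalan Perak.</p>
<p>Work Day: Monday to Friday<br> Work Hour: 7am-5pm (Full time), 9am-2pm (Part Time) or 10am-2pm (Part Time)</p>
<p>Full time rate at RM1600-RM1800 per month depends on experience, part time rate RM7-RM8/ hour depends on experience.</p>
<hr>
<h3>Working Location </h3>"""


def text_until_hr(header):
    """Hypothetical helper: text of each tag after `header`, up to the first <hr>."""
    # next_siblings yields tags and bare strings; takewhile stops at the <hr>.
    section = takewhile(
        lambda node: getattr(node, "name", None) != "hr",
        header.next_siblings,
    )
    # Keep only real tags, and join text around <br> with a single space.
    return [node.get_text(" ", strip=True) for node in section if isinstance(node, Tag)]


soup = BeautifulSoup(html, "html.parser")
header = soup.find("h3", string="Job Description")
paragraphs = text_until_hr(header)
for text in paragraphs:
    print(text)
```

For comparison, `header.find_next_sibling('p')` returns only the first matching sibling, and `header.find_next_siblings('p')` returns every following <p> sibling without stopping at the <hr>, so an explicit stop condition like the one above is still needed when later sections also contain paragraphs.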