How can I use beautifulsoup to select the first sibling of every div.title when that sibling is not enclosed in a tag?
In the example below, I need to retrieve:
[Text I care about which <b>can</b> have formatting...,
Text I care about.,
Text I care about <span class='someclass'>which can be in a span</span>...]
Example HTML:
<div class="level1">
<div class="title">
Title I do not care about
</div>
<div class="level2">
<div class="title">
Title I do not care about
</div>
Text I care about which <b>can</b> have formatting...
</div>
<div class="level2">
<div class="title">
Title I do not care about
</div>
<div class="level3">
<div class="title">
Title I do not care about
</div>
Text I care about.
</div>
<div class="level3">
<div class="title">
Title I do not care about
</div>
Text I care about <span class='someclass'>which can be in a span</span>...
</div>
</div>
</div>
Note that I need to modify the text at specific positions using regular expressions, so I need the whole text including its formatting tags (span, b, br, etc.).
Answer 0 (score: 0)
You can use the bs4 extract() method to remove the unwanted tags from the items returned by find_all.
For example:
import bs4
soup = bs4.BeautifulSoup(texthere, "html.parser")  # texthere holds the HTML string
divs = soup.find_all("div", {"class": "level3"})  # Finds all level3 divs
for div in divs:
    title = div.find("div", {"class": "title"})  # Finds the title within each div
    title.extract()  # Remove that title from the div
    # div.text drops formatting tags; div.decode_contents() keeps the markup
    print(div.text)  # Here I print div.text, but you can repurpose this for whatever you need
Here is a good related question on SO: Exclude unwanted tag on Beautifulsoup Python
Hope it helps!
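Since the question asks for the text with its formatting tags intact, here is a minimal sketch of the same extract() idea that returns markup instead of plain text. The function name contents_without_titles and the shortened sample string are made up for illustration, and the sketch assumes the wanted text only ever sits in divs that contain no other div once the titles are removed.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample modelled on the question's HTML.
sample = """
<div class="level2">
<div class="title">Title I do not care about</div>
<div class="level3">
<div class="title">Title I do not care about</div>
Text I care about.
</div>
<div class="level3">
<div class="title">Title I do not care about</div>
Text I care about <span class='someclass'>which can be in a span</span>...
</div>
</div>
"""


def contents_without_titles(markup):
    """Remove every div.title, then return the inner markup of each leaf div."""
    soup = BeautifulSoup(markup, "html.parser")
    # find_all returns a list up front, so extracting while looping is safe.
    for title in soup.find_all("div", class_="title"):
        title.extract()
    fragments = []
    for div in soup.find_all("div"):
        # Assumption: only divs without nested divs hold the wanted text.
        if div.find("div") is None:
            fragments.append(div.decode_contents().strip())
    return fragments


fragments = contents_without_titles(sample)
for fragment in fragments:
    print(fragment)
```

Because decode_contents() returns a string of markup, the result can be passed straight to re.sub for the position-specific edits mentioned in the question.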
Answer 1 (score: 0)
from bs4 import BeautifulSoup

strn = """
<div class="level1">
<div class="title">
Title I do not care about
</div>
<div class="level2">
<div class="title">
Title I do not care about
</div>
Text I care about which <b>can</b> have formatting...
</div>
<div class="level2">
<div class="title">
Title I do not care about
</div>
<div class="level3">
<div class="title">
Title I do not care about
</div>
Text I care about.
</div>
<div class="level3">
<div class="title">
Title I do not care about
</div>
Text I care about <span class='someclass'>which can be in a span</span>...
</div>
</div>
</div> """
soup = BeautifulSoup(strn, 'html.parser')
the_divs = soup.find_all('div', class_='title')
for the_div in the_divs:
    # Walk every child of the title's parent and skip the nested divs
    for the_sibling in the_div.parent.contents:
        if the_sibling.name != 'div':
            print(the_sibling.string)
Use the 'the_sibling' variable here to build the string you need; for example, 'str(the_sibling)' returns the text including any formatting tags.
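As a rough sketch of that suggestion, the siblings that follow each title can be joined with str() so the formatting tags survive. The function name text_after_titles is hypothetical, and the sketch assumes the wanted text always comes directly after the title and ends at the next nested div (or at the end of the parent).

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

html = """
<div class="level1">
<div class="title">Title I do not care about</div>
<div class="level2">
<div class="title">Title I do not care about</div>
Text I care about which <b>can</b> have formatting...
</div>
<div class="level2">
<div class="title">Title I do not care about</div>
<div class="level3">
<div class="title">Title I do not care about</div>
Text I care about.
</div>
<div class="level3">
<div class="title">Title I do not care about</div>
Text I care about <span class='someclass'>which can be in a span</span>...
</div>
</div>
</div>
"""


def text_after_titles(markup):
    """Collect the markup that follows each div.title, up to the next nested div."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for title in soup.find_all("div", class_="title"):
        parts = []
        for sibling in title.next_siblings:
            # Assumption: a nested div marks the end of the wanted text.
            if isinstance(sibling, Tag) and sibling.name == "div":
                break
            parts.append(str(sibling))  # str() keeps tags such as <b> and <span>
        fragment = "".join(parts).strip()
        if fragment:  # skip titles followed only by whitespace
            results.append(fragment)
    return results


results = text_after_titles(html)
for fragment in results:
    print(fragment)
```

Each item in results is an ordinary string, so regular expressions can then be applied to it at whatever positions are needed.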