OK, the data I'm trying to get looks like this:
<li class="expandable"> Criminal
<ul class="subPracticeAreas" style="display:none">
<li> Appellate</li>
<li>Crimes against the person</li>
<li> Drugs</li>
<li>Environmental and planning offences</li>
<li> Extradition</li>
<li>Fraud</li>
<li> Juvenile justice</li>
<li>Mental illness</li>
<li> Proceeds of crime / money laundering</li>
<li>Property offences</li>
<li> Sexual assault</li>
<li>Traffic</li>
<li> White collar and corporate crime</li>
<li>Work health and safety</li>
</ul>
</li>
<li class="expandable"> Appellate
<ul class="subPracticeAreas" style="display:none">
<li> Civil appeals</li>
<li>Criminal appeals</li>
</ul>
</li>
<li class="expandable"> Inquests / inquiries
<ul class="subPracticeAreas" style="display:none">
<li> Commissions and other Inquiries</li>
<li>Coronial inquests</li>
</ul>
</li>
So I would like to be able to achieve these goals:
Rinse and repeat the process for each <li class="expandable"> section.
What I've done so far (which, as you can imagine, doesn't work):
aop_list_headers = page_soup.findAll("li", {"class": "expandable"})
for aop_list in aop_list_headers:
    # getText() returns the text of this <li> AND of all its descendants
    aop_key_name = aop_list.getText().strip()
This returns all the text of the corresponding parent li. For example, on the first iteration of the loop above I get the following:
CriminalAppellateCrimes against the personDrugsEnvironmental and planning offencesExtraditionFraudJuvenile justiceMental illnessProceeds of crime/money launderingProperty offencesSexual assaultTrafficWhite collar and corporate crimeWork health and safety
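How do I stop it from picking up every sub-item? (I can see this happens because the parent li wraps the entire sub-list...)
I haven't included how I would achieve the second goal (mentioned above), because I'm stuck on the first one...
All help is much appreciated. Thanks in advance.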
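The behaviour described above can be reproduced on a trimmed fragment of the markup. This is a minimal sketch, assuming the bs4 package with Python's built-in html.parser backend: getText() joins every descendant string, so the nested sub-items end up in the parent's text.

```python
from bs4 import BeautifulSoup

# A trimmed copy of the question's markup (two sub-items only)
html = """
<li class="expandable"> Criminal
<ul class="subPracticeAreas" style="display:none">
<li> Appellate</li>
<li>Crimes against the person</li>
</ul>
</li>
"""

soup = BeautifulSoup(html, "html.parser")
li = soup.find("li", {"class": "expandable"})

# getText() concatenates every descendant string, including the
# strings inside the nested <ul>, not just the <li>'s own label
full_text = li.getText()

# Collapse whitespace so the concatenation is easy to read
print(" ".join(full_text.split()))
```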
Answer 0 (score: 1)
You can pass the recursive flag to find_all so that it only visits direct children (the elements you expect as dict keys):
children = soup.find_all("li", {"class": "expandable"}, recursive=False)
for child in children:
    print(child.getText())
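The same recursive=False flag also works with find(string=True), which returns only a tag's own direct text node instead of every descendant string. A minimal sketch, assuming bs4 4.4 or later (where the string argument exists) and the trimmed sample markup below:

```python
from bs4 import BeautifulSoup

html = """
<li class="expandable"> Criminal
<ul class="subPracticeAreas" style="display:none">
<li> Appellate</li>
<li>Fraud</li>
</ul>
</li>
<li class="expandable"> Appellate
<ul class="subPracticeAreas" style="display:none">
<li> Civil appeals</li>
<li>Criminal appeals</li>
</ul>
</li>
"""

soup = BeautifulSoup(html, "html.parser")

keys = []
for li in soup.find_all("li", {"class": "expandable"}):
    # recursive=False limits the search to the <li>'s direct children,
    # so this returns its leading text node, not the nested <ul> text
    own_text = li.find(string=True, recursive=False)
    keys.append(own_text.strip())

print(keys)
```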
Alternatively, you can collect the text of every li whose parent (the ul) has a parent li with the "expandable" class:
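def get_children(tag):
    # Match <li> tags whose grandparent is an <li class="expandable">
    grandparent = tag.parent.parent if tag.parent is not None else None
    return (tag.name == 'li' and
            grandparent is not None and
            grandparent.name == 'li' and
            'expandable' in grandparent.get('class', []))

for child in soup.find_all(get_children):
    print(child.getText())  # li text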
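To get from the matched sub-items back to a dict, each match can be grouped under its grandparent's own text. A minimal sketch, assuming bs4 with html.parser; the name practice_areas and the {parent: [sub-areas]} shape are assumptions, since the question's list of goals is not shown.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

html = """
<li class="expandable"> Appellate
<ul class="subPracticeAreas" style="display:none">
<li> Civil appeals</li>
<li>Criminal appeals</li>
</ul>
</li>
<li class="expandable"> Inquests / inquiries
<ul class="subPracticeAreas" style="display:none">
<li> Commissions and other Inquiries</li>
<li>Coronial inquests</li>
</ul>
</li>
"""


def get_children(tag):
    # Same filter as in the answer: <li> whose grandparent is <li class="expandable">
    grandparent = tag.parent.parent if tag.parent is not None else None
    return (tag.name == "li"
            and grandparent is not None
            and grandparent.name == "li"
            and "expandable" in grandparent.get("class", []))


soup = BeautifulSoup(html, "html.parser")

# Hypothetical result shape: {parent practice area: [sub-areas]}
practice_areas = defaultdict(list)
for child in soup.find_all(get_children):
    parent_li = child.parent.parent
    # Key on the parent's own leading text node only
    key = parent_li.find(string=True, recursive=False).strip()
    practice_areas[key].append(child.get_text(strip=True))

print(dict(practice_areas))
```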
Answer 1 (score: 1)
I ended up using the extract() function in BeautifulSoup, like this:
for html in html_list:
    # Store the unwanted child element
    unwanted = html.find("ul", {"class": "subPracticeAreas"})
    # Extract (detach) the child <ul> from the tree, if there is one
    if unwanted is not None:
        unwanted.extract()
Thus turning this:
<li class="expandable"> Criminal
<ul class="subPracticeAreas" style="display:none">
<li> Appellate</li>
<li>Crimes against the person</li>
<li> Drugs</li>
<li>Environmental and planning offences</li>
<li> Extradition</li>
<li>Fraud</li>
<li> Juvenile justice</li>
<li>Mental illness</li>
<li> Proceeds of crime / money laundering</li>
<li>Property offences</li>
<li> Sexual assault</li>
<li>Traffic</li>
<li> White collar and corporate crime</li>
<li>Work health and safety</li>
</ul>
</li>
into this:
<li class="expandable"> Criminal </li>
This leaves me with just the parent I need to collect.
To accomplish both tasks mentioned in the original question, I used the following code.
Thanks everyone for your input!
Cheers
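The extract() approach in Answer 1 can be sketched end to end on a trimmed fragment. This is a minimal illustration, not the answerer's own code: it assumes bs4 with html.parser, and the result dict is a hypothetical shape. Because extract() returns the detached element, the sub-items can still be read after the parent has been cleaned up.

```python
from bs4 import BeautifulSoup

html = """
<li class="expandable"> Criminal
<ul class="subPracticeAreas" style="display:none">
<li> Appellate</li>
<li> Drugs</li>
<li>Fraud</li>
</ul>
</li>
"""

soup = BeautifulSoup(html, "html.parser")

result = {}
for li in soup.find_all("li", {"class": "expandable"}):
    sub_list = li.find("ul", {"class": "subPracticeAreas"})
    # extract() removes the <ul> from the tree and returns it,
    # so its <li> items are still available afterwards
    subs = sub_list.extract().find_all("li") if sub_list is not None else []
    # With the <ul> gone, the parent's text is only its own label
    key = li.get_text(strip=True)
    result[key] = [sub.get_text(strip=True) for sub in subs]

print(result)
```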