Consider the following code:
divTag = soup.find_all("div", {"class": "classname"})
print(divTag)
for tag in divTag:
    ulTag = soup.find_all("ul", {"class": "classname"})
    print(ulTag)
    for tag in ulTag:
        liTag = soup.find_all("li", {"class": "classname"})
        print(liTag)
        for tag in liTag:
            diTag = soup.find_all("div", {"class": "classname"})
            print(diTag)
            for tag in diTag:
                aTags = tag.find_next("a")
                value = aTags.string
                print(value)
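Only "divTag" and "ulTag" are printed. I am sure all the class names are correct. There are about 7 'li' tags inside the 'ul' tag, but none of the 'li' tags is printed. Please help. Thanks in advance.
Update: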
<div class="classname">
    <ul auto-load="true" class="classname" data-href="">
        <li class="classname">
            <div class="classname"><a href="">"value"</a> string <a href="">string1</a> <a class="muted"><abbr class="timeago" title=" 1 Jun, 2015, 10:23 am">7 hours ago</abbr></a>
            </div>
        </li>
        <li>
        </li>
    </ul>
</div>
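Basically, I want to extract the "string" value in the 'a' tags.
Answer 0: (score: 0)
You are searching the whole soup every time, which is why it fails. You should search for each tag inside its parent tag instead. Try something like this: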
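# Find the outer elements once, then search only inside each parent.
divTag = soup.find_all("div", {"class": "classname"})
for ulTag in divTag:  # each item here is a matching <div>, despite the name
    for liTag in ulTag.find_all("li", {"class": "classname"}):
        for tag in liTag.find_all("div", {"class": "classname"}):
            for aTag in tag.find_all('a'):
                print(aTag.string)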
For the HTML you provided, the output is:
"value"
string1
7 hours ago
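To make the parent-scoped search easy to try, here is a minimal self-contained sketch. It assumes Python 3 with the bs4 package installed and uses the built-in "html.parser" backend. The helper name link_texts is hypothetical (not part of BeautifulSoup), the markup is the sample from the question, and the loop starts at the 'ul' rather than the outer 'div', which is my simplification of the code above.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question's update.
HTML = """
<div class="classname">
<ul auto-load="true" class="classname" data-href="">
<li class="classname">
<div class="classname"><a href="">"value"</a> string <a href="">string1</a> <a class="muted"><abbr class="timeago" title=" 1 Jun, 2015, 10:23 am">7 hours ago</abbr></a>
</div>
</li>
<li>
</li>
</ul>
</div>
"""


def link_texts(soup):
    """Hypothetical helper: collect the text of each <a>, searching every level inside its parent."""
    texts = []
    for ul in soup.find_all("ul", {"class": "classname"}):
        # Call find_all on the parent tag, not on soup, so the search stays scoped.
        for li in ul.find_all("li", {"class": "classname"}):
            for div in li.find_all("div", {"class": "classname"}):
                for a in div.find_all("a"):
                    texts.append(a.get_text(strip=True))
    return texts


soup = BeautifulSoup(HTML, "html.parser")
print(link_texts(soup))  # ['"value"', 'string1', '7 hours ago']
```

The quotes around "value" appear in the result because they are literal characters in the sample markup.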
Answer 1: (score: 0)
A complete solution using next_sibling:
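ulTag = soup.find("ul", {"class": "classname"})
aTags = ulTag.find_all("a")
for aTag in aTags:
    # The wanted text follows the <a>, so read the node right after it.
    sibling = aTag.next_sibling
    siblingString = str(sibling).strip()
    # Skip siblings that contain only whitespace.
    if len(siblingString) > 0:
        print(siblingString)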
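One caveat with the snippet above: when an 'a' tag is the last child of its parent, next_sibling is None, and str(None) is the text "None", which would then be printed. The sketch below is one possible way to guard against that, assuming Python 3 with bs4 installed. The helper name texts_after_links is hypothetical, and the markup is a condensed version of the question's sample in which the last 'a' has no following node.

```python
from bs4 import BeautifulSoup, NavigableString

# Condensed version of the question's sample; the last <a> has no sibling after it.
HTML = (
    '<ul class="classname"><li class="classname"><div class="classname">'
    '<a href="">"value"</a> string <a href="">string1</a> '
    '<a class="muted">7 hours ago</a>'
    '</div></li></ul>'
)


def texts_after_links(root):
    """Hypothetical helper: return non-empty text nodes that directly follow an <a>."""
    found = []
    for a in root.find_all("a"):
        sibling = a.next_sibling
        # next_sibling can be None (last child) or another Tag; keep only text nodes.
        if isinstance(sibling, NavigableString):
            text = sibling.strip()
            if text:
                found.append(text)
    return found


soup = BeautifulSoup(HTML, "html.parser")
ul = soup.find("ul", {"class": "classname"})
print(texts_after_links(ul))  # ['string']
```

Checking for NavigableString also skips the case where the next sibling is another tag, which str() would otherwise render as raw markup.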