Looping over the cells of a table I want to scrape
<ul>
<li class="cell036 tal arrow"><a href=" y/">ALdCTL</a></li>
<li class="cell009">5,71</li>
<li class="cell009">5,74</li>
<li class="cell009">-3,04</li>
<li class="cell009">5,92</li>
<li class="cell009">5,76</li>
<li class="cell009">5,53</li>
<li class="cell009">907.438</li>
<li class="cell009">5.114.192</li>
</ul>
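My Python code can find the text of the a inside the first li element (class cell036 tal arrow), but it cannot find the text of the li elements with class cell009: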
c = soup.findAll('li', class_='cell036 tal arrow')
for foo in soup.find_all('li', class_=['cell036 tal arrow']):
    bar = foo.find(['a'])
    print(bar.text)
Answer 0 (score: 1)
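To scrape all of the values, just get every li tag instead of restricting the search to elements with the class cell036 tal arrow. That restriction is why you only get the first value.
Try this: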
from bs4 import BeautifulSoup
html_text = """
<ul>
<li class="cell036 tal arrow"><a href=" y/">ALdCTL</a></li>
<li class="cell009">5,71</li>
<li class="cell009">5,74</li>
<li class="cell009">-3,04</li>
<li class="cell009">5,92</li>
<li class="cell009">5,76</li>
<li class="cell009">5,53</li>
<li class="cell009">907.438</li>
<li class="cell009">5.114.192</li>
</ul>
"""
soup = BeautifulSoup(html_text, "lxml")
for foo in soup.find_all('li'):
    print(foo.text)
Output:
ALdCTL
5,71
5,74
-3,04
5,92
5,76
5,53
907.438
5.114.192
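If the page holds several such ul blocks, one per table row, the same idea can be applied row by row. The following is only a sketch under that assumption: the second ul (XYZ) and the names rows_html and rows are made up for illustration, and html.parser is used instead of lxml purely to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample; the second <ul> is invented for illustration.
rows_html = """
<ul>
<li class="cell036 tal arrow"><a href=" y/">ALdCTL</a></li>
<li class="cell009">5,71</li>
<li class="cell009">5,74</li>
</ul>
<ul>
<li class="cell036 tal arrow"><a href=" z/">XYZ</a></li>
<li class="cell009">1,23</li>
<li class="cell009">4,56</li>
</ul>
"""

soup = BeautifulSoup(rows_html, "html.parser")

# Build one list of cell texts per <ul>, keeping document order.
rows = [
    [li.get_text(strip=True) for li in ul.find_all("li")]
    for ul in soup.find_all("ul")
]

for row in rows:
    print(row)
```

Keeping each row as its own list makes it easy to tell which values belong to which name, which a flat loop over every li on the page would lose.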
Answer 1 (score: 1)
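Borrowing the opening structure from drec4s, you could also use a CSS "or" combination (a comma-separated selector list) to target the li elements by class name.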
from bs4 import BeautifulSoup
html_text = """
<ul>
<li class="cell036 tal arrow"><a href=" y/">ALdCTL</a></li>
<li class="cell009">5,71</li>
<li class="cell009">5,74</li>
<li class="cell009">-3,04</li>
<li class="cell009">5,92</li>
<li class="cell009">5,76</li>
<li class="cell009">5,53</li>
<li class="cell009">907.438</li>
<li class="cell009">5.114.192</li>
</ul>
"""
soup = BeautifulSoup(html_text, "lxml")
for foo in soup.select('li.cell036.tal.arrow,li.cell009'):
    print(foo.text)
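A similar "match either class" filter can probably be written without CSS by passing a list to class_ in find_all, which matches an element if any one of its classes is in the list. This is a sketch on a shortened copy of the sample HTML, not part of the original answer; the names via_select and via_find_all are illustrative, and html.parser stands in for lxml.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample markup, for illustration only.
html_text = """
<ul>
<li class="cell036 tal arrow"><a href=" y/">ALdCTL</a></li>
<li class="cell009">5,71</li>
<li class="cell009">907.438</li>
</ul>
"""

soup = BeautifulSoup(html_text, "html.parser")

# CSS selector list: an <li> with all of cell036/tal/arrow, or an <li> with cell009.
via_select = [li.get_text() for li in soup.select("li.cell036.tal.arrow,li.cell009")]

# find_all with a list: an <li> whose classes include cell036 or cell009.
via_find_all = [li.get_text() for li in soup.find_all("li", class_=["cell036", "cell009"])]

print(via_select)
print(via_find_all)
```

The two filters are not strictly identical: the selector requires all three classes on the first element, while the list form only requires cell036. On this sample they select the same elements.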
Answer 2 (score: 0)
The li you are searching for does not contain the other li elements; they are its siblings. Use find_next_siblings:
import bs4

content = """
<ul>
<li class="cell036 tal arrow"><a href=" y/">ALdCTL</a></li>
<li class="cell009">5,71</li>
<li class="cell009">5,74</li>
<li class="cell009">-3,04</li>
<li class="cell009">5,92</li>
<li class="cell009">5,76</li>
<li class="cell009">5,53</li>
<li class="cell009">907.438</li>
<li class="cell009">5.114.192</li>
</ul>
"""
# Name a parser explicitly so bs4 does not have to guess one.
soup = bs4.BeautifulSoup(content, "html.parser")
header = soup.find_all("li", class_="cell036 tal arrow")
header[0].find_next_siblings("li")
This gives:
[<li class="cell009">5,71</li>,
<li class="cell009">5,74</li>,
<li class="cell009">-3,04</li>,
<li class="cell009">5,92</li>,
<li class="cell009">5,76</li>,
<li class="cell009">5,53</li>,
<li class="cell009">907.438</li>,
<li class="cell009">5.114.192</li>]
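Building on the sibling lookup, the name in the first cell could be paired with the values that follow it. This is a minimal sketch on a shortened copy of the sample; the variable names name, values and record are illustrative and not from the original answer.

```python
import bs4

# Shortened copy of the sample markup, for illustration only.
content = """
<ul>
<li class="cell036 tal arrow"><a href=" y/">ALdCTL</a></li>
<li class="cell009">5,71</li>
<li class="cell009">5,74</li>
</ul>
"""

soup = bs4.BeautifulSoup(content, "html.parser")

# The first cell carries the name inside its <a> tag.
header = soup.find("li", class_="cell036 tal arrow")
name = header.a.get_text()

# The value cells are siblings of the header cell, not its children.
values = [li.get_text() for li in header.find_next_siblings("li")]

record = {name: values}
print(record)
```

The values stay as strings here; they use a comma as the decimal separator, so converting them to numbers would need an explicit parsing step.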