I want to use Python and Beautiful Soup to extract 1626 from the markup below. I tried this answer, Accessing untagged text using beautifulsoup, but all I get is an empty list [].
<div class="columns">
<h1 style="line-height: .85em; margin-top: 0" class="panel-border text-primary strong">
Laundry Dry Cleaning Equipment
<br>
<br>
</h1>
1626 Total Items
<!-- br-->
<div>...</div>
</div>
How can I extract the number?
Answer 0 (score: 0)
You can loop over the div's HTML line by line and use a regular expression to find what you need.
import bs4, re
page = """
<div class="columns">
<h1 style="line-height: .85em; margin-top: 0" class="panel-border text-primary strong">
Laundry Dry Cleaning Equipment
<br>
<br>
</h1>
1626 Total Items
5526 Total Items
4426 Total Items
<!-- br-->
<div>...</div>
</div>"""
soup = bs4.BeautifulSoup(page, 'lxml')
divs = soup.find_all('div', {'class': 'columns'})
div = divs[0]  # we only have one div with class "columns"
divtext = str(div).split('\n')  # get the div's HTML code and split it into lines
for line in divtext:
    line = line.strip()
    # match the wanted pattern, e.g. "1626 Total Items"
    match = re.match(r'^(\d+)\s*Total Items$', line)
    if match is not None:  # if a match is found
        print(match.group(1))  # extract the number
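A slightly shorter sketch of the same regex idea, assuming the built-in `html.parser` is used instead of `lxml` and that the count is always written as digits followed by "Total Items": run `re.findall` over the div's text content rather than over its serialized HTML, so the markup never has to be split line by line.

```python
import re
from bs4 import BeautifulSoup

page = """
<div class="columns">
<h1 style="line-height: .85em; margin-top: 0" class="panel-border text-primary strong">
Laundry Dry Cleaning Equipment
<br>
<br>
</h1>
1626 Total Items
<!-- br-->
<div>...</div>
</div>"""

soup = BeautifulSoup(page, "html.parser")
div = soup.find("div", class_="columns")

# get_text() returns only the text content of the div, so the regex
# sees "... 1626 Total Items ..." without any surrounding tags.
counts = re.findall(r"(\d+)\s*Total Items", div.get_text())
print(counts)  # ['1626']
```

`re.findall` returns every match, so if the div held several "N Total Items" lines (as in the answer's test string), all of the numbers would come back as a list of strings.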
Answer 1 (score: 0)
I used the same convention as in the link you included in your question.
I hope this is what you're looking for.
Code:
from bs4 import BeautifulSoup

data = """
<div class="columns">
<h1 style="line-height: .85em; margin-top: 0" class="panel-border text-primary strong">
Laundry Dry Cleaning Equipment
<br>
<br>
</h1>
1626 Total Items
<!-- br-->
<div>...</div>
</div>
"""
soup = BeautifulSoup(data, 'html.parser')
# iterate over every text node in the document
for i in soup.find_all(string=True, recursive=True):
    if "Total Items" in i:
        # drop the label and the surrounding newlines, leaving only the number
        print(str(i).replace(' ', '').replace('TotalItems', '').strip())
Output:
1626
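If only the first count is needed, the text-node search can be handed to Beautiful Soup itself. The sketch below assumes the same markup and the built-in `html.parser`, and shows two alternatives that are not taken from the answers above: `find(string=re.compile(...))`, which returns the first matching text node, and navigating from the `<h1>` to the text node right after it via `next_sibling`. The second option only works as long as the page layout stays exactly as shown.

```python
import re
from bs4 import BeautifulSoup

data = """
<div class="columns">
<h1 style="line-height: .85em; margin-top: 0" class="panel-border text-primary strong">
Laundry Dry Cleaning Equipment
<br>
<br>
</h1>
1626 Total Items
<!-- br-->
<div>...</div>
</div>
"""

soup = BeautifulSoup(data, "html.parser")

# Option 1: passing a compiled regex as `string` makes find() return the
# first text node (a NavigableString) whose content matches it.
node = soup.find(string=re.compile(r"Total Items"))
count = int(re.search(r"\d+", node).group())
print(count)  # 1626

# Option 2: navigate from the heading. This assumes the count is always the
# text node that immediately follows the closing </h1> tag.
heading = soup.find("h1", class_="panel-border")
sibling_text = heading.next_sibling.strip()  # "1626 Total Items"
count_from_sibling = int(sibling_text.split()[0])
print(count_from_sibling)  # 1626
```

Option 1 is more tolerant of layout changes around the heading, while option 2 avoids scanning the whole document for text.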