I'm having some trouble using regular expressions with Beautiful Soup.
My HTML is as follows:
[<strong>See the full calendar</strong>, <strong>See all events</strong>, <strong>See all committee meetings</strong>, <strong>526 spaces</strong>, <strong>89 spaces</strong>, <strong>53 spaces</strong>, <strong>154 spaces</strong>, <strong>194 spaces</strong>, <strong>See all news releases</strong>]
All I want is the number of spaces given inside the strong tags.
I tried using:
print(soup.find_all(re.compile("\d\d\d\s[a-zA-Z]{6}|(strong)")))
However, this returns exactly what print(soup.find_all('strong')) returns.
Does anyone know where I'm going wrong?
Answer 0: (score: 2)
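If I understand correctly, you can use the text argument of soup.find_all and pass it a compiled regular expression pattern:
import re
spaces = []
# text= filters on string content; Beautiful Soup 4.4.0+ also accepts it as string=
for tag in soup.find_all(text=re.compile(r"\d+(?= spaces)")):
    # each match is a string such as "526 spaces"; keep the leading number
    spaces.append(int(tag.string.split()[0]))
print(spaces)
Output, given only the strong tags shown in the question:
[526, 89, 53, 154, 194]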
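As for why the original attempt returned every strong tag: a regular expression passed as the first argument to find_all is tested against tag names, not against the text inside the tags, so the (strong) alternative matches every tag named strong. The sketch below reproduces that check with the re module alone; the list of tag names is made up for illustration and no real document is parsed.

```python
import re

# The pattern from the question, written as a raw string.
pattern = re.compile(r"\d\d\d\s[a-zA-Z]{6}|(strong)")

# Hypothetical tag names standing in for the tags of a parsed page.
tag_names = ["html", "body", "strong", "a"]

# A regex used as a name filter is searched against each tag name,
# so only the "(strong)" alternative can ever succeed here.
matched_names = [name for name in tag_names if pattern.search(name)]
print(matched_names)

# The digit alternative only matters when the regex is applied to text,
# and even then \d\d\d requires exactly three digits in a row.
three_digit_hit = pattern.search("526 spaces") is not None
two_digit_hit = pattern.search("89 spaces") is not None
print(three_digit_hit, two_digit_hit)
```

This is also why the answer above uses \d+ rather than \d\d\d: values such as 89 and 53 have only two digits.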
Answer 1: (score: 1)
First, find all the strong tags:
import re

strong_tags = soup.find_all('strong')
spaces_in_tags = {}
# Then iterate over the tags and use either of the following:
for strong in strong_tags:
    text = strong.get_text()  # a Tag is not a str, so take its text first
    # 1. (EDIT: \s+ makes several spaces between words count as one)
    number_of_spaces = len(re.findall(r'\s+', text))
    # 2. Alternatively, count the gaps between words
    number_of_spaces2 = len(text.split()) - 1
    # Then add the result to a dictionary or list, whichever suits your needs.
    # For example, use the tag's text as the dictionary key:
    spaces_in_tags[text] = number_of_spaces
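If the goal is the number printed in each tag (526, 89, and so on) rather than a count of whitespace characters, the same loop could capture the digits instead. This is a minimal sketch under that assumption: the texts list stands in for strong.get_text() on each tag from the question, and count_pattern and space_counts are illustrative names, not part of the original answer.

```python
import re

# Stand-ins for strong.get_text() on each tag shown in the question.
texts = [
    "See the full calendar",
    "526 spaces",
    "89 spaces",
    "53 spaces",
    "154 spaces",
    "194 spaces",
    "See all news releases",
]

# One or more digits, whitespace, then the literal word "spaces".
count_pattern = re.compile(r"(\d+)\s+spaces")

space_counts = {}
for text in texts:
    # fullmatch skips link labels such as "See all events".
    match = count_pattern.fullmatch(text.strip())
    if match:
        space_counts[text] = int(match.group(1))

print(space_counts)
```

Using fullmatch rather than search keeps out any longer sentence that merely contains a number followed by "spaces"; switch to search if partial matches are wanted.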