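I'm using this code to go through a list of p tags (the real list is much longer than the one in my example), each of which contains one or more span tags. I know that every span tag in the list has a font-style property. I've been trying to work out whether the font-style of the particular span tag I'm looking at has the value italic. Is there a way to get the value of the font-style property, or to return a boolean if the font style is italic?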
from bs4 import BeautifulSoup

content = """<p dir="ltr">
<span style="color: rgb(0, 0, 0); font-style: normal; background-color: transparent; font-weight: 400; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap">a</span>
</p>,
<p dir="ltr">
<span style="color: rgb(0, 0, 0); font-style: italic; background-color: transparent; font-weight: 400; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap">b</span>
<span style="color: rgb(0, 0, 0); font-style: normal; background-color: transparent; font-weight: 400; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap">c</span>
</p>,
<p dir="ltr">
<span style="color: rgb(0, 0, 0); font-style: normal; background-color: transparent; font-weight: 400; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap">d</span>
<span style="color: rgb(0, 0, 0); font-style: italic; background-color: transparent; font-weight: 400; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap">e</span>
</p>"""
soup = BeautifulSoup(content, 'html.parser')
page = {}
ital = []
i = 1  # paragraph counter
p = 1  # span counter within the current paragraph
for par in soup.find_all('p'):
    page[i] = {}
    for x in par.find_all('span'):
        if x['font-style'] == 'italic':  # stuck here trying to figure out if font-style value is italic or not
            ital.append(p)
        key = 'par_{}'.format(p)
        page[i].update({key: x.next})
        p += 1
    page[i].update({'ital': ital})
    ital = []
    i += 1
    p = 1
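Update:
My goal is to collect everything between the span tags, in order, in page, and to know which parts of the content are italic.
After running the code, page should look like this: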
print(page)
{
1: {'ital': [],
'par_1':'a'},
2: {'ital': [1],
'par_1':'b',
'par_2':'c'},
3: {'ital': [2],
'par_1':'d',
'par_2':'e'}
}
Currently this code prints:
print(page)
{
1: {'ital': [],
'par_1':'a'},
2: {'ital': [],
'par_1':'b',
'par_2':'c'},
3: {'ital': [],
'par_1':'d',
'par_2':'e'}
}
Answer 0 (score: 0)
Yes, you can pass a lambda to BeautifulSoup's find_all() function. This example finds every span tag whose style attribute contains 'font-style: italic':
from bs4 import BeautifulSoup

content = '''<p dir="ltr">
<span style="color: rgb(0, 0, 0); font-style: normal; background-color: transparent; font-weight: 400; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap">a</span>
</p>,
<p dir="ltr">
<span style="color: rgb(0, 0, 0); font-style: italic; background-color: transparent; font-weight: 400; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap">d</span>
<span style="color: rgb(0, 0, 0); font-style: normal; background-color: transparent; font-weight: 400; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap">b</span>
</p>,
<p dir="ltr">
<span style="color: rgb(0, 0, 0); font-style: normal; background-color: transparent; font-weight: 400; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap">c</span>
<span style="color: rgb(0, 0, 0); font-style: italic; background-color: transparent; font-weight: 400; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap">d</span>
</p>'''
soup = BeautifulSoup(content, 'lxml')
# s can be None for a span without a style attribute, so guard before the substring test
for span in soup.find_all('span', style=lambda s: s is not None and 'font-style: italic' in s):
    print(span)
Prints:
<span style="color: rgb(0, 0, 0); font-style: italic; background-color: transparent; font-weight: 400; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap">d</span>
<span style="color: rgb(0, 0, 0); font-style: italic; background-color: transparent; font-weight: 400; font-variant: normal; text-decoration: none; vertical-align: baseline; white-space: pre-wrap">d</span>
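A plain substring test on the style string is sensitive to spacing and letter case. As a hedged alternative, find_all() also accepts a compiled regular expression as an attribute filter. The sketch below uses made-up sample markup and a hypothetical pattern name, ITALIC, neither of which comes from the thread:

```python
import re
from bs4 import BeautifulSoup

# Made-up sample markup with different spacing and casing in the style value.
html = (
    '<p>'
    '<span style="font-style:italic">x</span>'
    '<span style="font-style: normal">y</span>'
    '<span style="color: red; FONT-STYLE:  italic;">z</span>'
    '</p>'
)

# Hypothetical pattern: tolerate optional whitespace around the colon
# and ignore letter case.
ITALIC = re.compile(r'font-style\s*:\s*italic\b', re.IGNORECASE)

soup = BeautifulSoup(html, 'html.parser')
# A compiled regex used as an attribute filter is matched with search().
italic_texts = [span.get_text() for span in soup.find_all('span', style=ITALIC)]
print(italic_texts)
```

This only loosens the text match. It does not interpret CSS, so inherited or stylesheet-defined italics are still out of reach.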
Answer 1 (score: 0)
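Everything inside the span tag is a single attribute, and that attribute's name is style. Everything else is just text inside the style string. You can't get the value of a font-style attribute, because font-style is not an attribute.
if 'font-style: italic' in x['style']:
is the way to check whether the font style is italic. x['style'] returns the value of the style attribute as a string. The condition then simply checks whether 'font-style: italic' occurs in the string returned by x['style'].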
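Putting this together, here is a minimal sketch that builds the page dictionary the question asks for. The helpers style_to_dict and is_italic are hypothetical names introduced here, not part of BeautifulSoup, and the style strings are shortened to the one declaration that matters. Splitting the style string into declarations gives both the font-style value and a boolean:

```python
from bs4 import BeautifulSoup

content = """<p dir="ltr">
<span style="color: rgb(0, 0, 0); font-style: normal">a</span>
</p>,
<p dir="ltr">
<span style="color: rgb(0, 0, 0); font-style: italic">b</span>
<span style="color: rgb(0, 0, 0); font-style: normal">c</span>
</p>,
<p dir="ltr">
<span style="color: rgb(0, 0, 0); font-style: normal">d</span>
<span style="color: rgb(0, 0, 0); font-style: italic">e</span>
</p>"""


def style_to_dict(style):
    """Hypothetical helper: split 'name: value; ...' into a dict."""
    declarations = {}
    for declaration in style.split(';'):
        if ':' in declaration:
            name, value = declaration.split(':', 1)
            declarations[name.strip().lower()] = value.strip()
    return declarations


def is_italic(tag):
    """Hypothetical helper: True if the tag's inline font-style is italic."""
    return style_to_dict(tag.get('style', '')).get('font-style') == 'italic'


soup = BeautifulSoup(content, 'html.parser')
page = {}
for i, par in enumerate(soup.find_all('p'), start=1):
    entry = {'ital': []}
    for p, span in enumerate(par.find_all('span'), start=1):
        entry['par_{}'.format(p)] = span.get_text()
        if is_italic(span):
            entry['ital'].append(p)
    page[i] = entry

print(page)
```

Using enumerate() replaces the manual i and p counters, and tag.get('style', '') avoids a KeyError if a span happens to have no style attribute.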