I want to find elements in an HTML document by class, but only when the class name contains a specific word. For example:
<div class="article-xyz"> or <div class="abcd-xyzefg">
If I search for "xyz", I expect some results to be extracted, but none are.
This is the Python code I run against my test HTML file:
from bs4 import BeautifulSoup
with open('simple2.html') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')
article_all = soup.find_all('div', class_='xyz')
So far I am using Python 3.7 with BS4.
Can anyone help me?
Thanks and regards
Answer 0 (score: 1)
Use a lambda, for example:
article_all = soup.find_all('div', class_=lambda x: x and 'xyz' in x)
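To see why this works, here is a small self-contained sketch. The inline markup is a hypothetical stand-in for simple2.html (which was not posted), and it uses the built-in html.parser instead of lxml so it runs without extra dependencies. A plain string passed to class_ is compared against whole class names, so 'xyz' does not match 'article-xyz'. A function or a compiled regular expression lets you test for a substring instead.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for simple2.html
html = """
<div class="article-xyz">first</div>
<div class="abcd-xyzefg">second</div>
<div class="other">third</div>
<div>no class</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string only matches a class name that is exactly 'xyz'
exact = soup.find_all("div", class_="xyz")

# A function is called with each class value; the `c and` guard
# skips tags that have no class attribute at all (c is None there)
by_lambda = soup.find_all("div", class_=lambda c: c and "xyz" in c)

# A compiled regex is applied with search(), so it also matches substrings
by_regex = soup.find_all("div", class_=re.compile("xyz"))

# CSS attribute selector: class attribute contains the substring 'xyz'
by_css = soup.select('div[class*="xyz"]')

for div in by_lambda:
    print(div["class"], div.get_text())
```

All three substring variants should return the same two divs here. Which one to pick is mostly a matter of taste: the regex form is the shortest, while the lambda makes it easy to add further conditions.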