Question

I want to parse elements from an HTML document by their class, but only if the class contains a specific word. For example:

<div class="article-xyz"> or <div class="abcd-xyzefg">

If I search for "xyz", it should extract some results, but it doesn't.

This is my test code (it reads the HTML from simple2.html):

from bs4 import BeautifulSoup

with open('simple2.html') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')

# Finds nothing for the classes above: the string 'xyz' must equal a whole class name
article_all = soup.find_all('div', class_='xyz')
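To see why the call above comes back empty, here is a minimal sketch. It assumes bs4 is installed, replaces the unavailable simple2.html with a hypothetical inline string built from the two class names in the question, and uses the stdlib "html.parser" instead of lxml so it runs without extra dependencies. BeautifulSoup treats class as a multi-valued attribute, and a plain string passed as class_ is compared against each class name as a whole, not as a substring.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for simple2.html, using the class names from the question
html = """
<div class="article-xyz">first</div>
<div class="abcd-xyzefg">second</div>
<div class="other">third</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string must equal a whole class name, so 'xyz' matches nothing here
exact = soup.find_all("div", class_="xyz")

# The full class name does match, because it equals one of the element's classes
full = soup.find_all("div", class_="article-xyz")

print(len(exact), [d.get_text() for d in full])  # 0 ['first']
```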

So far I am using Python 3.7 with BS4.

Can anyone help me?

Thanks and regards

Answer 1

Use a lambda, something like:

article_all = soup.find_all('div', class_=lambda x: x and 'xyz' in x)
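A self-contained check of the lambda approach, under the same assumptions as before (hypothetical inline HTML, "html.parser"). The function is called with class values, and with None for a tag that has no class attribute, which is why the answer guards with `x and ...`. The sketch also shows two alternatives that are not part of the answer: a compiled regular expression, which BeautifulSoup searches against the class values, and the CSS attribute-substring selector `[class*="xyz"]` through `soup.select`, which relies on the soupsieve package that bs4 4.7 and later install as a dependency.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup, including a div with no class attribute at all
html = """
<div class="article-xyz">first</div>
<div class="abcd-xyzefg">second</div>
<div class="other">third</div>
<div>no class</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The answer's approach: substring test, guarded against None for class-less tags
with_lambda = soup.find_all("div", class_=lambda x: x and "xyz" in x)

# Alternative: a regex is searched (not fully matched) against the class values
with_regex = soup.find_all("div", class_=re.compile("xyz"))

# Alternative: CSS substring selector on the class attribute (uses soupsieve)
with_css = soup.select('div[class*="xyz"]')


def texts(tags):
    """Return the text of each matched tag, in document order."""
    return [t.get_text() for t in tags]


print(texts(with_lambda), texts(with_regex), texts(with_css))
# ['first', 'second'] ['first', 'second'] ['first', 'second']
```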