Parsing by a specific part of a class name

Time: 2020-06-04 12:36:04

Tags: python parsing web-scraping beautifulsoup html-parsing

I want to parse elements from an HTML document by class, but only if the class name contains a specific word.

For example, elements like these should be matched:

<div class="article-xyz"> or <div class="abcd-xyzefg"> 

If I search for "xyz", these elements should be returned, but nothing is found.

Here is my test code:

from bs4 import BeautifulSoup

with open('simple2.html') as html_file:
    soup = BeautifulSoup(html_file, 'lxml')

article_all = soup.find_all('div', class_='xyz')
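Why this returns nothing: BeautifulSoup treats `class` as a multi-valued attribute, and a plain string passed to `class_` must equal a whole class name (or the full attribute value); it is not searched as a substring. A minimal self-contained sketch, assuming an inline HTML string in place of `simple2.html` and the built-in `html.parser` so it runs without extra files or `lxml`:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for simple2.html, based on the elements in the question
html = '<div class="article-xyz">A</div><div class="abcd-xyzefg">B</div><div class="other">C</div>'
soup = BeautifulSoup(html, 'html.parser')

# A plain string is compared for equality, so "xyz" matches no class name here
exact = soup.find_all('div', class_='xyz')

# The complete class name does match
full = soup.find_all('div', class_='article-xyz')

print(len(exact))                    # 0
print([d.get_text() for d in full])  # ['A']
```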

I am using Python 3.7 with BS4.

Can anyone help me?

Thanks and regards

1 answer:

Answer 0 (score: 1)

Pass a function, such as a lambda, as the class_ filter:

article_all = soup.find_all('div', class_=lambda x: x and 'xyz' in x)
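A self-contained sketch of the lambda approach, alongside two other standard BeautifulSoup options: a compiled regular expression for `class_`, and a CSS attribute-substring selector through `select()` (which relies on the `soupsieve` package that recent bs4 releases install). The inline HTML is a hypothetical stand-in for `simple2.html`, and `html.parser` is used instead of `lxml` only to avoid the extra dependency:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical test markup: two matching divs, one non-matching class, one without a class
html = '''
<div class="article-xyz">A</div>
<div class="abcd-xyzefg">B</div>
<div class="other">C</div>
<div>D</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# 1. Function filter: called with each class name, or None if the tag has no class,
#    so the `x and` guard prevents a TypeError from `'xyz' in None`
by_lambda = soup.find_all('div', class_=lambda x: x and 'xyz' in x)

# 2. Regex filter: re.search is applied to each class name
by_regex = soup.find_all('div', class_=re.compile('xyz'))

# 3. CSS selector: [class*="xyz"] matches when the raw class attribute contains "xyz"
by_css = soup.select('div[class*="xyz"]')

for found in (by_lambda, by_regex, by_css):
    print([d.get_text() for d in found])  # ['A', 'B'] each time
```

The regex form is a compact alternative when the match rule is a pattern rather than a fixed word; the CSS selector tests the whole attribute string instead of individual class names.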