Remove elements based on text content with BeautifulSoup

Date: 2016-06-21 17:59:31

Tags: python html beautifulsoup html-parsing

I want to remove the table row (tr) elements that contain the word "Amend". How can I change my code to do that?

***Edit:

I tried the following, to no avail:

for e in soup.findAll("tr"):
   e.extract()

***Edit:

This is the page I am working with:

https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=AAON&type=10&dateb=&owner=exclude&count=40

1 Answer:

Answer 0: (score: 1)

How about finding all text nodes that contain "Amend", going up the tree to the enclosing tr, and removing it:

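import re

# Each match is a text node; find_parent walks up to the enclosing row.
for amend in soup.find_all(text=re.compile("Amend")):
    tr = amend.find_parent("tr")
    if tr:  # safety check: the text may not be inside a tr at all
        tr.extract()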

Alternatively, you can pass a searching function to find_all:

# The function is called for each tag; it selects tr rows with a matching text node.
for tr in soup.find_all(lambda node: node and
                                     node.name == "tr" and
                                     node.find(text=re.compile("Amend"))):
    tr.extract()
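
For reference, here is a minimal self-contained sketch of the first approach that can be run without fetching the page. The markup in SAMPLE_HTML and the helper name remove_rows_containing are made up for illustration and are not taken from the EDGAR page. It also assumes a bs4 release recent enough to accept string= (the newer spelling of the text= argument used above).

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real filing table.
SAMPLE_HTML = """
<table>
  <tr><td>10-K</td><td>Annual report</td></tr>
  <tr><td>10-K/A</td><td>[Amend] Annual report</td></tr>
  <tr><td>10-Q</td><td>Quarterly report</td></tr>
</table>
"""


def remove_rows_containing(soup, word):
    """Detach every <tr> that has a text node containing `word`; return the count."""
    removed = 0
    # re.escape makes the word match literally, even if it has regex characters.
    for node in soup.find_all(string=re.compile(re.escape(word))):
        tr = node.find_parent("tr")
        # Skip text outside any row, and rows already detached by an earlier match.
        if tr is not None and tr.parent is not None:
            tr.extract()
            removed += 1
    return removed


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(remove_rows_containing(soup, "Amend"))  # 1
print(len(soup.find_all("tr")))               # 2
```

The tr.parent check only matters when one row holds several matching text nodes; without it the row would be counted more than once, although extracting it a second time is harmless.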