Python & BeautifulSoup - Can I call findAll repeatedly?

Asked: 2015-08-06 21:27:59

Tags: python beautifulsoup findall

The BeautifulSoup documentation says:

Remember the soup.head.title trick from Navigating using tag names? That trick works by repeatedly calling find():
soup.head.title
# <title>The Dormouse's story</title>

soup.find("head").find("title")
# <title>The Dormouse's story</title>

Can I do the same thing with findAll? I can't get it to work...

2 answers:

Answer 0 (score: 2)

No, you can't chain findAll, because it returns a bs4.element.ResultSet, which is essentially a list and has no findAll method of its own. If you try, you get an AttributeError.

A bs4.element.ResultSet has far fewer attributes than a bs4.element.Tag, and most of them are just the regular list methods:

fn =  soup.findAll("title")

fn.append   fn.copy     fn.extend   fn.insert   fn.remove   fn.sort
fn.clear    fn.count    fn.index    fn.pop      fn.reverse  fn.source
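To make the point above concrete, here is a minimal sketch. The HTML snippet and variable names are made up for illustration, and it uses find_all, the bs4 spelling of findAll. It shows the failed chain and one way to get the same effect by looping over the ResultSet and searching inside each Tag:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the question.
html = """
<div class="a"><p>one</p><p>two</p></div>
<div class="b"><p>three</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet, which is a list subclass of Tag objects.
divs = soup.find_all("div")

# Calling find_all on the ResultSet itself fails: lists have no such method.
try:
    divs.find_all("p")
    chained_ok = True
except AttributeError:
    chained_ok = False

# Each element of the ResultSet is a Tag, so search inside each one instead.
paragraphs = [p for div in divs for p in div.find_all("p")]
texts = [p.get_text() for p in paragraphs]
print(chained_ok, texts)
```

The nested comprehension flattens the per-div results into one list, which is usually what a chained findAll was meant to produce.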

By contrast, .find returns a bs4.element.Tag, which has these attributes:

In [25]: f =  soup.find("title")

In [26]: f.
Display all 100 possibilities? (y or n)
f.HTML_FORMATTERS           f.has_attr
f.XML_FORMATTERS            f.has_key
f.append                    f.hidden
f.attribselect_re           f.index
f.attrs                     f.insert
f.can_be_empty_element      f.insert_after
f.childGenerator            f.insert_before
f.children                  f.isSelfClosing
f.clear                     f.is_empty_element
f.contents                  f.name
f.decode                    f.namespace
f.decode_contents           f.next
f.decompose                 f.nextGenerator
f.descendants               f.nextSibling
f.encode                    f.nextSiblingGenerator
f.encode_contents           f.next_element
f.extract                   f.next_elements
f.fetchNextSiblings         f.next_sibling
f.fetchParents              f.next_siblings
f.fetchPrevious             f.parent
f.fetchPreviousSiblings     f.parentGenerator
f.find                      f.parents
f.findAll                   f.parserClass
f.findAllNext               f.parser_class
f.findAllPrevious           f.prefix
f.findChild                 f.prettify
f.findChildren              f.previous
f.findNext                  f.previousGenerator
f.findNextSibling           f.previousSibling
f.findNextSiblings          f.previousSiblingGenerator
f.findParent                f.previous_element
f.findParents               f.previous_elements
f.findPrevious              f.previous_sibling
f.findPreviousSibling       f.previous_siblings
f.findPreviousSiblings      f.recursiveChildGenerator
f.find_all                  f.renderContents
f.find_all_next             f.replaceWith
f.find_all_previous         f.replaceWithChildren
f.find_next                 f.replace_with
f.find_next_sibling         f.replace_with_children
f.find_next_siblings        f.select
f.find_parent               f.select_one
f.find_parents              f.setup
f.find_previous             f.string
f.find_previous_sibling     f.strings
f.find_previous_siblings    f.stripped_strings
f.format_string             f.tag_name_re
f.get                       f.text
f.getText                   f.unwrap
f.get_text                  f.wrap

Answer 1 (score: 0)

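If every search returns exactly one result, or you know the exact index of the one you want, you can chain the calls by indexing into each result: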

listItems = soup.findAll("div", {"class": "table-wrap"})[0] \
    .findAll("table")[0] \
    .findAll("tr")

This works for markup such as:

<div class="table-wrap">
  <table>
    <tr>
...
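As a rough, runnable sketch of the indexed chain, assuming a completed version of the markup (the table cells below are invented for the example), the following compares it with two alternatives: chaining find for the single-match steps, and a CSS selector via select. Note that indexing with [0] raises IndexError when a step matches nothing, and find returns None in that case.

```python
from bs4 import BeautifulSoup

# Hypothetical completion of the markup; the cell contents are made up.
html = """
<div class="table-wrap">
  <table>
    <tr><td>r1</td></tr>
    <tr><td>r2</td></tr>
  </table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Indexed chain: pick one Tag out of each ResultSet before searching again.
rows_indexed = (
    soup.find_all("div", {"class": "table-wrap"})[0]
    .find_all("table")[0]
    .find_all("tr")
)

# Same idea with find for the steps that should match exactly one element.
rows_find = soup.find("div", class_="table-wrap").find("table").find_all("tr")

# A CSS selector expresses the whole descent in one call.
rows_css = soup.select("div.table-wrap table tr")

for rows in (rows_indexed, rows_find, rows_css):
    print([row.get_text(strip=True) for row in rows])
```

The select form also collects rows from every matching div and table, not just the first, so it behaves differently from the indexed chain when there are several matches.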