BeautifulSoup: getting contents[] as a single string

Time: 2010-12-20 10:39:33

Tags: python beautifulsoup

Does anyone know an elegant way to get the entire contents of a soup object as a single string?

At the moment I'm getting contents, which is of course a list, and then iterating over it:

notices = soup.find("div", {"class" : "middlecontent"})
con = ""
for content in notices.contents:
    con += str(content)
print con

Thanks!

4 answers:

Answer 0 (score: 28)

How about contents = str(notices)?

Or contents = notices.renderContents(), which leaves out the enclosing div tag and returns only what is inside it.
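For readers on the newer bs4 package (an assumption, since the question predates it), here is a minimal sketch of the same two options. The HTML snippet is made up for illustration, and decode_contents() is used as the string-returning counterpart of renderContents(), which returns bytes in bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the page in the question.
html = '<div class="middlecontent"><p>Hello <b>world</b></p><p>Bye</p></div>'
soup = BeautifulSoup(html, "html.parser")
notices = soup.find("div", {"class": "middlecontent"})

outer = str(notices)               # the whole element, <div> wrapper included
inner = notices.decode_contents()  # only the markup inside the <div>

print(outer)
print(inner)
```

Which one to use depends on whether the wrapping div itself is wanted in the result.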

Answer 1 (score: 4)

You can use the join() method:

notices = soup.find("div", {"class": "middlecontent"})
contents = "".join([str(item) for item in notices.contents])

Or, using a generator expression:

contents = "".join(str(item) for item in notices.contents)

Answer 2 (score: 1)

#!/usr/bin/env python
# coding: utf-8
__author__ = 'spouk'

import BeautifulSoup
import requests


def parse_contents_href(url, url_args=None, check_content_find=None, tag='a'):
    """
    parse href contents url and find some text in href contents [ for example ]
    """
    html = requests.get(url, params=url_args)
    page = BeautifulSoup.BeautifulSoup(html.text)
    alllinks = page.findAll(tag,  href=True)
    result = check_content_find and filter(
        lambda x: check_content_find in x['href'], alllinks) or alllinks
    return result and "".join(map(str, result)) or False


url = 'https://vk.com/postnauka'
print parse_contents_href(url)

Answer 3 (score: 0)

But this list is recursive, so... I think this will work. I'm new to Python, so the code may look a bit odd.

getString = lambda x: \
    x if type(x).__name__ == 'NavigableString' \
    else "".join( \
    getString(t) for t in x)

contents = getString(notices)
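Note that this recursion keeps only the text nodes and drops the tags, so its result differs from the answers above, which preserve the markup. A rough equivalent for the newer bs4 package (an assumption, as is the sample HTML) might look like this; get_string is a hypothetical helper name, and bs4's built-in get_text() gives the same result on this input.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical markup standing in for the page in the question.
html = '<div class="middlecontent"><p>Hello <b>world</b></p><p>Bye</p></div>'
soup = BeautifulSoup(html, "html.parser")
notices = soup.find("div", {"class": "middlecontent"})


def get_string(node):
    """Concatenate every text node under `node`, discarding the tags."""
    if isinstance(node, NavigableString):
        return str(node)
    # A Tag: recurse into each direct child and glue the pieces together.
    return "".join(get_string(child) for child in node.children)


text = get_string(notices)
print(text)
print(notices.get_text())  # built-in text extraction, for comparison
```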