Question

I need to convert Markdown text to plain text so that I can display a summary on my website. I would like the code in Python.

Answer 1

This module will help you do what you describe:

http://www.freewisdom.org/projects/python-markdown/Using_as_a_Module

Once you have converted the Markdown to HTML, you can use an HTML parser to extract the plain text.

Your code might look something like this:

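from bs4 import BeautifulSoup  # BeautifulSoup 4 (pip install beautifulsoup4)
from markdown import markdown

# some_markdown_string holds the Markdown source to convert
html = markdown(some_markdown_string)
# Join every text node of the parsed HTML into one string
text = ''.join(BeautifulSoup(html, 'html.parser').find_all(string=True))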

Answer 2

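I commented and then deleted it, because I think I finally see the problem here: it may be easier to convert your Markdown text to HTML and then strip the HTML from the text. I am not aware of anything that strips Markdown from text effectively, but there are many HTML-to-plain-text solutions.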
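
As a rough illustration of the HTML-to-plain-text half of this idea, here is a minimal sketch that uses only the standard library's html.parser. The names TextExtractor and html_to_text are made up for this example, and the input is assumed to be HTML already produced by a Markdown converter.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text found between tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags and HTML comments are
        # reported through other handlers, which are no-ops here.
        self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(html_to_text('<p><strong>A</strong> <a href="http://example.com">B</a></p>'))
```

Entity references such as &amp;amp; are decoded by the parser, so the result is plain text rather than escaped HTML.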

Answer 3

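Although this is a very old question, I would like to suggest a solution I came up with recently. It does not use BeautifulSoup, and it avoids the overhead of converting to HTML and back.

The core class of the markdown module, Markdown, has an attribute output_formats, which is not configurable but can be patched like almost anything else in Python. This attribute is a dict that maps output format names to rendering functions. By default it has two output formats, 'html' and 'xhtml'. With a little help, it can gain a plain-text rendering function that is easy to write: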

from markdown import Markdown
from io import StringIO


def unmark_element(element, stream=None):
    if stream is None:
        stream = StringIO()
    if element.text:
        stream.write(element.text)
    for sub in element:
        unmark_element(sub, stream)
    if element.tail:
        stream.write(element.tail)
    return stream.getvalue()


# patching Markdown
Markdown.output_formats["plain"] = unmark_element
__md = Markdown(output_format="plain")
__md.stripTopLevelTags = False


def unmark(text):
    return __md.convert(text)

The unmark function takes Markdown text as input and returns it with all the Markdown characters stripped out.
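
To see why walking text, children, and tail recovers every piece of text, here is the same traversal applied to a small tree built with the standard library's xml.etree.ElementTree. This is only a sketch: the name collect_text and the sample markup are assumptions for illustration, and it does not involve the markdown module at all.

```python
import xml.etree.ElementTree as ET
from io import StringIO


def collect_text(element, stream=None):
    # Same walk as unmark_element: the element's own text first,
    # then each child recursively, then the text following the element.
    if stream is None:
        stream = StringIO()
    if element.text:
        stream.write(element.text)
    for sub in element:
        collect_text(sub, stream)
    if element.tail:
        stream.write(element.tail)
    return stream.getvalue()


root = ET.fromstring("<div><p>Hello <strong>bold</strong> world</p></div>")
print(collect_text(root))  # prints: Hello bold world
```

Here " world" is stored as the tail of the strong element rather than as text of the paragraph, which is why the tail has to be written out as well.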

Answer 4

This is similar to Jason's answer, but it handles comments correctly.

import markdown # pip install markdown
from bs4 import BeautifulSoup # pip install beautifulsoup4

def md_to_text(md):
    html = markdown.markdown(md)
    soup = BeautifulSoup(html, features='html.parser')
    return soup.get_text()

def example():
    md = '**A** [B](http://example.com) <!-- C -->'
    text = md_to_text(md)
    print(text)
    # Output: A B
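
Since the question asks for text to show as a summary, the plain text can then be trimmed to a fixed length. A minimal sketch using textwrap.shorten from the standard library follows; the function name make_summary and the 80-character default are assumptions, not part of the original answers.

```python
import textwrap


def make_summary(text, width=80):
    # shorten() collapses runs of whitespace and cuts at a word boundary,
    # appending the placeholder only when something was removed.
    return textwrap.shorten(text, width=width, placeholder="...")


print(make_summary("A short line stays as it is."))
print(make_summary("word " * 50, width=30))
```

The result never exceeds width characters, placeholder included.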

Python: How to convert Markdown-formatted text to plain text
