Question

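I am trying to parse a string that contains some Markdown-style delimiters, and I need a list back with the styles. I have tried pyparsing and had some success, but I feel there is probably a better method (basically following the post by mbeaches at http://pyparsing.wikispaces.com/).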

Essentially, if I have a string

word_paragraph = "This is **bold** and this is *italic* sample"

I would like a list of tuples returned after I supply the following:

style_delim = {'Bold': '**', 'Italics':'*', } 
word_pg_parsed = somefunction(word_paragraph,style_delim)

which would result in word_pg_parsed being something like:

word_pg_parsed = [('Normal','This is '),('Bold','bold'),('Normal',' and this is '),('Italics','italic'),('Normal',' sample')]

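I have looked into markdown, but cannot find where this feature would live. I suspect there is a library that handles this (I dug into PLY but could not find what I am after).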

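Why? I am trying to create a Word file with python-docx that includes text taken from marked-up text, so I need to handle the inline character styles accordingly. Is there something in python-markdown, or any other library anyone has seen, that does this?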

Answer 1

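In case someone else is looking to do this, here is what I found. Many thanks to Waylan for pointing me to mistune, and to lepture for the library.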

The default_output method was replaced with placeholder. That is the one you need to override to get a list instead of a string. Referenced here: https://github.com/lepture/mistune/pull/20

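I basically followed what is in the test case: https://github.com/lepture/mistune/blob/878f92bdb224a8b7830e8c33952bd2f368e5d711/tests/test_subclassing.py. The __getattribute__ override is needed, otherwise you get errors from string functions being called on a list.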

Look for TokenTreeRenderer in test_subclassing.py.
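To see what the __getattribute__ trick does on its own, here is a minimal sketch of the same catch-all pattern with no mistune dependency. RecordingRenderer is a hypothetical stand-in class, not part of mistune, and the handler names called at the bottom are only illustrative.

```python
class RecordingRenderer(object):
    """Hypothetical stand-in for a renderer; not part of mistune."""

    def placeholder(self):
        # Start every output as a list so results can be concatenated.
        return []

    def __getattribute__(self, name):
        # Names defined directly on this class are looked up normally.
        if name in RecordingRenderer.__dict__:
            return object.__getattribute__(self, name)

        # Any other name becomes a method that records its own call.
        def fake_method(*args, **kwargs):
            return [(name, args, kwargs)]
        return fake_method


renderer = RecordingRenderer()
tokens = renderer.placeholder()
tokens += renderer.text('This is a ')
tokens += renderer.double_emphasis(renderer.text('bold'))
print(tokens)
```

Because every unknown handler returns a list, the pieces concatenate into a token tree instead of being joined as strings.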

Here is my working sample, from a Django views.py:

from django.shortcuts import render
from .forms import ParseForm   # simple form with textarea field called markup
import mistune


class TokenTreeRenderer(mistune.Renderer):
    # options is required
    options = {}

    def placeholder(self):
        return []

    def __getattribute__(self, name):
        """Saves the arguments to each Markdown handling method."""
        found = TokenTreeRenderer.__dict__.get(name)
        if found is not None:
            return object.__getattribute__(self, name)

        def fake_method(*args, **kwargs):
            return [(name, args, kwargs)]
        return fake_method


def parse(request):
    context = {}
    if request.method == 'POST':
        parse_form = ParseForm(request.POST)
        if parse_form.is_valid():
            # parse the data
            markdown = mistune.Markdown(renderer=TokenTreeRenderer())
            tokenized = markdown(parse_form.cleaned_data['markup'])
            context.update({'tokenized': tokenized, })
            # no need for a redirect in this case

    else:
        parse_form = ParseForm(initial={'markup': 'This is a **bold** text sample', })

    context.update({'form': parse_form, })
    return render(request, 'mctests/parse.html', context)

This results in the output:

 [('paragraph', ([('text', (u'This is a ',), {}), ('double_emphasis', ([('text', (u'bold',), {})],), {}), ('text', (u' text sample',), {})],), {})]

Works great for me.
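If you need the flat (style, text) list from the question rather than the nested tree, something along these lines might work. It is only a sketch: flatten and STYLE_NAMES are hypothetical names, the mapping from handler names to style labels is my assumption ('emphasis' for italics in particular is not shown in the output above), and it only handles tokens shaped like the ones printed above.

```python
# Assumed mapping from renderer handler names to the question's style labels.
STYLE_NAMES = {'double_emphasis': 'Bold', 'emphasis': 'Italics'}


def flatten(tokens, style='Normal'):
    """Collapse a nested (name, args, kwargs) token tree into (style, text) pairs."""
    flat = []
    for name, args, _kwargs in tokens:
        if name == 'text':
            # Leaf token: keep the text with whatever style is in effect.
            flat.append((style, args[0]))
        elif args and isinstance(args[0], list):
            # Container token: recurse, switching style if the name is mapped.
            flat.extend(flatten(args[0], STYLE_NAMES.get(name, style)))
    return flat


# Same structure as the output shown above.
tokenized = [('paragraph', ([('text', ('This is a ',), {}),
                             ('double_emphasis', ([('text', ('bold',), {})],), {}),
                             ('text', (' text sample',), {})],), {})]

print(flatten(tokenized))
```

Tokens that are neither text nor containers (for example, anything whose first argument is not a list) are silently skipped in this sketch, so extend it if your markup uses them.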

Parse a string with multiple delimiters, returning a list of tuples with style and text
