Can Pandoc inject arbitrary HTML attributes into any element?

Time: 2013-11-25 18:37:18

Tags: html markdown pandoc

Code blocks can define HTML attributes using the fenced_code_blocks extension:

~~~~ {#mycode .haskell .numberLines startFrom="100"}
qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++
               qsort (filter (>= x) xs)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Is it possible to somehow use the above syntax for regular text blocks? For example, I'd like to convert the following Markdown text:

# My header

~~~ {.text}
This is regular text. This is regular text.
~~~

~~~ {.quote}
> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.
~~~

~~~ {data-id=test-123}
+   Red
+   Green
+   Blue
~~~

into something like this:

<h1 id="my-header">My header</h1>
<p class="text">This is regular text. This is regular text.</p>
<blockquote class="quote">
<p>This is the first level of quoting.</p>
<blockquote>
<p>This is nested blockquote.</p>
</blockquote>
<p>Back to the first level.</p>
</blockquote>
<ul data-id="test-123">
<li>Red</li>
<li>Green</li>
<li>Blue</li>
</ul>

If Pandoc itself has no such support, would it be possible to do this with a custom writer in Lua?

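Edit: Looking at the sample.lua custom writer, does anyone know what the "attributes table" on line 35 is? How are these attributes passed to specific Pandoc elements? Also, the functionality I'm looking for above is very similar to the header_attributes extension, except that it would work for all elements, not just headers.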

2 answers:

Answer 0 (score: 3)

This is very doable in kramdown, which converts the following input:

# My header

This is regular text. This is regular text.
{: .text}

> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.
{: .quote}

+   Red
+   Green
+   Blue
{: data-id="test-123"}

into:

<h1 id="my-header">My header</h1>

<p class="text">This is regular text. This is regular text.</p>

<blockquote class="quote">
  <p>This is the first level of quoting.</p>

  <blockquote>
    <p>This is nested blockquote.</p>
  </blockquote>

  <p>Back to the first level.</p>
</blockquote>

<ul data-id="test-123">
  <li>Red</li>
  <li>Green</li>
  <li>Blue</li>
</ul>

For details, see the attribute list definition section of the syntax documentation.

Answer 1 (score: 1)

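Pandoc's filters let you operate on Pandoc's internal representation of the document. A chain of filters can be used to apply different transformations. I'll share two filter examples that should help.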

Markdown code blocks

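Code blocks in Pandoc are normally meant for embedding source code listings from programming languages, but here we are trying to extract the body and interpret it as Markdown. Rather than using the classes from your input document, such as text and quote, let's use a generic as-markdown class. Pandoc will generate the appropriate tags automatically.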

# My header

~~~ {.as-markdown}
This is regular text. This is regular text.
~~~

~~~ {.as-markdown}
> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.
~~~

~~~ {.as-markdown data-id=test-123}
+   Red
+   Green
+   Blue
~~~

~~~ haskell
main :: IO ()
~~~

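To make sure that code blocks without the as-markdown class are still interpreted as usual, I included a haskell code block. Here is the filter implementation: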

#!/usr/bin/env runhaskell
import Text.Pandoc.Definition       (Pandoc(..), Block(..), Format(..))
import Text.Pandoc.Error            (handleError)
import Text.Pandoc.JSON             (toJSONFilter)
import Text.Pandoc.Options          (def)
import Text.Pandoc.Readers.Markdown (readMarkdown)

asMarkdown :: String -> [Block]
asMarkdown contents =
  case handleError $ readMarkdown def contents of
    Pandoc _ blocks -> blocks

-- | Unwrap each CodeBlock with the "as-markdown" class, interpreting
-- its contents as Markdown.
markdownCodeBlock :: Maybe Format -> Block -> IO [Block]
markdownCodeBlock _ cb@(CodeBlock (_id, classes, _namevals) contents) =
  if "as-markdown" `elem` classes then
    return $ asMarkdown contents
  else
    return [cb]
markdownCodeBlock _ x = return [x]

main :: IO ()
main = toJSONFilter markdownCodeBlock

Running pandoc --filter markdown-code-block.hs index.md produces:

<h1 id="my-header">My header</h1>
<p>This is regular text. This is regular text.</p>
<blockquote>
<p>This is the first level of quoting.</p>
<blockquote>
<p>This is nested blockquote.</p>
</blockquote>
<p>Back to the first level.</p>
</blockquote>
<ul>
<li>Red</li>
<li>Green</li>
<li>Blue</li>
</ul>
<div class="sourceCode"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span class="ot">main ::</span> <span class="dt">IO</span> ()</code></pre></div>

Almost there! The only part that is not quite right is the HTML attributes.
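One way to keep the attributes, sketched here under assumptions rather than taken from the filter above, is to wrap the parsed blocks in a Div, since Para, BlockQuote and BulletList carry no attribute field of their own while Div does. The helper name keepAttrs is hypothetical; Block, Attr and Div are assumed to come from pandoc-types, and OverloadedStrings is enabled so the literals fit whichever string type the installed version uses.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.Pandoc.Definition (Attr, Block (..))

-- | Hypothetical helper: wrap already-parsed blocks in a Div that carries
-- the code block's identifier, classes and key-value pairs, minus the
-- "as-markdown" marker class. If nothing is left to carry, the blocks
-- are returned unwrapped.
keepAttrs :: Attr -> [Block] -> [Block]
keepAttrs (ident, classes, namevals) blocks
  | attr == ("", [], []) = blocks
  | otherwise            = [Div attr blocks]
  where
    attr = (ident, filter (/= "as-markdown") classes, namevals)
```

In markdownCodeBlock this would be called as keepAttrs (ident, classes, namevals) (asMarkdown contents) after binding the three Attr fields. The attributes then land on a wrapping div element, not on the p, blockquote or ul itself, so this is only an approximation of the output asked for in the question.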

Custom HTML attributes from code block metadata

The following filter should help you get started. It converts code blocks with the web-script class to an HTML <script> tag when the target format is html or html5.

#!/usr/bin/env runhaskell
import Text.Pandoc.Builder
import Text.Pandoc.JSON

webFormats :: [String]
webFormats =
  [ "html"
  , "html5"
  ]

script :: String -> Block
script src = Para $ toList $ rawInline "html" ("<script type='application/javascript'>" <> src <> "</script>")

injectScript :: Maybe Format -> Block -> IO Block
injectScript (Just (Format format)) cb@(CodeBlock (_id, classes, _namevals) contents) =
  if "web-script" `elem` classes then
    if format `elem` webFormats then
      return $ script contents
    else
      return Null
  else
    return cb
injectScript _ x = return x

main :: IO ()
main = toJSONFilter injectScript

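The data-id=test-123 in the last as-markdown block shows up in the key-value pairs of _namevals, which has type [(String, String)]. All you need to do is refactor script to support arbitrary tags and key-value pairs for HTML attributes, and specify what HTML to generate based on those inputs. To see the native representation of the input document, run pandoc -t native index.md: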

[Header 1 ("my-header",[],[]) [Str "My",Space,Str "header"]
,CodeBlock ("",["as-markdown"],[]) "This is regular text. This is regular text."
,CodeBlock ("",["as-markdown"],[]) "> This is the first level of quoting.\n>\n> > This is nested blockquote.\n>\n> Back to the first level."
,CodeBlock ("",["as-markdown"],[("data-id","test-123")]) "+   Red\n+   Green\n+   Blue"
,Para [Str "To",Space,Str "ensure",Space,Str "regular",Space,Str "code",Space,Str "blocks",Space,Str "work",Space,Str "as",Space,Str "usual."]
,CodeBlock ("",["haskell"],[]) "main :: IO ()"]
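As a rough illustration of the refactoring suggested above, the following sketch renders an attribute triple of the same shape as the one in the native output into an HTML opening tag. It is a minimal sketch that depends only on base: the Attr synonym is declared locally to mirror Pandoc's triple, and the names escapeAttr, renderAttr and element are hypothetical, not part of Pandoc.

```haskell
-- | Same shape as Pandoc's attribute triple: (identifier, classes,
-- key-value pairs). Declared locally so the sketch needs only base.
type Attr = (String, [String], [(String, String)])

-- | Escape characters that are unsafe inside a double-quoted attribute value.
escapeAttr :: String -> String
escapeAttr = concatMap esc
  where
    esc '&' = "&amp;"
    esc '"' = "&quot;"
    esc '<' = "&lt;"
    esc '>' = "&gt;"
    esc c   = [c]

-- | Render the attributes as the text that follows a tag name.
-- An empty identifier or an empty class list is omitted.
renderAttr :: Attr -> String
renderAttr (ident, classes, namevals) =
  concatMap pair (idPart ++ classPart ++ namevals)
  where
    idPart      = [("id", ident) | not (null ident)]
    classPart   = [("class", unwords classes) | not (null classes)]
    pair (k, v) = " " ++ k ++ "=\"" ++ escapeAttr v ++ "\""

-- | Wrap already-rendered inner HTML in an arbitrary tag with attributes.
element :: String -> Attr -> String -> String
element tag attr inner =
  "<" ++ tag ++ renderAttr attr ++ ">" ++ inner ++ "</" ++ tag ++ ">"

main :: IO ()
main = putStrLn (element "ul" ("", [], [("data-id", "test-123")]) "<li>Red</li>")
-- prints: <ul data-id="test-123"><li>Red</li></ul>
```

Inside a filter, the resulting string could be emitted as raw HTML in the same way script does with rawInline, with the tag name chosen from the code block's classes.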

If you want to play with any of these examples, they are all in my pandoc-experiments repository.