Question

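I'm trying to modify some code so that people can use Pandoc (well, Pypandoc) to compile the code in project directories, which may be in several formats such as HTML, LaTeX, and Markdown. I have some code in an HTML file that looks like this: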

<h1 data-label="850151" class="ltx_title_section">A heading</h1><h2 data-label="367935" class="ltx_title_subsection">Another heading</h2><div><cite class="ltx_cite raw v1">\cite{ebert_epidemiology_2013}</cite></div><div>Figure <span class="au-ref raw v1">\ref{286335}</span></div><div></div>

This is the LaTeX output:

...
\section{A heading}\label{a-heading}

\subsection{Another heading}\label{another-heading}

\textbackslash{}cite\{ebert\_epidemiology\_2013\}

Figure {\textbackslash{}ref\{286335\}}
...

The desired output is, of course:

...
\section{A heading}\label{a-heading}

\subsection{Another heading}\label{another-heading}

\cite{ebert_epidemiology_2013}

\ref{286335}
...

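I think it would work if I could get Pandoc to strip the <cite> tags and write the citation commands as plain text. I know Pandoc filters are a thing, but I'm not sure whether they are what I need.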

Answer 1

Yes, you can use a filter to strip the cite tags.

If you run pandoc -f html+raw_html -t native, you will see that <cite>bar</cite> is rendered as:

RawInline (Format "html") "<cite>",Str "bar",RawInline (Format "html") "</cite>"

So the filter should look something like this:

function RawInline(elem)
  -- Drop raw HTML inlines such as <cite> and </cite>; keep everything else.
  if elem.format == "html" then
    return {}
  else
    return elem
  end
end
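
Since the question drives Pandoc from Python, the same idea can also be sketched as a JSON filter written in Python. This is a minimal sketch and not part of the original answer. It assumes the document arrives as Pandoc's JSON AST, where raw HTML appears as {"t": "RawInline", "c": ["html", ...]} and plain text as {"t": "Str", "c": ...}. The names transform, run_as_filter, and LATEX_CMD are hypothetical. The second step, which re-emits \cite{...} and \ref{...} tokens as raw LaTeX so the writer does not escape them, is an extra assumption about what the asker wants.

```python
import json
import re
import sys

# Assumption: only whole tokens like \cite{key} or \ref{123} should pass through unescaped.
LATEX_CMD = re.compile(r"^\\(cite|ref)\{[^{}]*\}$")


def transform(node):
    """Recursively rewrite a fragment of pandoc's JSON AST (dicts, lists, scalars)."""
    if isinstance(node, list):
        out = []
        for child in node:
            # Drop raw HTML inlines such as <cite> and </cite>.
            if (isinstance(child, dict)
                    and child.get("t") == "RawInline"
                    and child["c"][0] == "html"):
                continue
            out.append(transform(child))
        return out
    if isinstance(node, dict):
        # Turn a Str that is a LaTeX command into raw LaTeX so it is written verbatim.
        if node.get("t") == "Str" and LATEX_CMD.match(node["c"]):
            return {"t": "RawInline", "c": ["latex", node["c"]]}
        return {key: transform(value) for key, value in node.items()}
    return node


def run_as_filter():
    """Read the JSON AST from stdin and write the rewritten AST to stdout."""
    json.dump(transform(json.load(sys.stdin)), sys.stdout)
```

A script that calls run_as_filter() could then be passed to Pandoc with the --filter option, alongside -f html+raw_html -t latex. With Pypandoc, the same option can be supplied through extra_args.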
