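I am trying to convert Evernote Markup Language (ENML) to Markdown using Pandoc. ENML is mostly a subset of XHTML with a few additional elements. The element I want to convert is the special <en-todo checked="true"/> element. Here is a sample ENML document containing two en-todo items: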
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE en-note SYSTEM "xml/enml2.dtd">
<en-note style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">
<div><en-todo checked="true"/>This is a thing<br/></div>
<div><en-todo checked="false"/>This is another thing<br/></div>
</en-note>
I am trying to convert it to the following Markdown:
[X] This is a thing
[ ] This is another thing
My current approach is to write a JSON filter and run it between two Pandoc invocations:
pandoc --parse-raw -f html -t json test.enml | \
./my-filter | pandoc -f json -t markdown
I am not sure how to correctly parse the RawInline elements in the JSON that Pandoc produces:
[
  {
    "Para": [
      {
        "RawInline": [
          "html",
          "<en-todo checked=\"true\">"
        ]
      },
      {
        "RawInline": [
          "html",
          "</en-todo>"
        ]
      },
      {
        "Str": "This"
      },
      "Space",
      {
        "Str": "is"
      },
      "Space",
      {
        "Str": "a"
      },
      "Space",
      {
        "Str": "thing"
      },
      "LineBreak"
    ]
  },
  {
    "RawBlock": [
      "html",
      "</div>"
    ]
  },
  {
    "RawBlock": [
      "html",
      "<div>"
    ]
  },
  {
    "Para": [
      {
        "RawInline": [
          "html",
          "<en-todo checked=\"false\">"
        ]
      },
      {
        "RawInline": [
          "html",
          "</en-todo>"
        ]
      },
      {
        "Str": "This"
      },
      "Space",
      {
        "Str": "is"
      },
      "Space",
      {
        "Str": "another"
      },
      "Space",
      {
        "Str": "thing"
      },
      "LineBreak"
    ]
  }
]
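For reference, here is a rough sketch of the direction I have in mind for my-filter, written in Python with only the standard library. It assumes the JSON keeps exactly the shape shown above (each element is either a bare string such as "Space" or a single-key object such as {"Str": ...}), and the names TODO_OPEN, convert_inlines, walk and main are my own placeholders, not part of Pandoc. It only handles en-todo tags that sit directly inside a Para or Plain element, and I have not confirmed it is the right way to do this.

```python
import json
import re
import sys

# Matches the opening tag that Pandoc passes through as raw HTML.
TODO_OPEN = re.compile(r'<en-todo(?:\s+checked="(true|false)")?\s*/?>')


def convert_inlines(inlines):
    """Replace each en-todo RawInline pair with a "[X]" or "[ ]" Str."""
    out = []
    for item in inlines:
        if isinstance(item, dict) and "RawInline" in item:
            fmt, text = item["RawInline"]
            if fmt == "html":
                match = TODO_OPEN.fullmatch(text.strip())
                if match:
                    # A missing "checked" attribute is treated as unchecked.
                    box = "[X]" if match.group(1) == "true" else "[ ]"
                    out.extend([{"Str": box}, "Space"])
                    continue
                if text.strip() == "</en-todo>":
                    continue  # drop the closing tag
        out.append(item)
    return out


def walk(node):
    """Recursively rewrite the inline list of every Para/Plain in the tree."""
    if isinstance(node, list):
        return [walk(child) for child in node]
    if isinstance(node, dict):
        return {
            key: convert_inlines(value) if key in ("Para", "Plain") else walk(value)
            for key, value in node.items()
        }
    return node


def main():
    """Read Pandoc JSON on stdin and write the rewritten JSON to stdout."""
    json.dump(walk(json.load(sys.stdin)), sys.stdout)

# When saved as ./my-filter, the script would end with a call to main().
```

With this in place, the RawInline pair at the start of each Para becomes a Str followed by a Space, while the RawBlock div elements are left untouched. One caveat I am unsure about: Pandoc's Markdown writer may escape the square brackets in the Str, so the output might come out as \[X\] rather than [X].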