Question

I have an HTML table-of-contents page that contains a list of book chapters with hyperlinks:

<a href="final/main.html">Multimedia Implementation</a><br/>
<a href="final/toc.html">Table of Contents</a><br/>
<a href="final/pref01.html">About the Author</a><br/>
<a href="final/pref02.html">About the Technical Reviewers</a><br/>
<a href="final/pref03.html">Acknowledgments</a><br/>
<a href="final/part01.html">Part I: Introduction and Overview</a><br/>
<a href="final/ch01.html">Chapter 1. Technical Overview</a><br/>
...

I want to create an NCX file for a Kindle book, and it must contain the following details:

<navPoint id="n1" playOrder="1">
<navLabel>
<text>Multimedia Implementation</text>
</navLabel>
<content src="final/main.html"/>
</navPoint>
<navPoint id="n2" playOrder="2">
<navLabel>
<text>Table of Contents</text>
</navLabel>
<content src="final/toc.html"/>
</navPoint>
<navPoint id="n3" playOrder="3">
<navLabel>
<text>About the Author</text>
</navLabel>
<content src="final/pref01.html"/>
</navPoint>
...

I'm using Notepad++. Is it possible to automate this process with regular expressions?

Answer 1

You can't do all of it with regular expressions alone, but you can split the problem into two parts.

Use program logic (an incrementing variable) to generate the opening tag:

<navPoint id="n1" playOrder="1">

The rest you can do with a regular expression.

Match with the following regular expression:

<a\shref="([^"]*)">([^<]*)<\/a><br\/>

and replace with:

(generated string)\n<navLabel>\n<text>\2</text>\n</navLabel>\n<content src="\1"/>\n</navPoint>

See the DEMO.
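If you can run a small script outside Notepad++, the two parts can be combined in a single pass. The following is a minimal Python sketch, assuming the links appear one per line exactly as in the question. It reuses Answer 1's match pattern and keeps the incrementing variable inside a `re.sub` callback, so each match receives the next `id`/`playOrder` number. The function name `links_to_navpoints` is hypothetical.

```python
import re

# Answer 1's pattern; Python does not need the "\/" escapes, so plain "/" is used.
# Group 1 = the href value, group 2 = the link text.
LINK_RE = re.compile(r'<a\shref="([^"]*)">([^<]*)</a><br/>')


def links_to_navpoints(html: str) -> str:
    """Replace each TOC link with an NCX navPoint, numbering them from 1."""
    counter = 0

    def make_navpoint(match: re.Match) -> str:
        nonlocal counter
        counter += 1  # the incrementing variable a plain regex replace cannot provide
        src, label = match.group(1), match.group(2)
        return (
            f'<navPoint id="n{counter}" playOrder="{counter}">\n'
            f'<navLabel>\n<text>{label}</text>\n</navLabel>\n'
            f'<content src="{src}"/>\n'
            f'</navPoint>'
        )

    return LINK_RE.sub(make_navpoint, html)


if __name__ == "__main__":
    toc = (
        '<a href="final/main.html">Multimedia Implementation</a><br/>\n'
        '<a href="final/toc.html">Table of Contents</a><br/>\n'
    )
    print(links_to_navpoints(toc))
```

Because `counter` is local to each call, every call starts numbering at 1 again; the link text is copied through unchanged, on the assumption that it is already valid (escaped) markup in the source HTML.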

Answer 2

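Yes, it is possible to replace the links with <navPoint> tags. The only part I couldn't find a solution for is the incrementing numbers for the <navPoint> attributes id and playOrder...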

The following regular expression will do most of the work:

/^<a[^>]*href="([^"]+)"[^>]*>([^<]+).*$/gm

Replace with:

<navPoint id="n" playOrder="">\n<navLabel><text>$2</text></navLabel>\n<content src="$1" />\n</navPoint>\n

Regex details

/^<a          .. only parse lines that start with an `<a` tag
[^>]*href="   .. find the first occurrence of `href="` inside that tag
([^"]+)       .. capture the text and stop when a " is found
"[^>]*>       .. find the end of the <a> tag
([^<]+)       .. capture the text and stop when a < is found (i.e. the </a> tag)
.*$/          .. continue to the end of the line
gm            .. search the whole string and parse each line individually

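A more detailed (but also more confusing) explanation is available at https://regex101.com/r/gA0yJ2/1. The link also demonstrates how the regex works, and you can test changes there if you like.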
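The replacement above leaves the empty placeholders `id="n"` and `playOrder=""` in every navPoint. One way to fill them is a second pass that numbers the placeholders in document order. The following is a minimal Python sketch, assuming the search pattern and replacement shown above (with the `/.../gm` delimiters translated into Python flags); the names `LINK_RE`, `REPLACEMENT` and `number_placeholders` are hypothetical.

```python
import itertools
import re

# Answer 2's search pattern. re.MULTILINE plays the role of the "m" flag
# (^ and $ match at each line), and re.sub replaces every match like "g".
LINK_RE = re.compile(r'^<a[^>]*href="([^"]+)"[^>]*>([^<]+).*$', re.MULTILINE)

# Answer 2's replacement, using Python's \1/\2 backreferences instead of $1/$2.
REPLACEMENT = (
    r'<navPoint id="n" playOrder="">\n<navLabel><text>\2</text></navLabel>\n'
    r'<content src="\1" />\n</navPoint>'
)

# The placeholder tag that the first pass leaves behind.
PLACEHOLDER_RE = re.compile(r'<navPoint id="n" playOrder="">')


def number_placeholders(text: str) -> str:
    """Fill each empty id/playOrder placeholder with 1, 2, 3, ... in order."""
    counter = itertools.count(1)

    def fill(_match: re.Match) -> str:
        n = next(counter)
        return f'<navPoint id="n{n}" playOrder="{n}">'

    return PLACEHOLDER_RE.sub(fill, text)


if __name__ == "__main__":
    toc = (
        '<a href="final/main.html">Multimedia Implementation</a><br/>\n'
        '<a href="final/toc.html">Table of Contents</a><br/>'
    )
    # Pass 1: the regex replacement. Pass 2: the numbering.
    print(number_placeholders(LINK_RE.sub(REPLACEMENT, toc)))
```

Splitting the work this way means the second pass can also be run on a file that was already converted in Notepad++, as long as the placeholders were written exactly as `<navPoint id="n" playOrder="">`.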

Create an NCX file using Notepad++ and regular expressions
