Question

My site has a bunch of tracklist content in the following format:

<div class="tracklist">
1. Artist - Title (Record Label)
2. Another artist - Title (Another label)
</div>

I'd like to use regular expressions to find the artist and label names and wrap them in links like this:

<div class="tracklist">
1. <a href="http://www.example.com/Artist">Artist</a> - Title <a href="http://www.example.com/Record+Label">(Record Label)</a>
2. <a href="http://www.example.com/Another+Artist">Another artist</a> - Title <a href="http://www.example.com/Another+label">(Another label)</a>  
</div>

I figured I could find the artist and label names with JavaScript regular expressions:

var artist = /[0-9]\. .*? -/gi
var label = /\(.*?\)/gi

and then find the matching strings with jQuery:

$(".tracklist").html().match(label)
$(".tracklist").html().match(artist)

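Then I'd use the substring() method to strip off the number, period, space, dash and parentheses. But what's a good way to insert the links while keeping the text?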
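To see what the two regexes above actually return, here is a minimal sketch that runs them against the sample tracklist held in a plain string (a stand-in for $(".tracklist").html(), so it runs without jQuery or a DOM). The slice() calls mirror the substring() step; the fixed offset of 3 is an assumption that only holds for single-digit track numbers, which is one weakness of this approach.

```javascript
// Sample tracklist text; a stand-in for $(".tracklist").html()
var html = "\n1. Artist - Title (Record Label)\n2. Another artist - Title (Another label)\n";

// The two regexes from the question
var artist = /[0-9]\. .*? -/gi;
var label = /\(.*?\)/gi;

// With the g flag, match() returns every matched string
var artistMatches = html.match(artist); // "1. Artist -", "2. Another artist -"
var labelMatches = html.match(label);   // "(Record Label)", "(Another label)"

// substring()-style trimming: drop "N. " in front and " -" at the end.
// Assumes a single-digit track number; "10. X -" would be cut wrongly.
var artistNames = artistMatches.map(function (m) { return m.slice(3, -2); });

// Drop the surrounding parentheses
var labelNames = labelMatches.map(function (m) { return m.slice(1, -1); });

console.log(artistNames); // Artist, Another artist
console.log(labelNames);  // Record Label, Another label
```

This only extracts the names; putting the links back in still requires knowing where each match sat in the original text, which is why matching a whole line at a time is easier.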

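On a more general level, is this idea viable, or does it fall under "don't parse HTML with regex"? Would a server-side implementation (with some XML/XSL magic) be preferable?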

Answer 1

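A server-side implementation would definitely be better. Where are you pulling the data below from? Surely you have it in an array or something similar?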

1. Artist - Title (Record Label)
2. Another artist - Title (Another label)

A server-side approach also degrades gracefully if the user doesn't have JavaScript (almost negligible these days, but it does happen!).
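If the data does come from an array on the server, as this answer guesses, the links can be generated while the page is rendered and no client-side parsing is needed. A minimal sketch in JavaScript (for example under Node.js), assuming a hypothetical array of { artist, title, label } objects and the URL scheme from the question's example output (http://www.example.com/ plus the name with spaces written as "+"); renderTracklist, makeLink and escapeHtml are illustrative names, not an existing API.

```javascript
// Hypothetical track data as it might exist on the server
var tracks = [
  { artist: "Artist", title: "Title", label: "Record Label" },
  { artist: "Another artist", title: "Title", label: "Another label" }
];

// Escape text for safe inclusion in HTML
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Build a link in the question's URL format (assumed: spaces become "+")
function makeLink(name, text) {
  var url = "http://www.example.com/" + encodeURIComponent(name).replace(/%20/g, "+");
  return '<a href="' + escapeHtml(url) + '">' + escapeHtml(text) + "</a>";
}

// Render the whole tracklist div, numbering tracks from 1
function renderTracklist(tracks) {
  var lines = tracks.map(function (t, i) {
    return (i + 1) + ". " + makeLink(t.artist, t.artist) + " - " +
           escapeHtml(t.title) + " " + makeLink(t.label, "(" + t.label + ")");
  });
  return '<div class="tracklist">\n' + lines.join("\n") + "\n</div>";
}

console.log(renderTracklist(tracks));
```

The first rendered track line is identical to the first line of the desired output in the question.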

Answer 2

This doesn't fall under "don't parse HTML with regex", because you aren't parsing HTML: you're parsing text and creating HTML from it.

You can get the full text content of the div:

var text = $('.tracklist').text();

then split it into lines:

var lines = text.split(/\r?\n/);

and parse each line separately:

function parseLine(line) {
    var match = line.match(/^\d+\.\s+([^-]+)\s-\s([^(]+)(\s*\((.*)\))/);
    if (!match) {
        return null; // not a track line, e.g. an empty line
    }

    var artist = match[1], title = match[2], label = match[4];

    // create HTML from artist, title and label here and return it
}

$.each(lines, function(index, line) {
    var elems = parseLine(line);
    if (elems) {
        // append elems to the div
    }
});

The regex can be explained as follows:

/^\d+\.     # this matches the number followed by the dot at the beginning
\s+         # the number is separated by one or more whitespace
([^-]+)     # the artist: match everything except "-"
\s-\s       # matches the "-" with a single whitespace on each side
([^(]+)     # the title: matches everything except "("
(\s*        # zero or more whitespace
\((.*)\))/  # the label in parentheses; match[4] is the label without them
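Filling in the "create HTML here" step, here is a minimal sketch that runs without jQuery: it takes the div's text as a string (what $('.tracklist').text() would return), splits it into lines, and turns each matching line into linked HTML using the regex above. The makeLink and escapeHtml helpers and the "+"-for-space URL encoding are assumptions based on the sample output in the question; the title is trimmed because ([^(]+) also captures the space before the label.

```javascript
// Escape text taken from .text() before writing it back as HTML
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Assumed URL format from the question: spaces become "+"
function makeLink(name, text) {
  var url = "http://www.example.com/" + encodeURIComponent(name).replace(/%20/g, "+");
  return '<a href="' + url + '">' + escapeHtml(text) + "</a>";
}

// Parse one line; returns linked HTML, or null if the line is not a track
function parseLine(line) {
  var match = line.match(/^\d+\.\s+([^-]+)\s-\s([^(]+)(\s*\((.*)\))/);
  if (!match) {
    return null;
  }
  var number = line.match(/^\d+/)[0]; // the regex above does not capture it
  var artist = match[1], title = match[2].trim(), label = match[4];
  return number + ". " + makeLink(artist, artist) + " - " + escapeHtml(title) +
         " " + makeLink(label, "(" + label + ")");
}

// Split the text into lines, convert each track line, drop everything else
function linkTracklist(text) {
  return text.split(/\r?\n/)
    .map(parseLine)
    .filter(function (html) { return html !== null; })
    .join("\n");
}

var text = "\n1. Artist - Title (Record Label)\n2. Another artist - Title (Another label)\n";
console.log(linkTracklist(text));
```

With jQuery, the result could then be written back with $('.tracklist').html(...). Note that [^-]+ stops at the first "-", so an artist name that itself contains a hyphen would not be parsed correctly.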

Answer 3

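I don't see any point in switching to XSLT, since you'd still have to process the DIV's contents as text. For that kind of thing, jQuery/regex is as good as anything. You just aren't using regexes as effectively as you could. As @arnaud said, you should match and process a whole line at a time, using capturing groups to break out the interesting parts. Here's the regex I'd use: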

/^(\d+)\.\s*([^-]+?)\s*-\s*([^(]+?)\s*\((.*)\)/

match[1] is the track number,
match[2] is the artist,
match[3] is the title, and
match[4] is the label.
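A quick sketch of what those capture groups hold when the regex is applied to the second sample line from the question:

```javascript
var trackRegex = /^(\d+)\.\s*([^-]+?)\s*-\s*([^(]+?)\s*\((.*)\)/;

var match = "2. Another artist - Title (Another label)".match(trackRegex);

console.log(match[1]); // 2
console.log(match[2]); // Another artist
console.log(match[3]); // Title
console.log(match[4]); // Another label
```

Because the artist and title groups are lazy (+?) and followed by \s*, the spaces around "-" and before "(" end up outside the captured text.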

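I also arranged it so that surrounding whitespace and other characters aren't captured; in fact, most of the whitespace is optional. In my experience, formatted data like this often has inconsistent spacing, so this makes the regex more likely to match what you want, and it lets you correct the inconsistencies. (Of course, the data may also contain more serious flaws, but those usually have to be dealt with case by case.)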
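For the "insert the links and keep the text" part of the question, one option (not spelled out in this answer) is String.prototype.replace with this regex and a replacer function, adding the g and m flags so that ^ matches at the start of every line of the div's text. A minimal sketch, with the same assumed example.com URL format and hypothetical helper names; the second sample line has deliberately irregular spacing to show that it gets normalized.

```javascript
// Answer 3's regex with the g and m flags added, so ^ matches at every line start
var trackRegex = /^(\d+)\.\s*([^-]+?)\s*-\s*([^(]+?)\s*\((.*)\)/gm;

// Escape text taken from .text() before writing it back as HTML
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;")
          .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Assumed URL format, copied from the question's sample output: spaces become "+"
function makeLink(name, text) {
  var url = "http://www.example.com/" + encodeURIComponent(name).replace(/%20/g, "+");
  return '<a href="' + url + '">' + escapeHtml(text) + "</a>";
}

// Replace each matched track with a normalized, linked version;
// text outside the matches is left as-is
function linkTracklist(text) {
  return text.replace(trackRegex, function (whole, number, artist, title, label) {
    return number + ". " + makeLink(artist, artist) + " - " + escapeHtml(title) +
           " " + makeLink(label, "(" + label + ")");
  });
}

// The second line has deliberately inconsistent spacing
var text = "1. Artist - Title (Record Label)\n2.Another artist  -Title(Another label)";
console.log(linkTracklist(text));
```

In the page, something like $('.tracklist').each(function () { $(this).html(linkTracklist($(this).text())); }); would apply it to every tracklist div; that jQuery glue is an untested sketch.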

JavaScript regex to insert links into a list
