I am trying to do some simple JavaScript syntax highlighting in a <pre> element using regular expressions. I am well aware that JavaScript is not a regular language and so can't be fully processed by regular expressions, but that is not the point here.
Here is a small piece of that code, just to demonstrate the logic. It actually works; what I am struggling with is the design of such code.
function highlightCode(node, pattern) {
    var match;
    var pos = 0;
    // clear current content of pre and save it in text variable
    var text = node.textContent;
    node.textContent = '';
    // while there is still something to match
    while (match = pattern.exec(text)) {
        // this is unmatched text, create new text node for it
        var before = text.slice(pos, match.index);
        node.appendChild(document.createTextNode(before));
        // if the match matches the first group append it as a strong tag
        if (group1.test(match[1])) {
            var strong = document.createElement('strong');
            strong.appendChild(document.createTextNode(match[1]));
            node.appendChild(strong);
            pos = pattern.lastIndex;
        // if the match matches the second group append it as i tag
        } else if (group2.test(match[1])) {
            var italic = document.createElement('i');
            // drop the trailing "(" so only the name is italicized
            italic.appendChild(document.createTextNode(match[1].slice(0, match[1].length - 1)));
            node.appendChild(italic);
            pos = pattern.lastIndex - 1;
        }
    }
    // append the remaining text as a regular text
    node.appendChild(document.createTextNode(text.slice(pos)));
}
And here is the rest of the code.
//match function, var, return or any word that is followed by (
var pattern = /(\bfunction\b|\bvar\b|\breturn\b|\b[_a-zA-Z][\w_]*\()/g;
//the same as before just separated into groups
var group1 = /\b(function|var|return)\b/;
var group2 = /\b[_a-zA-Z][\w_]*\(/;
var node = document.querySelector('pre');
highlightCode(node, pattern);
What I particularly dislike about this is that I first have to match the text against all the possible patterns, and then check each match again against the separate groups to distinguish between them.
Using this logic, I can't run highlightCode for each group individually, because that would erase the previous changes: the content of the pre element is recreated at the start and throughout the execution of highlightCode.
Moreover, I can't use the general pattern alone, because I wouldn't know how to distinguish between the separate cases (highlight this code this way and that code that way).
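To make that last point concrete, here is a minimal sketch of what I mean. It reuses the combined pattern from above; the sample string and the captured array are made up purely for illustration. Both a keyword and a function name end up in the same capture group, so the match alone doesn't tell me which case I am dealing with:

```javascript
// The combined pattern from above: a single capture group covers both cases.
var pattern = /(\bfunction\b|\bvar\b|\breturn\b|\b[_a-zA-Z][\w_]*\()/g;

// Hypothetical input, only for illustration.
var sample = 'var x = foo(1);';

var captured = [];
var match;
while ((match = pattern.exec(sample))) {
    // match[1] holds the matched text in either case;
    // nothing here records which alternative produced it.
    captured.push(match[1]);
}
// captured is now ['var', 'foo(']
```

This is why I currently fall back on testing match[1] against group1 and group2 a second time.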
Is there a more "correct" approach for this task using just regular expressions, or is this basically it?