JavaScript - Replace all lines with fewer than X characters

Time: 2011-12-07 14:32:19

Tags: javascript regex bookmarklet

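I'm trying to create a JavaScript bookmarklet that:

  • looks at the contents of a form field (class "mceContentBody"),
  • finds all paragraph tags whose content is fewer than 50 characters long,
  • adds "strong" tags inside them.

So <p>This is less than 50 chars</p> would become <p><strong>This is less than 50 chars</strong></p>

But <p>This is a very long line that is more than 50 characters so it will remain untouched.</p> would stay unchanged.

Here's what I have so far, but when I run it, it makes the entire contents of the form field bold.

I'm sure I've messed up the regex. What am I missing?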

javascript:var x = window.frames[1].document.getElementsByClassName("mceContentBody")[0].innerHTML;

x=x.replace(/(<p.*?>([A-Za-z ]{0,50})<\/p>)/g, "<p><strong>$1</strong></p>");

window.frames[1].document.getElementsByClassName("mceContentBody")[0].innerHTML=x;empty();

Thanks!

4 answers:

Answer 0 (score: 2)

Don't parse HTML with a regex; just use the polished HTML parser you already have at hand:

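function replaceContents( contents ) {
    // Let the browser parse the markup inside a detached element.
    var div = document.createElement("div"),
        paragraphs, i, l, paragraph, text, strong,
        // Older IE versions expose innerText instead of textContent.
        textProp = "textContent" in div ? "textContent" : "innerText";

    div.innerHTML = contents;

    paragraphs = div.getElementsByTagName("p");
    l = paragraphs.length;

    for( i = 0; i < l; ++i ) {
        paragraph = paragraphs[i];
        text = paragraph[textProp];

        // Only wrap non-empty paragraphs shorter than 50 characters.
        if( text.length > 0 && text.length < 50 ) {
            // Assign the text as text, not markup, so "<" and "&" stay escaped.
            strong = document.createElement("strong");
            strong[textProp] = text;
            paragraph.innerHTML = "";
            paragraph.appendChild(strong);
        }
    }

    return div.innerHTML;
}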

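Usage example here: http://jsfiddle.net/wUfRQ/

Answer 1 (score: 1)

Change your regex to this, so that the opening P tag matches everything except its own closing ">". It should look something like the following (or see this regex test):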

x = x.replace(/<p[^>]*>([A-Za-z ]{0,50})<\/p>/g, "<p><strong>$1</strong></p>");

The problem here is that your pattern matches too much (see this regex test). Here is some sample HTML that I'm guessing resembles what is giving you trouble.

<form><p>This is my form it has a lot of words in this paragraph because it is too cool for school. This is my form it has a lot of words in this paragraph because it is too cool for school. This is my form it has a lot of words in this paragraph because it is too cool for school. This is my form it has a lot of words in this paragraph because it is too cool for school.</p><p>Short</p></form>
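To see the over-matching concretely, here is a minimal sketch that runs both opening-tag patterns against a shortened stand-in for the sample above. The `sampleHtml` string and the variable names are made up for illustration; only the two patterns come from this thread.

```javascript
// Hypothetical, shortened stand-in for the sample HTML above.
const sampleHtml =
  "<p>This paragraph is long, has punctuation, and runs well past fifty characters.</p>" +
  "<p>Short</p>";

// Original opening-tag pattern: "." also matches ">", so the lazy ".*?"
// keeps growing across tag boundaries until the rest of the pattern fits.
const lazyDotPattern = /<p.*?>([A-Za-z ]{0,50})<\/p>/;

// Negated character class: the opening tag must end at its first ">".
const negatedPattern = /<p[^>]*>([A-Za-z ]{0,50})<\/p>/;

const lazyDotMatch = sampleHtml.match(lazyDotPattern)[0];
const negatedMatch = sampleHtml.match(negatedPattern)[0];

console.log(lazyDotMatch === sampleHtml); // true: both paragraphs are swallowed
console.log(negatedMatch);                // <p>Short</p>
```

A lazy quantifier only prefers the shortest match; it does not forbid a longer one, which is why `.*?` still runs past the first paragraph here.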

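Note: this will misfire if, for some reason, there is a ">" character inside the opening P tag. I'm assuming that is not the case, since it is rare outside of inline JavaScript.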
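A small sketch of that caveat, assuming a made-up paragraph whose opening tag carries a ">" inside an attribute value (`trickyHtml` is hypothetical, not from the original question):

```javascript
// Hypothetical paragraph: the title attribute contains a literal ">".
const trickyHtml = '<p title="a > b">Short</p>';

// "[^>]*" stops at the ">" inside the attribute, so the rest of the
// pattern never lines up with the real end of the opening tag.
const openTagPattern = /<p[^>]*>([A-Za-z ]{0,50})<\/p>/g;

const trickyResult = trickyHtml.replace(openTagPattern, "<p><strong>$1</strong></p>");

console.log(trickyResult === trickyHtml); // true: nothing was replaced
```

In this case the failure is silent: the short paragraph is simply left unbolded.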

Answer 2 (score: 0)

I would change your code to this (the reasons are in the comment block):

var x = window.frames[1].document.getElementsByClassName("mceContentBody")[0].innerHTML;
/*
    Changed: <p.*?>
    To: <p[^>]*>
    Because: "." will include ">". By making a negated character class, we are ensuring that the regex will find the closing ">".

    Changed: [A-Za-z ]{0,50}
    To: [^<]{1,50}
    Because: Paragraph elements can contain other characters than letters and spaces (including your example paragraph to be captured).
             Properly formatted HTML never has a bare "<" in a paragraph's text (it is written as "&lt;"), so "<" only appears where a tag starts.
             Made the minimum "1" because there's no point to putting an empty strong element inside an empty paragraph element.

        Removed outer capturing block as it was not being used.

*/
x = x.replace(/<p[^>]*>([^<]{1,50})<\/p>/g, "<p><strong>$1</strong></p>");
window.frames[1].document.getElementsByClassName("mceContentBody")[0].innerHTML = x;
empty();
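The revised pattern can also be wrapped in a small helper so the length limit is not hard-coded. This is only a sketch: `boldShortParagraphs` and its `limit` parameter are hypothetical names, and it assumes the goal is "strictly fewer than `limit` characters", as the question states. Note that `{1,50}` also accepts paragraphs of exactly 50 characters, so the sketch uses `limit - 1` as the upper bound.

```javascript
// Hypothetical helper: wraps the text of short, tag-free paragraphs in <strong>.
function boldShortParagraphs(html, limit) {
  // A quantifier like {1,0} is invalid, so there is nothing to match below 2.
  if (limit < 2) {
    return html;
  }
  // [^<] keeps the match inside a single paragraph that has no nested tags.
  const pattern = new RegExp("<p[^>]*>([^<]{1," + (limit - 1) + "})</p>", "g");
  // As in the answer above, any attributes on the original <p> are dropped.
  return html.replace(pattern, "<p><strong>$1</strong></p>");
}

const shortParagraph = "<p>This is less than 50 chars</p>";
const longParagraph =
  "<p>This is a very long line that is more than 50 characters so it will remain untouched.</p>";

console.log(boldShortParagraphs(shortParagraph, 50));
// <p><strong>This is less than 50 chars</strong></p>
console.log(boldShortParagraphs(longParagraph, 50) === longParagraph); // true
```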

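JSLint's only complaint about this is that using a negated character class is considered "insecure", because Unicode characters might be captured. However, since you aren't using this to validate an input field, that shouldn't be a problem.

Hope this helps.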

Answer 3 (score: 0)

Your outer parentheses capture the entire match, so $1 isn't what you want. Use $2 instead,

or remove the outer parentheses.
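The difference between the two group numbers can be checked with a short sketch. The `nestedPattern` below keeps the outer parentheses from the question but uses `[^>]*` for the opening tag; the input string and variable names are illustrative only.

```javascript
const shortHtml = "<p>Short</p>";

// Group 1 is the outer pair (the whole paragraph); group 2 is the inner text.
const nestedPattern = /(<p[^>]*>([A-Za-z ]{0,50})<\/p>)/g;

const usingGroup1 = shortHtml.replace(nestedPattern, "<p><strong>$1</strong></p>");
const usingGroup2 = shortHtml.replace(nestedPattern, "<p><strong>$2</strong></p>");

console.log(usingGroup1); // <p><strong><p>Short</p></strong></p>
console.log(usingGroup2); // <p><strong>Short</strong></p>
```

Capture groups are numbered by the position of their opening parenthesis, which is why the outer pair becomes $1.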