Question

I'm trying to extract attributes from submitted text in JavaScript and turn them into an array.

So the user submits this:

<iframe src="http://www.stackoverflow.com/" width="123" height="123" frameborder="1"></iframe>

and I would get:

arr['src'] = 'http://www.stackoverflow.com/'
arr['width'] = '123'
arr['height'] = '123'
arr['frameborder'] = '1'

I think a regular expression is all that's needed, but any help would be great!

Answer 1

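I recommend using a RegExp to parse user-submitted HTML instead of creating a DOM object, because you don't want to load external content (iframe, script, link, style, object, ...) when performing a "simple" task such as getting attribute values from an HTML string.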

Using a method similar to the one in my previous answer, I created a function that matches attribute values. Both quoted and non-quoted attribute values are matched.

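The code currently returns an object with the attributes of the first tag, but it is easily extended to retrieve all HTML elements (see the bottom of this answer).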

Fiddle: http://jsfiddle.net/BP4nF/1/

// Example:
var htmlString = '<iframe src="http://www.stackoverflow.com/" width="123" height="123" frameborder="1" non-quoted=test></iframe>';
var arr = parseHTMLTag(htmlString);
//arr is the desired object. An easy method to verify:
alert(JSON.stringify(arr));

function parseHTMLTag(htmlString){
    var tagPattern = /<[a-z]\S*(?:[^<>"']*(?:"[^"]*"|'[^']*'))*?[^<>]*(?:>|(?=<))/i;
    var attPattern = /([-a-z0-9:._]+)\s*=(?:\s*(["'])((?:[^"']+|(?!\2).)*)\2|([^><\s]+))/ig;
    // 1 = attribute, 2 = quote, 3 = value, 4=non-quoted value (either 3 or 4)

    var tag = htmlString.match(tagPattern);
    var attributes = {};
    if(tag){ //If there's a tag match
        tag = tag[0]; //Match the whole tag
        var match;
        while((match = attPattern.exec(tag)) !== null){
            //match[1] = attribute, match[3] = quoted value, match[4] = non-quoted value
            //Compare against undefined so an empty quoted value ("") is kept
            attributes[match[1]] = match[3] !== undefined ? match[3] : match[4];
        }
    }
    return attributes;
}

The output of the example is equivalent to:

var arr = {
    "src": "http://www.stackoverflow.com/",
    "width": "123",
    "height": "123",
    "frameborder": "1",
    "non-quoted": "test"
};

Bonus: modifying the function to get multiple matches (only the code that changes is shown)

function parseHTMLTags(htmlString){
    var tagPattern = /<([a-z]\S*)(?:[^<>"']*(?:"[^"]*"|'[^']*'))*?[^<>]*(?:>|(?=<))/ig;
    // 1 = tag name
    var attPattern = /([-a-z0-9:._]+)\s*=(?:\s*(["'])((?:[^"']+|(?!\2).)*)\2|([^><\s]+))/ig;
    // 1 = attribute, 2 = quote, 3 = value, 4=non-quoted value (either 3 or 4)

    var htmlObject = [];
    var tag, match, attributes;
    while(tag = tagPattern.exec(htmlString)){
        attributes = {};
        while(match = attPattern.exec(tag[0])){ // tag[0] = full text of the matched tag
            attributes[match[1]] = match[3] !== undefined ? match[3] : match[4];
        }
        htmlObject.push({
            tagName: tag[1],
            attributes: attributes
        });
    }
    return htmlObject; //Array of all HTML elements
}
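
For anyone who wants to try the multi-tag version right away, here is a minimal, self-contained sketch with a usage example. It repeats the function above so the snippet runs on its own, with two small adjustments that are my own assumptions rather than part of the original code: attributes are read from `tag[0]` (the matched tag text), and an empty quoted value is kept as `""`. The `sampleInput` and `parsed` names are just for illustration.

```javascript
// Sketch: collect every opening tag and its attributes from an HTML string.
function parseHTMLTags(htmlString) {
    // 1 = tag name
    var tagPattern = /<([a-z]\S*)(?:[^<>"']*(?:"[^"]*"|'[^']*'))*?[^<>]*(?:>|(?=<))/ig;
    // 1 = attribute, 2 = quote, 3 = quoted value, 4 = non-quoted value
    var attPattern = /([-a-z0-9:._]+)\s*=(?:\s*(["'])((?:[^"']+|(?!\2).)*)\2|([^><\s]+))/ig;

    var htmlObject = [];
    var tag, match, attributes;
    while ((tag = tagPattern.exec(htmlString)) !== null) {
        attributes = {};
        attPattern.lastIndex = 0; // scan each tag from its beginning
        // tag[0] is the full text of the matched tag, tag[1] its name
        while ((match = attPattern.exec(tag[0])) !== null) {
            // Assumption: keep empty quoted values ("") instead of dropping them
            attributes[match[1]] = match[3] !== undefined ? match[3] : match[4];
        }
        htmlObject.push({ tagName: tag[1], attributes: attributes });
    }
    return htmlObject; // Array of all matched HTML elements
}

// Hypothetical input with one quoted-only tag and one mixed tag
var sampleInput = '<iframe src="http://www.stackoverflow.com/" width="123"></iframe>' +
                  "<img alt=logo src='a.png'>";
var parsed = parseHTMLTags(sampleInput);
console.log(JSON.stringify(parsed, null, 2));
```

With this input the array has two entries: an `iframe` with `src` and `width`, and an `img` with `alt` and `src`. The closing `</iframe>` is skipped because the tag pattern requires a letter right after `<`.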

Answer 2

Assuming you're doing this client-side, you're better off using the DOM rather than a RegExp:

var tmp = document.createElement("div");
tmp.innerHTML = userStr;

tmp = tmp.firstChild;
console.log(tmp.src);
console.log(tmp.width);
console.log(tmp.height);
console.log(tmp.frameBorder);

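Make sure you don't add the created element to the document without sanitizing it first. You might also need to loop over the created nodes until you reach an element node.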
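
The loop mentioned above could look roughly like this. It is only a sketch: `firstElementChildOf` and `holder` are names I made up for illustration, and the function assumes it is given a node-like object exposing `firstChild`, `nextSibling` and `nodeType`, as DOM nodes do.

```javascript
// Return the first child of `container` that is an element node, or null.
function firstElementChildOf(container) {
    var ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
    for (var node = container.firstChild; node; node = node.nextSibling) {
        if (node.nodeType === ELEMENT_NODE) {
            return node; // skips text nodes, comments, etc.
        }
    }
    return null;
}

// Browser-only usage; skipped when no DOM is available.
if (typeof document !== "undefined") {
    var holder = document.createElement("div"); // never attached to the page
    holder.innerHTML = '  leading text <iframe width="123"></iframe>';
    var el = firstElementChildOf(holder);
    console.log(el.getAttribute("width")); // "123"
}
```

This matters when the submitted string starts with whitespace or text, because `firstChild` is then a text node and has no `src` or `width`.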

Answer 3

Assuming they will always enter an HTML element, you could parse it and read the attributes from the DOM, like so (untested):

var getAttributes = function(str) {
  var a={}, div=document.createElement("div");
  div.innerHTML = str;
  var attrs=div.firstChild.attributes, len=attrs.length, i;
  for (i=0; i<len; i++) {
    a[attrs[i].nodeName] = attrs[i].nodeValue;
  }
  return a;
};

var x = getAttributes(inputStr);
x; // => {width:'123', height:'123', src:'http://...', ...}

Answer 4

Use plain JavaScript instead of a regexp:

Grab the iframe element:

var iframe = document.getElementsByTagName('iframe')[0];

Then access its attributes with:

var arr = {
   src         : iframe.src,
   width       : iframe.width,
   height      : iframe.height,
   frameborder : iframe.frameBorder
};

Answer 5

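If possible, I would personally do this with jQuery. With it, you can create a DOM element without actually injecting it into the page and creating a potential security hazard.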

var userTxt = '<iframe src="http://www.stackoverflow.com/" width="123" height="123" frameborder="1"></iframe>';
var userInput = $(userTxt);
console.log(userInput.attr('src'));
console.log(userInput.attr('width'));
console.log(userInput.attr('height'));
console.log(userInput.attr('frameborder'));

Get attributes from user-submitted text! RegExp?
