Question

I want to use JavaScript to parse a string that can come in two alternative formats:

id#state#{font name, font size, "text"}  
// e.g. button1#hover#{arial.ttf, 20, "Ok"}

or

id#state#text                            
// e.g. button1#hover#Ok

In the second version, a default font and size are assumed.

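Before you read further, I should point out that I control the format, so I'd love to hear about any alternative format that is more RegExp Friendly™. That said, the second alternative is required for historical reasons, and so is the id#state# part. In other words, the flexibility lies in the {font name, font size, "text"} part.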

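Also, I'd like to use RegExp as much as possible. Yes, the RegExp I propose below is pretty hairy, but in my case this is not only a possible way to solve the problem at hand, it is also a chance to learn more about RegExp itself.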

My current attempt groups the three or five information elements of the two formats as follows:

var pat = /^(\w*)#(\w*)#(?:(?:\{([\w\.]*),\s*([0-9\.]*),\s*"([\w\s]*)"\})|([\w\s]*))$/;

var source1 = "button1#hover#{arial.ttf, 20, \"Ok\"}";
var source2 = "button1#hover#Ok";

var result1 = source1.match ( pat );
var result2 = source2.match ( pat );

alert ( "Source1: " + result1.length + " Source2: " + result2.length );

When I test this expression at http://www.regular-expressions.info/javascriptexample.html, I get:

result1 = [ button1#hover#{arial.ttf, 20, "Ok"}, button1, hover, arial.ttf, 
            20, Ok, undefined ]

and

result2 = [ button1#hover#Ok, button1, hover, undefined, 
            undefined, undefined, Ok ]

Here is how I break down the RegExp:

^(\w*)#(\w*)#(?:(?:\{([\w\.]*),\s*([0-9\.]*),\s*"([\w\s]*)"\})|([\w\s]*))$

^                 # anchor to beginning of string
(\w*)             # capture required id
#                 # match hash sign separator
(\w*)             # capture required state
#                 # match hash sign separator
                  # capture text structure with optional part:
(?:(?:\{([\w\.]*),\s*([0-9\.]*),\s*"([\w\s]*)"\})|([\w\s]*))  
$                 # anchor to end of string
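JavaScript regex literals cannot contain line breaks or comments (there is no extended `x` flag), so a commented breakdown like this one cannot be pasted into code directly. One possible workaround, sketched below using only the standard `RegExp` constructor, is to build the identical pattern from commented string fragments. The names `parts` and `pat` are just illustrative.

```javascript
// Build the question's pattern from commented string fragments.
// Backslashes are doubled because these are ordinary string literals.
const parts = [
  "^",
  "(\\w*)",            // group 1: id
  "#",                 // hash separator
  "(\\w*)",            // group 2: state
  "#",                 // hash separator
  "(?:",               // outer non-capturing group
  "(?:\\{",            //   first alternative: opening brace
  "([\\w\\.]*)",       //   group 3: font name
  ",\\s*",             //   comma, optional whitespace
  "([0-9\\.]*)",       //   group 4: font size
  ",\\s*\"",           //   comma, optional whitespace, opening quote
  "([\\w\\s]*)",       //   group 5: quoted text
  "\"\\})",            //   closing quote and brace, end of first alternative
  "|",                 // alternation
  "([\\w\\s]*)",       //   group 6: plain text (second format)
  ")",                 // end of outer group
  "$",
];

const pat = new RegExp(parts.join(""));

console.log(pat.exec("button1#hover#Ok"));
```

The joined string is character-for-character the same pattern as the one-line version, so the group numbers do not change.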

The text-structure capture is the hairiest part, I suppose. I break it down as follows:

(?:                  # match all of what follows but don't capture
    (?:\{            # match left curly bracket but don't capture (non-capturing group)
          ([\w\.]*)  # capture font name (with possible punctuation in font file name)
          ,\s*       # match comma and zero or more whitespaces
          ([0-9\.]*) # capture font size (with possible decimal part)
          ,\s*"      # match comma, zero or more whitespaces, and a quotation char
          ([\w\s]*)  # capture text including whitespaces
    "\})             # match quotation char and right curly bracket (and close non-capturing group)
    |                # alternation operator
    ([\w\s]*)        # capture optional group to match the second format variant
)                    # close outer non-capturing group
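On the bonus question, one thing worth checking against this breakdown: every quantifier in the pattern is `*`, so the pieces labeled "required" can actually be empty. The sketch below uses the question's pattern unchanged; `strict` is a hypothetical tightening (not from the question) that uses `+` where a value must be present and also drops the redundant inner non-capturing group.

```javascript
// The question's pattern, written on one line.
const pat = /^(\w*)#(\w*)#(?:(?:\{([\w\.]*),\s*([0-9\.]*),\s*"([\w\s]*)"\})|([\w\s]*))$/;

// `\w*` accepts an empty id and state, and the second branch `[\w\s]*`
// accepts empty text, so even "##" matches: groups 1, 2 and 6 are "".
const degenerate = pat.exec("##");
console.log(degenerate[1] === "", degenerate[2] === "", degenerate[6] === ""); // true true true

// A "{...}" payload that fails the first branch does not fall back to the
// second one, because `[\w\s]*` cannot match "{", "," or '"'.
console.log(pat.exec('button1#hover#{arial.ttf, big, "Ok"}')); // null

// Hypothetical stricter variant: `+` for required parts and a numeric size
// with an optional decimal part. Group numbers 1-6 stay the same because
// the added `(?:\.\d+)` group is non-capturing.
const strict = /^(\w+)#(\w+)#(?:\{([\w.]+),\s*(\d+(?:\.\d+)?),\s*"([\w\s]*)"\}|([\w\s]+))$/;
console.log(strict.exec("##")); // null
```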

My question is twofold:

1) How do I avoid the trailing undefined match in the result1 case?

2) How do I avoid the three undefined matches in the middle of the result2 case?

Bonus question:

Did I get the breakdown right? (I'm guessing something is off, since the RegExp doesn't work quite as expected.)

Thanks! :)

Answer 1

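Groups in a regular expression are numbered from left to right by the position of their opening parenthesis, regardless of any operators (in particular the | operator). So with (x)|(y), whichever of the "x" or "y" groups did not take part in the match will be undefined.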
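A minimal illustration of that numbering, using a toy pattern rather than the one from the question:

```javascript
// Groups are numbered by the position of their opening parenthesis, so in
// /(x)|(y)/ "x" is always group 1 and "y" is always group 2, whichever
// alternative actually matched. The group in the other branch is undefined.
const re = /(x)|(y)/;

const m1 = re.exec("x");
const m2 = re.exec("y");

console.log(m1.length, m1[1], m1[2]); // 3 x undefined
console.log(m2.length, m2[1], m2[2]); // 3 undefined y
```

Both results always have the same length (the full match plus one slot per group), which is exactly why result1 and result2 in the question both have seven entries.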

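So you can't avoid the empty slots in the result. In fact, I'd argue you want them, because without them you wouldn't really know which form of the input you matched.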
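As a sketch of how those undefined slots can be put to use, the hypothetical `parseLabel` helper below checks which text group is defined to tell the two formats apart and fills in defaults for the short form. `DEFAULT_FONT` and `DEFAULT_SIZE` are placeholder values, since the question only says that some default font and size are assumed.

```javascript
// Hypothetical defaults: the question does not say what they are.
const DEFAULT_FONT = "default.ttf";
const DEFAULT_SIZE = 12;

// The question's pattern, written on one line.
const pat = /^(\w*)#(\w*)#(?:(?:\{([\w\.]*),\s*([0-9\.]*),\s*"([\w\s]*)"\})|([\w\s]*))$/;

// Parses either format into { id, state, font, size, text },
// or returns null when the input matches neither.
function parseLabel(source) {
  const m = pat.exec(source);
  if (m === null) return null;

  // Destructure by group number; slot 0 is the whole match.
  const [, id, state, font, size, quotedText, plainText] = m;

  // Group 5 is defined (possibly "") only when the "{...}" branch matched;
  // otherwise group 6 is defined instead.
  if (quotedText !== undefined) {
    // parseFloat("") is NaN, which can happen because `[0-9\.]*` allows an empty size.
    return { id, state, font, size: parseFloat(size), text: quotedText };
  }
  return { id, state, font: DEFAULT_FONT, size: DEFAULT_SIZE, text: plainText };
}

console.log(parseLabel('button1#hover#{arial.ttf, 20, "Ok"}'));
console.log(parseLabel("button1#hover#Ok"));
```

Checking `quotedText !== undefined` rather than plain truthiness matters here: an empty quoted text (`""`) is still a valid long-form match.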

Trying to understand JavaScript regexp results