Question

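I'm trying to write an LPeg pattern that matches strings which:

start with a letter
contain only alphanumeric characters after that
do not contain two or more consecutive hyphens (for example, test--string is not allowed)

For reference, the regular expression [a-zA-Z](-?[a-zA-Z0-9])* matches what I'm looking for.

Here is the code I'm using: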

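-- Recent LPeg versions do not set a global "lpeg", so keep the module in a local
local lpeg = require "lpeg"
local P, R, C = lpeg.P, lpeg.R, lpeg.C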

dash  = P"-"
ucase  = R"AZ"
lcase  = R"az"
digit  = R"09"
letter = ucase + lcase
alphanum = letter + digit

str_match = C(letter * ((dash^-1) * alphanum)^0)

strs = {
    "1too",
    "too0",
    "t-t-t",
    "t-t--t",
    "t--t-t",
    "t-1-t",
    "t--t",
    "t-one1",
    "1-1",
    "t-1",
    "t",
    "tt",
    "t1",
    "1",
}

for _,v in ipairs(strs) do
    if lpeg.match(str_match,v) ~= nil then
        print(v," => match!")
    else
        print(v," => no match")
    end
end

However, to my dismay, I get the following output:

1too     => no match
too0     => match!
t-t-t    => match!
t-t--t   => match!
t--t-t   => match!
t-1-t    => match!
t--t     => match!
t-one1   => match!
1-1      => no match
t-1      => match!
t        => match!
tt       => match!
t1       => match!
1        => no match

Despite what the code outputs, t-t--t, t--t-t and t--t should not match.

Answer 1

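In your pattern letter * ((dash^-1) * alphanum)^0, LPeg only tries to match a prefix of the subject string; it does not require the whole string to match. For the cases you don't expect to match, the prefix that your pattern matches successfully is:

t-t--t   matches the prefix "t-t"
t--t-t   matches the prefix "t"
t--t     matches the prefix "t"

When a pattern has no captures, lpeg.match returns the position just after the matched prefix (a number). Your pattern is wrapped in C(...), so in the three cases above it returns the captured prefix instead. Either way the result is not nil, which explains the erroneous output you are seeing.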
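To see the prefix matching for yourself, here is a minimal sketch. It assumes LPeg is installed; the names body, pos_a, pos_b, cap and none are made up for this illustration.

```lua
local lpeg = require "lpeg"
local P, R, C = lpeg.P, lpeg.R, lpeg.C

local letter   = R("AZ", "az")
local alphanum = letter + R"09"
-- Same shape as the pattern in the question, without the capture
local body     = letter * (P"-"^-1 * alphanum)^0

-- No capture: match returns the index just past the matched prefix
local pos_a = lpeg.match(body, "t-t--t")    -- prefix "t-t" matched
local pos_b = lpeg.match(body, "t--t")      -- prefix "t" matched
-- With a capture: match returns the captured prefix instead
local cap   = lpeg.match(C(body), "t-t--t")
-- A string that does not even start with a letter fails outright
local none  = lpeg.match(body, "1too")

print(pos_a, pos_b, cap, none)
```

Only the last call returns nil, so a plain ~= nil test cannot tell a full match from a partial one.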

If you are only matching one string at a time, you can modify the pattern to check that no characters are left over after the match:

str_match = C(letter * ((dash^-1) * alphanum)^0) * -1
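As a quick check of the anchored pattern, the sketch below runs it over some of the strings from the question. It assumes LPeg is installed; the helper name matches is made up for this illustration.

```lua
local lpeg = require "lpeg"
local P, R, C = lpeg.P, lpeg.R, lpeg.C

local letter   = R("AZ", "az")
local alphanum = letter + R"09"
-- The trailing -1 succeeds only when no characters remain in the subject
local str_match = C(letter * (P"-"^-1 * alphanum)^0) * -1

-- Hypothetical helper: true only when the whole string matches
local function matches(s)
    return lpeg.match(str_match, s) ~= nil
end

for _, s in ipairs{"t-t-t", "t-t--t", "t--t-t", "t--t", "t-1", "1too"} do
    print(s, matches(s))
end
```

With the end-of-input check in place, the strings containing "--" should be rejected, because the pattern stops before the second hyphen and -1 then fails.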

The same thing using the re module that ships with LPeg:

local re = require "re"
re_pat = re.compile "{ %a ('-'? %w)* } !."

For stream matching, or for finding every occurrence of the pattern in a target string, stack the grammar rules together like this:

stream_parse = re.compile [[
  stream_match  <- ((str_match / skip_nonmatch) delim)* str_match?
  str_match     <- { %a ('-'? %w)* } (&delim / !.)
  skip_nonmatch <- !str_match (!delim .)*
  delim         <- %s+
]]

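Every match is captured and returned. If nothing matches, you get back either nil or a number giving the position in the string where the pattern stopped parsing.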
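A possible way to use the stream grammar is sketched below. It assumes the re module that ships with LPeg is available; the sample input and the names first and second are made up for this illustration.

```lua
local re = require "re"

-- Same grammar as above: capture valid identifiers, skip everything else
local stream_parse = re.compile [[
  stream_match  <- ((str_match / skip_nonmatch) delim)* str_match?
  str_match     <- { %a ('-'? %w)* } (&delim / !.)
  skip_nonmatch <- !str_match (!delim .)*
  delim         <- %s+
]]

-- Each captured identifier comes back as a separate return value;
-- "t--t" is skipped because it contains two consecutive hyphens
local first, second = stream_parse:match("too0 t--t t-1")
print(first, second)
```

To collect an unknown number of matches, the call can be wrapped in a table constructor, as in { stream_parse:match(input) }.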

Edit: for cases where the parse needs to return nil when nothing matches, this tweak to the grammar should do the trick:

stream_parse = re.compile [[
  stream_match  <- (str_match / skip_nonmatch+ &str_match)+
  str_match     <- { %a ('-'? %w)* } (&delim / !.)
  skip_nonmatch <- !str_match (!delim .)* delim
  delim         <- %s+
]]
