Lua: Conditionally splitting a string using gmatch

Time: 2015-02-07 21:17:52

Tags: lua split pcre mud

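I'm writing a MUSHclient plugin in Lua. MUSHclient includes a PCRE module that lets me compile regular expressions with the rex.new function. I'm not sure whether I need it for what I'm trying to do; I suspect I might, but I'd rather avoid it.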

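Basically, I want to split a string into a table using the separators "," or " and ". However, in some cases these separators appear inside items that I want to keep unsplit (e.g. "Felix, the Cat"). Here is what I have so far: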

false_separators = {"Felix, the Cat", "orange and tan cat", "black and white cat"}
separators = rex.new(" ?(.+?)(?:,| and )")
local sample_text = "a black and white cat, a tabby cat, a giant cat, Felix, the Cat and an orange and tan cat."
index = 1
matches = {}
separators:gmatch(sample_text, function (m, t)
    for k, v in pairs(t) do
        print(v)
        table.insert(matches, v)
    end
end)

This outputs:

a black
white cat
a tabby cat
a giant cat
Felix
the Cat
an orange

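There are two problems with this. First, the last item is not included. Second, I haven't figured out how to make use of my false_separators table. The output I want is: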

a black and white cat
a tabby cat
a giant cat
Felix, the Cat
an orange and tan cat
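On the first problem: the pattern only matches an item that is followed by "," or " and ", so whatever comes after the last separator is never captured. A minimal sketch in plain Lua string patterns (not the rex API), assuming it is acceptable to append a sentinel separator to the input. The helper name `split_simple` is hypothetical, and it does not yet protect the false separators:

```lua
-- Hypothetical helper: split on ", " or " and " using plain Lua patterns.
-- It does NOT protect phrases such as "Felix, the Cat".
local function split_simple(text)
    local items = {}
    -- Normalise " and " to ", ", then append a sentinel ", " so that the
    -- last item is also followed by a separator and gets captured.
    local normalised = text:gsub(" and ", ", ") .. ", "
    for item in normalised:gmatch("(.-), ") do
        table.insert(items, item)
    end
    return items
end
```

With the sample text this still splits the protected phrases, but the trailing "tan cat." is no longer dropped.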

I could do this with a lot of gsub calls, but that seems inelegant, and possibly exploitable or slow:

false_separators = {"Felix, the Cat", "orange and tan cat", "black and white cat"}
local sample_text = "a black and white cat, a tabby cat, a giant cat, Felix, the Cat and an orange and tan cat."

function split_cats(text, false_sep)
    for _, v in ipairs(false_sep) do
        -- protect each false separator by replacing its spaces with underscores;
        -- the parentheses drop gsub's second return value (the match count)
        text = text:gsub(v, (v:gsub(" ", "_")))
    end
    -- replace " and " with ", ", then replace every ", " with ";".
    -- Protected phrases contain no spaces at this point, so they are left alone.
    -- The semicolon is now the true delimiter.
    text = text:gsub(" and ", ", "):gsub(", ", ";")
    local m = utils.split(text, ";") or {} -- split at semicolons
    for i, v in ipairs(m) do
        m[i] = v:gsub("_", " ") -- turn the underscores back into spaces
    end
    return m
end

table.foreach(split_cats(sample_text, false_separators), print)

Output:

1 a black and white cat
2 a tabby cat
3 a giant cat
4 Felix, the Cat
5 an orange and tan cat.
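On the worry about exploitability: in the version above each entry of false_separators is passed to gsub as a Lua pattern, so an entry containing magic characters such as "." or "-" would not be matched literally. A minimal sketch of a different technique, a single left-to-right scan that compares protected phrases and separators as plain text via string.sub. The names `split_protected`, `protected`, `starts_at` and `first_match` are hypothetical, and the sketch assumes the separators are exactly ", " and " and ":

```lua
-- Hypothetical helper: split `text` on ", " or " and ", except inside any
-- phrase listed in `protected`. All comparisons are plain text, no patterns.
local function split_protected(text, protected)
    local separators = { ", ", " and " }
    local items, current = {}, {}
    local i, n = 1, #text

    -- true if the literal string s occurs in text starting at position pos
    local function starts_at(s, pos)
        return text:sub(pos, pos + #s - 1) == s
    end

    -- first non-empty entry of list that occurs at pos, or nil
    local function first_match(list, pos)
        for _, s in ipairs(list) do
            if s ~= "" and starts_at(s, pos) then return s end
        end
        return nil
    end

    while i <= n do
        local phrase = first_match(protected, i)
        if phrase then
            -- copy a protected phrase through untouched
            current[#current + 1] = phrase
            i = i + #phrase
        else
            local sep = first_match(separators, i)
            if sep then
                -- a real separator ends the current item
                items[#items + 1] = table.concat(current)
                current = {}
                i = i + #sep
            else
                current[#current + 1] = text:sub(i, i)
                i = i + 1
            end
        end
    end
    items[#items + 1] = table.concat(current) -- the last item has no trailing separator
    return items
end
```

Called as split_protected(sample_text, false_separators), this yields the same five items as the gsub version, including the trailing period on the last one. It needs neither rex nor utils.split, at the cost of scanning the text one character at a time.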

0 answers:

No answers yet