Efficiently splitting a string

Asked: 2015-02-09 13:20:45

Tags: string performance lua split

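As a result of log parsing, I have a field that contains hostnames and, occasionally, IP addresses. I need to process the data in that field further to parse the domain out of the hostname. That is, if the hostname is googleanalytics.google.com, I want to extract google.com as efficiently as possible, since the system processes thousands of log messages per second.

This is what I have so far: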

-- Save hostname into a temporary variable
local tempMetaValue = hostname

local count = 0
local byte_char = string.byte(".")
for i = 1, #tempMetaValue do
    if string.byte(tempMetaValue, i) == byte_char then
        count = count + 1
    end
end

local dotCount = count

-- If there was only one dot do nothing
if dotCount == 1 then
    return 0

-- If there are exactly two dots
elseif dotCount == 2 then
    -- Get the index of the first dot
    local beginIndex = string.find(tempMetaValue,".",1,true)
    -- Get the substring starting after the first dot
    local domainMeta = string.sub(tempMetaValue,beginIndex+1)
    -- Double check that the substring exists
    if domainMeta ~= nil then
        -- Populate the domain meta field
    end
-- If there are more than two dots..
elseif dotCount > 2 then
    -- Test to see if the whole hostname is actually an IP address
    if tempMetaValue:match("^%d%d?%d?%.%d%d?%d?%.%d%d?%d?%.%d%d?%d?$") then
        -- Skip the rest if an IP address was found
        return 0
    end
    -- Get the index of the second to last dot
    -- (a literal dot is written "%." in Lua patterns; "\." is not a valid escape)
    local beginIndex = string.find(tempMetaValue,"%.[^.]*%.[^.]*$")
    -- Get the substring starting after the second to last dot
    local domainMeta = string.sub(tempMetaValue,beginIndex+1)
    -- Double check that the substring exists
    if domainMeta ~= nil then
        -- Populate the domain meta field
    end
end

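I have a feeling, though, that this may not be the fastest solution. "A feeling" because I had no experience with Lua before this, but it seems far too long for such a simple task.

I tried to create a solution that works like a split operation in, e.g., Java, one that would leave the last token "unsplit" and so leave me with the part I actually want (the domain), but I got nowhere. Basically, for that solution I would like to create as many tokens as there are dots in the hostname value, i.e. googleanalytics.google.com would be split into "googleanalytics" and "google.com".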
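For illustration, the limited split described above could be sketched in Lua roughly as follows. The helper name `split_limit` and its parameters are hypothetical and not part of the original post. It is only a sketch of the idea, and it yields the domain only when the hostname has exactly three labels.

```lua
-- Hypothetical helper: split `s` on the literal separator `sep`
-- into at most `limit` tokens. The last token is left "unsplit",
-- so it keeps any remaining separators.
local function split_limit(s, sep, limit)
    local tokens = {}
    local start = 1
    while #tokens < limit - 1 do
        -- plain = true: treat `sep` as a literal string, not a pattern
        local first, last = string.find(s, sep, start, true)
        if not first then break end
        tokens[#tokens + 1] = string.sub(s, start, first - 1)
        start = last + 1
    end
    -- Whatever is left becomes the final token
    tokens[#tokens + 1] = string.sub(s, start)
    return tokens
end

local parts = split_limit("googleanalytics.google.com", ".", 2)
print(parts[1])  --> googleanalytics
print(parts[2])  --> google.com
```

With a limit of 2 the split stops after the first dot, so longer hostnames such as a.b.c.d would leave "b.c.d" in the last token rather than the two-label domain.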

1 answer:

Answer 0: (score: 2)

Does something like this do what you want?

function getdomain(str)
    -- Grab just the last two dotted parts of the string.
    local domain = str:match("%.?([^.]+%.[^.]+)$")
    -- If we have dotted parts and they are all numbers then this is an IP address.
    if domain and tonumber((domain:gsub("%.", ""))) then
        return nil
    end
    return domain
end

print(getdomain("googleanalytics.google.com"))           --> google.com
print(getdomain("foobar.com"))                           --> foobar.com
print(getdomain("1.2.3.4"))                              --> nil
print(getdomain("something.else.longer.than.that.com"))  --> that.com
print(getdomain("foobar"))                               --> nil

The "is it an IP address" test here is very naive and should most likely be made more robust, but it serves for a quick demonstration.
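One possible way to tighten that test, as a minimal sketch: check the whole string for four dot-separated decimal groups, each in the range 0 to 255, before extracting the domain. The names `is_ipv4` and `getdomain_strict` are hypothetical and not from the answer above, and only dotted-quad IPv4 is assumed here (no IPv6).

```lua
-- Hypothetical helper: true only for dotted-quad IPv4 strings
-- such as "192.168.0.1" (each octet 0..255, at most 3 digits).
local function is_ipv4(s)
    local a, b, c, d = s:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$")
    if not a then return false end
    for _, octet in ipairs({a, b, c, d}) do
        if #octet > 3 or tonumber(octet) > 255 then
            return false
        end
    end
    return true
end

-- Hypothetical variant of getdomain that uses the stricter test.
local function getdomain_strict(str)
    if is_ipv4(str) then
        return nil
    end
    -- Last two dot-separated labels, or nil if there is no dot
    return str:match("([^.]+%.[^.]+)$")
end

print(getdomain_strict("googleanalytics.google.com"))  --> google.com
print(getdomain_strict("1.2.3.4"))                     --> nil
print(getdomain_strict("foobar"))                      --> nil
```

Unlike the quick demonstration, this variant tests the full input rather than only the last two parts, so a hostname whose final labels happen to be numeric is not treated as an IP address.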