Handling special characters

Date: 2015-05-20 01:30:00

Tags: string character-encoding lua

Suppose I receive the following string in Lua, mÜ⌠⌠í∩, and want to run it through my current processing code, shown below:

function inTable(tbl, item)
    for key, value in pairs(tbl) do
        if value == item then return true end
    end
    return false
end
function processstring(instr)
  finmsg = ""
  achar = {131,132,133,134,142,143,145,146,160,166,181,182,183,198,199,224}
  echar = {130,137,138,144,228}
  ichar = {139,140,141,161,173,179,244}
  ochar = {147,148,149,153,162,167,229,233,234,248}
  uchar = {129,150,151,154,163}
  nchar = {164,165,227,252}
  outmsg = string.upper(instr)
  for c in outmsg:gmatch"." do
    bc = string.byte(c)
    if(bc <= 47 or (bc>=58 and bc<=64) or (bc>=91 and bc<=96) or bc >=123)then
    elseif (bc == 52) then finmsg = finmsg.."A"
    elseif (bc == 51) then finmsg = finmsg.."E"
    elseif (bc == 49) then finmsg = finmsg.."I"
    elseif (bc == 48) then finmsg = finmsg.."O"
    elseif (inTable(achar, bc)==true) then finmsg = finmsg.."A"
    elseif (inTable(echar, bc)==true) then finmsg = finmsg.."E"
    elseif (inTable(ichar, bc)==true) then finmsg = finmsg.."I"
    elseif (inTable(ochar, bc)==true) then finmsg = finmsg.."O"
    elseif (inTable(uchar, bc)==true) then finmsg = finmsg.."U"
    elseif (inTable(nchar, bc)==true) then finmsg = finmsg.."N"
    else
    finmsg = finmsg..c
    end
  end
  return finmsg
end
function checkword (instr)
  specword = [[]]
  wordlist = {"FIN", "FFI", "PHIN", "PHEN", "FIN", "PHIN", "IFFUM", "MUF", "MEUFEEN", "FEN","FEEN"}
  for i, v in ipairs (wordlist) do
    if (string.match(processstring(instr), v) ~= nil)then
      return 1
    end
  end
  --if (string.match(instr,specword) ~= nil)then
  --  return 1
  --end
end
print (checkword("mÜ⌠⌠í∩"))

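So far, I have found no way to process a string like this. Even after using string.byte() to reduce it to ASCII, I cannot reliably handle exotic characters such as these. Stranger still, if I add print(bc) inside processstring, I get the following output: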

  

160 226 140 160 195 173 226 136 169

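Now, that is 9 ASCII codes for a 6-letter word. How is that possible? I built the code with reference to http://www.asciitable.com/. Is that wrong? How should I approach this processing?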

1 answer:

Answer 0 (score: 1)

local subst = {
   U = "üûùÜú",
   N = "ñÑπⁿ∩",
   O = "ôöòÖóºσΘΩ°",
   I = "ïîìí¡│",
   F = "⌠",
   A = "âäàåÄÅæÆáª╡╢╖╞╟α",
   E = "éëèÉΣ",
}
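-- Build a lookup from each individual UTF-8 character to its base letter.
local subst_utf8 = {}
for base_letter, list_of_letters in pairs(subst) do
    -- The pattern matches one UTF-8 character: a lead byte (ASCII or 0xC0-0xFF)
    -- followed by any continuation bytes (0x80-0xBF).
    for utf8letter in list_of_letters:gmatch'[%z\1-\x7F\xC0-\xFF][\x80-\xBF]*' do
        subst_utf8[utf8letter] = base_letter
    end
end

-- Uppercase the string (in the default C locale only ASCII letters change),
-- then replace every character found in the lookup; others are left as-is.
function processstring(instr)
  return (instr:upper():gsub('[%z\1-\x7F\xC0-\xFF][\x80-\xBF]*', subst_utf8))
end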

print(processstring("mÜ⌠⌠í∩"))  --> MUFFIN
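
The question's puzzle (more numbers than letters) comes down to UTF-8: string.byte and the "." pattern work on bytes, and each non-ASCII character in this string occupies two or three bytes. The following is a minimal sketch, not part of the original answer, that reuses the answer's pattern to compare the byte count with the character count. utf8_chars is a hypothetical helper name, and the sketch assumes the source file is saved as UTF-8 and a Lua version that accepts \x escapes (5.2+ or LuaJIT).

```lua
-- Sketch only: split a UTF-8 string into characters and list each one's bytes.
-- Assumes this file is saved as UTF-8; utf8_chars is a hypothetical helper.
local UTF8_CHAR = '[%z\1-\x7F\xC0-\xFF][\x80-\xBF]*'

local function utf8_chars(s)
  local chars = {}
  for ch in s:gmatch(UTF8_CHAR) do
    chars[#chars + 1] = ch
  end
  return chars
end

local word = "mÜ⌠⌠í∩"
local chars = utf8_chars(word)

print(#word)   --> 14  (length in bytes)
print(#chars)  --> 6   (length in characters)

-- Show which bytes belong to which character.
for _, ch in ipairs(chars) do
  print(ch, table.concat({ch:byte(1, -1)}, " "))
end
```

Under those assumptions the string is 14 bytes long, and the nine values printed in the question match the final nine bytes of that sequence. The table at http://www.asciitable.com/ appears to list single-byte code page values for its extended characters, so those codes do not line up with the bytes of UTF-8 text.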