How do I make my regex translate the string I give it instead of its own output?

Time: 2017-09-15 22:23:54

Tags: javascript regex string encryption translation

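So I'm trying to use regex to create a kind of phonetic cipher: translating single characters or small groups of characters into other, pre-specified single characters or small groups of characters.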

Example Cipher:

ND = ONZ
ED = ANZ
EE = AAZ
AL = ORTH
IC = AMTH
CH = MAS
FF = UUG
LL = R
OO = UUZ
SS = OOG
TH = ASG
A = OTH
B = YTH
C = M
D = N
E = AZ
F = UG
G = ON
H = S
I = ATH
J = EZ
K = ETH
L = R
M = YZ
N = OZ
O = UZ
P = YN
Q = ITH
R = YG
S = OG
T = AG
U = AN
V = YN
W = L
X = IG
Y = UTH
Z = IZ

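The problem I keep running into is that the regex translates the string it has just translated, so THE becomes ANIZAGOGSOOZAZ instead of ASGAZ. The process of getting there is detailed below; the capital letters are what ends up in the final output.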

What happens:

AN IZ AG OG S O OZ AZ

th = asg
     a = oth
         o = uz
             u = AN
             z = IZ
         t = AG
         h = s
             s = OG
     S
     g = On
         n = OZ
e = AZ

What I want to happen:

th = ASG
e = AZ
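A minimal sketch of the mechanism, assuming the rules are applied one after another, each to the output of the previous one (that Lingojam runs a rule list this way is an assumption here, not something verified). With just two of the rules above, the `a` produced by the `th` rule gets translated a second time; `chainReplace` is a hypothetical helper name:

```javascript
// Hypothetical model of chained replacement: every rule runs over the
// output of the rule before it, so earlier replacements can be re-translated.
const rules = [
  [/th/g, "asg"],
  [/a/g, "oth"],
];

function chainReplace(input) {
  let text = input;
  for (const [pattern, replacement] of rules) {
    text = text.replace(pattern, replacement);
  }
  return text;
}

console.log(chainReplace("th")); // "othsg": the "a" inside "asg" was replaced again
```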

How do I stop the regex from translating its own output (or make it actually print before translating everything)?

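The only real constraint here is that I need to be able to easily change the input and output values, and to add or remove them. I'm using JavaScript regex in Lingojam (https://lingojam.com/), so the input actually looks like this: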

/nd/g
/ed/g
/ee/g
/al/g
/ic/g
/ch/g
/ff/g
/ll/g
/oo/g
/ss/g
/th/g
/a/g
/b/g
/c/g
/d/g
/e/g
/f/g
/g/g
/h/g
/i/g
/j/g
/k/g
/l/g
/m/g
/n/g
/o/g
/p/g
/q/g
/r/g
/s/g
/t/g
/u/g
/v/g
/w/g
/x/g
/y/g
/z/g

2 answers:

Answer 0 (score: 1)

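Create a Map of the replacement strings. A Map preserves the insertion order of its keys. Build the regex by extracting the keys into an array with Map#keys and spreading, then joining them with a pipe using Array#join. Use String#replace with a callback to encode the string.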

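Note: as skirtle pointed out in the comments, the order of the keys in the regex matters. /A|AL/ is not equivalent to /AL|A/. You can't rely on "greediness" to make the longer match win; the longer key has to come earlier in the alternation.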
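A quick way to check the ordering claim in plain JavaScript. The two patterns are the ones from the note; the rest is only a demonstration:

```javascript
// Alternatives are tried left to right at each position, and the first one
// that matches wins, even if a later alternative would match more text.
const shortFirst = /A|AL/;
const longFirst = /AL|A/;

console.log("AL".match(shortFirst)[0]); // "A"
console.log("AL".match(longFirst)[0]);  // "AL"
```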

const hashMap = new Map([["ND","ONZ"],["ED","ANZ"],["EE","AAZ"],["AL","ORTH"],["IC","AMTH"],["CH","MAS"],["FF","UUG"],["LL","R"],["OO","UUZ"],["SS","OOG"],["TH","ASG"],["A","OTH"],["B","YTH"],["C","M"],["D","N"],["E","AZ"],["F","UG"],["G","ON"],["H","S"],["I","ATH"],["J","EZ"],["K","ETH"],["L","R"],["M","YZ"],["N","OZ"],["O","UZ"],["P","YN"],["Q","ITH"],["R","YG"],["S","OG"],["T","AG"],["U","AN"],["V","YN"],["W","L"],["X","IG"],["Y","UTH"],["Z","IZ"]]);

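// Join the keys, in insertion order, into one alternation: /ND|ED|...|Z/gi
const pattern = new RegExp([...hashMap.keys()].join('|'), 'ig');

// replace() scans the input once and never re-reads the text it inserts.
// Upper-case each match so lowercase input still finds its key in the Map.
const result = 'THE'.replace(pattern, (str) => hashMap.get(str.toUpperCase()));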

console.log(result);
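If you'd rather not maintain the order of the list by hand, one option is to sort the keys by length (longest first) before building the pattern. This is a sketch layered on top of the answer's approach, not part of it; `buildEncoder` and `escapeRegExp` are hypothetical helper names, and the escaping only matters if a key ever contains regex metacharacters:

```javascript
// Escape characters that have special meaning in a RegExp source string.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an encoder from [plain, cipher] pairs given in any order.
function buildEncoder(pairs) {
  const map = new Map(pairs.map(([k, v]) => [k.toUpperCase(), v]));
  // Longest keys first, so "AL" is tried before "A" at the same position.
  const keys = [...map.keys()].sort((a, b) => b.length - a.length);
  const pattern = new RegExp(keys.map(escapeRegExp).join("|"), "gi");
  // Each match is looked up once; the replacement text is never rescanned.
  return (text) => text.replace(pattern, (m) => map.get(m.toUpperCase()));
}

const encode = buildEncoder([
  ["A", "OTH"], ["AL", "ORTH"], ["E", "AZ"], ["TH", "ASG"], ["T", "AG"], ["H", "S"],
]);
console.log(encode("the")); // "ASGAZ"
console.log(encode("ale")); // "ORTHAZ"
```

With this, new pairs can be appended anywhere in the list, which fits the requirement of being able to add or change values easily.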

Answer 1 (score: 0)

var tokens = [ ['ND', 'ONZ'], ['ED', 'ANZ'], ['EE', 'AAZ'], ['AL', 'ORTH'], ['IC', 'AMTH'], ['CH', 'MAS'], ['FF', 'UUG'], ['LL', 'R'], ['OO', 'UUZ'], ['SS', 'OOG'], ['TH', 'ASG'], ['A', 'OTH'], ['B', 'YTH'], ['C', 'M'], ['D', 'N'], ['E', 'AZ'], ['F', 'UG'], ['G', 'ON'], ['H', 'S'], ['I', 'ATH'], ['J', 'EZ'], ['K', 'ETH'], ['L', 'R'], ['M', 'YZ'], ['N', 'OZ'], ['O', 'UZ'], ['P', 'YN'], ['Q', 'ITH'], ['R', 'YG'], ['S', 'OG'], ['T', 'AG'], ['U', 'AN'], ['V', 'YN'], ['W', 'L'], ['X', 'IG'], ['Y', 'UTH'], ['Z', 'IZ'] ];

function convert(str) {
  str = str.toUpperCase();
  var index = 0,
      result = "";
  while(index < str.length) {                        // while there are still letters in the string to process
    var tok = tokens.find(function(t) {              // find the first token that starts exactly at position index
      return str.startsWith(t[0], index);
    });
    if(tok) {                                        // if there is a token
      result += tok[1];                              // add its equivalent string to result
      index += tok[0].length;                        // advance index past the matched token
    } else {                                         // if no token was found
      result += str.charAt(index);                   // copy the current character unchanged
      index++;                                       // and move on to the next character
    }
  }
  return result;
}

console.log(convert("THE"));
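The loop above still depends on the two-letter tokens being listed first, because `find` returns the first hit. A variant, sketched here with a hypothetical `convertLongest` function and a shortened, deliberately unordered token list, checks every token at the current position and keeps the longest one (often called "maximal munch"):

```javascript
// [plain, cipher] pairs; with longest-match selection their order no longer matters.
var unorderedTokens = [['A', 'OTH'], ['AL', 'ORTH'], ['E', 'AZ'], ['T', 'AG'], ['H', 'S'], ['TH', 'ASG']];

function convertLongest(str) {
  str = str.toUpperCase();
  var index = 0,
      result = "";
  while (index < str.length) {
    // Among all tokens that start at this position, keep the longest.
    var best = null;
    unorderedTokens.forEach(function(t) {
      if (str.startsWith(t[0], index) && (!best || t[0].length > best[0].length)) {
        best = t;
      }
    });
    if (best) {
      result += best[1];             // emit the cipher text for the token
      index += best[0].length;       // jump past the whole token
    } else {
      result += str.charAt(index);   // no token here: copy the character unchanged
      index++;
    }
  }
  return result;
}

console.log(convertLongest("the")); // "ASGAZ"
```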