Regex to find recurring sets of digits in a string of numbers

Time: 2017-02-07 22:33:53

Tags: java regex cryptography xor

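Given a string of digits, is there a regex that can find runs of digits of length N that occur more than once, preferably in a loop over the variable N? What I have at the moment finds repeated runs (not single occurrences), but it returns too much noise. I would like it to find runs of length N in a loop, decreasing N from large to small.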

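The seemingly random sequence of digits is a byte array converted to a string of numbers, and the runs I want to capture are possible keys of an XOR-encoded file.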
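The question does not show how the byte array was turned into digits. As a rough sketch, assuming each byte's unsigned decimal value was simply appended with no separator (the class and method names here are hypothetical), the conversion might look like this:

```java
public class BytesToDigits {
    // Assumption: each byte is written as its unsigned decimal value,
    // with no separator between neighbouring values.
    static String toDigitString(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(b & 0xFF); // mask so bytes above 127 are not printed as negatives
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = {73, 75, 68, 79};
        System.out.println(toDigitString(sample)); // prints 73756879
    }
}
```

One consequence of this kind of encoding is that byte boundaries are ambiguous: the bytes 1, 10 and the bytes 11, 0 both produce "110", so a repeated run of digits is only a candidate key and still has to be split back into bytes.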

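Given that the encoded text is long enough, there will probably be places where N spaces were XOR'd with the key of length N, which reproduces the key in roughly plain text. I tested this, for example: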

"            " ^ "ThisIsTheKey" produces roughly "tHISiStHEkEY"

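The effect can be reproduced with a few lines of Java. This is only a minimal sketch with hypothetical names; it relies on the fact that a space is 0x20, so XORing an ASCII letter with a space flips its case:

```java
import java.util.Arrays;

public class XorSpaceDemo {
    // XOR each character of text with the key, repeating the key as needed.
    static String xor(String text, String key) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            out.append((char) (text.charAt(i) ^ key.charAt(i % key.length())));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String key = "ThisIsTheKey";
        // Build a run of spaces exactly as long as the key.
        char[] blanks = new char[key.length()];
        Arrays.fill(blanks, ' ');
        System.out.println(xor(new String(blanks), key)); // prints tHISiStHEkEY
    }
}
```

So wherever the plain text contains a long run of spaces, the cipher text contains the key with the case of its letters inverted, repeated back to back.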
Current regex (Java engine):

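    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Capture one or more digits, then require the same digits again via \1.
    String regex = "(\\d+)\\1";
    Pattern patt = Pattern.compile(regex);
    Matcher matcher = patt.matcher(sToDecode); // sToDecode holds the digit string
    while (matcher.find())
    {
        System.out.println("Repeated substring: " + matcher.group(1));
    }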

Given:

737568797372696810068791021116868686873696868657376791001117268681067368686868736865736810169686872687972686568689876796869726874749911010194687265796810111086696511099688368688369868984896876708580849586987885681111109978697865767372737668676968796870797899110101110107736868726569697978736868657394707570661101011101079878991101101026968736879686572100736868766968736879686572100736867681107968657210073686876696873687968657210073686876696873687968101110107981007368687669687368796865721007368687669681006872689968796865721007368687669687368796865721007368687673666910772100736868766968736879686572100736868766810011073687968657210073686876696873687767696868711109911010168657210073686876696873687968657210073686876696873687968657210073681111107368796865721007368687669687368796865721007368687669687299110101686572100736868766968736879686572100681056899687968657210073686876696873687968657210073686876696873687310111010772100736868766968736879686572100736868766968737368102111110736879686572100 ...

This finds the following recurring substrings:

...
Repeated substring: 736879686572100736868766968
Repeated substring: 1
Repeated substring: 0
Repeated substring: 68
Repeated substring: 6
Repeated substring: 0
Repeated substring: 68
Repeated substring: 686572100736868766968736879
Repeated substring: 1
Repeated substring: 657210073686876696873687968
...

Please let me know if the regex can be changed so that it only returns:

Repeated substring: 736879686572100736868766968
Repeated substring: 686572100736868766968736879
Repeated substring: 657210073686876696873687968

1 answer:

Answer 0 (score: 2)

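Using + matches one or more digits, which is why you get all those short substrings. If you want to constrain the length, just replace the + with {n,m}, where 0 <= n <= m. In Java the upper bound may be left out, as in {n,}, which means "n or more".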

To get repeated groups of 3 or more digits, use:

(\d{3,})\1
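To tie this back to the loop over N asked for in the question, here is a minimal sketch in Java. The class and method names are hypothetical, and the minimum length is passed in rather than hard-coded. Note that \1 must directly follow the captured group, so only back-to-back repeats are found.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatFinder {
    // Returns every run of at least minLen digits that is immediately
    // followed by a copy of itself, in the order the matcher finds them.
    static List<String> findRepeats(String digits, int minLen) {
        Pattern p = Pattern.compile("(\\d{" + minLen + ",})\\1");
        Matcher m = p.matcher(digits);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    // Tries minimum lengths from maxLen down to minLen and returns the
    // matches for the largest length that yields any (empty list if none).
    static List<String> longestRepeats(String digits, int maxLen, int minLen) {
        for (int n = maxLen; n >= minLen; n--) {
            List<String> found = findRepeats(digits, n);
            if (!found.isEmpty()) {
                return found;
            }
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        String sample = "9912345123457700";
        System.out.println(findRepeats(sample, 1));       // [9, 12345, 7, 0]
        System.out.println(findRepeats(sample, 3));       // [12345]
        System.out.println(longestRepeats(sample, 8, 3)); // [12345]
    }
}
```

With a minimum of 1 the short, noisy matches come back, just as with the original +. Raising the minimum filters them out. For the data in the question, a minimum somewhere below the expected key length in digits should leave only the long candidates.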