An efficient way to find repeating character sequences in a string

Date: 2019-12-17 16:10:43

Tags: c# winforms

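I'm trying to do the following:

  1. Read the contents of a file into a byte array
  2. Convert the byte array to a Base64 string
  3. Find every run of a repeated character that is longer than 8 characters
  4. Put the repeating patterns that were found into a list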
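
Steps 3 and 4 can be kept separate from any UI code. The sketch below is one possible way to do that, not the code from this question: `RunScanner`, `FindRuns`, and the tuple shape are hypothetical names. It assumes the threshold applies to the full run length (a run of 9 identical characters reports 9), and that a run ending at the end of the string should also be counted.

```csharp
using System;
using System.Collections.Generic;

static class RunScanner
{
    // Returns each distinct (character, run length) pair whose run is
    // strictly longer than minLength, in order of first appearance.
    public static List<(char Symbol, int Length)> FindRuns(string text, int minLength)
    {
        var seen = new HashSet<(char, int)>();   // O(1) duplicate check
        var runs = new List<(char Symbol, int Length)>();
        int start = 0;                            // index where the current run began

        for (int i = 1; i <= text.Length; i++)
        {
            // A run ends at the end of the string or when the character changes.
            if (i == text.Length || text[i] != text[start])
            {
                int length = i - start;
                if (length > minLength && seen.Add((text[start], length)))
                    runs.Add((text[start], length));
                start = i;
            }
        }
        return runs;
    }
}
```

With that in place, steps 1 and 2 reduce to `RunScanner.FindRuns(Convert.ToBase64String(File.ReadAllBytes(path)), 8)`, where `path` stands in for the file name.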

This is where I'm running into trouble... I'm currently using this loop to process a 1 MB file:

void bkg_DoWork(object sender, DoWorkEventArgs e)
{
    try
    {
        Byte[] bytes = File.ReadAllBytes(this.txt_Filename.Text);
        string file = Convert.ToBase64String(bytes);
        char lastchar = '\0';
        int count = 0;
        List<RepeatingPattern> patterns = new List<RepeatingPattern>();


        this.Invoke((MethodInvoker)delegate
        {
            this.pb_Progress.Maximum = file.Length;
            this.pb_Progress.Value = 0;
            this.lbl_Progress.Text = "Progress: File contents read... Looking for patterns! 0% Done...";
        });

        for (int i = 0; i < file.Length; i++)
        {
            this.Invoke((MethodInvoker)delegate
            {
                this.pb_Progress.Value += 1;
                this.lbl_Progress.Text = "Progress: Looking for patterns! " + (int)Decimal.Truncate((decimal)((double)i / file.Length) * 100) + "% Done...";
            });

            if (file[i] == lastchar)
                count += 1;
            else
            {
                // Only create a pattern when the run is longer than its compressed
                // form, otherwise no space is saved... 8 chars
                // [$a,#$]
                if (count > 8)
                {
                    //create and add a pattern to the list if necessary.
                    RepeatingPattern ptn = new RepeatingPattern(lastchar, count);
                    if (!patterns.Contains(ptn))
                        patterns.Add(ptn);
                }
                count = 0;
                lastchar = file[i];
            }
        }
        e.Result = patterns;
    }
    catch (Exception ex)
    {
        e.Result = ex;
    }
}

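However, with this loop the process is very slow... for example, this 1 MB file takes about 1 minute to loop through... which, in this day and age, feels like a long time for such a small file. Is there a more efficient way to do what I'm trying to do / find the repeating patterns?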
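
One thing worth checking: `Control.Invoke` is synchronous, and the loop above calls it once per character. A 1 MB file becomes roughly 1.4 million Base64 characters, so the worker thread waits on the UI thread about that many times. A minimal sketch of throttling the updates so a callback fires only when the whole-number percentage changes is shown below. `ThrottledProgress`, `Scan`, and the `report` callback are hypothetical names, and whether this accounts for the full minute is an assumption that would need measuring.

```csharp
using System;

static class ThrottledProgress
{
    // Walks the text and calls report(percent) only when the whole-number
    // percentage changes, so there are at most 101 callbacks for any length.
    // Returns the number of callbacks made.
    public static int Scan(string text, Action<int> report)
    {
        int lastPercent = -1;
        int callbacks = 0;

        for (int i = 0; i < text.Length; i++)
        {
            // ... the per-character pattern work would go here ...

            // long arithmetic avoids overflow on very large inputs
            int percent = (int)((i + 1) * 100L / text.Length);
            if (percent != lastPercent)
            {
                lastPercent = percent;
                report(percent);
                callbacks++;
            }
        }
        return callbacks;
    }
}
```

Inside a `DoWork` handler, the callback could be `p => ((BackgroundWorker)sender).ReportProgress(p)`, with `WorkerReportsProgress` set to `true` and the progress bar and label updated from the `ProgressChanged` event, which is raised on the UI thread.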

0 answers:

No answers yet