C# - Read a text file, compare dictionary data and find the frequency of each dictionary entry

Posted: 2017-05-19 09:36:51

Tags: c# dictionary

I have a text file named data.txt that contains data in which text has been replaced.

Contents of data.txt:

Line 1: System1 -> MachineA
Line 2: System2 -> MachineB
Line 3: System3 -> MachineC
Line 4: System4 -> MachineD
Line 4: System6 -> MachineF
Line 5: System5 -> MachineE
Line 6: System6 -> MachineF
Line 7: System7 -> MachineG
Line 8: System2 -> MachineB
Line 8: System8 -> MachineH

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        string[] arrayofLine = File.ReadAllLines("data.txt");

        Dictionary<string, string> Replaced = new Dictionary<string, string>();
        Dictionary<int, string> Frequency = new Dictionary<int, string>();
        Replaced.Add("System1", "MachineA");
        Replaced.Add("System2", "MachineB");
        Replaced.Add("System3", "MachineC");
        Replaced.Add("System4", "MachineD");
        Replaced.Add("System5", "MachineE");
        Replaced.Add("System6", "MachineF");
        Replaced.Add("System7", "MachineG");
        Replaced.Add("System8", "MachineH");
        int countr = 0;
        for (int i = 0; i < arrayofLine.Length; i++)
        {
            foreach (var replacement in Replaced.Keys)
            {
                if (arrayofLine[i].Contains(replacement))
                {
                    countr++;
                    //if (Frequency.ContainsKey(countr))
                    //{
                    //    Frequency[countr] = Frequency[countr] + "|" + replacement;
                    //}
                    //else
                    //{
                    //    Frequency.Add(countr, replacement);
                    //}
                    Frequency.Add(countr, Convert.ToString(replacement));
                }
            }
        }

        StringBuilder sbFreq = new StringBuilder();
        foreach (var freq in Frequency)
        {
            sbFreq.AppendLine(string.Format("{0} has been replaced with {1} {2} time(s) ", freq.Value, Replaced[freq.Value], freq.Key));
        }

        Console.Write(sbFreq);

        Console.ReadKey();
    }
}

Replacement dictionary: Replaced.Keys contains the original data (System1, System2 ... SystemN), and Replaced.Values contains the replacement data (MachineA, MachineB ... MachineN).

Output of the code:

System1 has been replaced with MachineA 1 time(s)
System2 has been replaced with MachineB 2 time(s)
System3 has been replaced with MachineC 3 time(s)
System4 has been replaced with MachineD 4 time(s)
System6 has been replaced with MachineF 5 time(s)
System5 has been replaced with MachineE 6 time(s)
System6 has been replaced with MachineF 7 time(s)
System7 has been replaced with MachineG 8 time(s)
System2 has been replaced with MachineB 9 time(s)
System8 has been replaced with MachineH 10 time(s)

It counts the lines, but I want to count the frequency, i.e. how many times each original text was replaced.

Expected output:

System1 has been replaced with MachineA 1 time(s)
System2 has been replaced with MachineB 2 time(s)
System3 has been replaced with MachineC 1 time(s)
System4 has been replaced with MachineD 1 time(s)
System6 has been replaced with MachineF 2 time(s)
System5 has been replaced with MachineE 1 time(s)
System7 has been replaced with MachineG 1 time(s)
System8 has been replaced with MachineH 1 time(s)

How can I get the desired output?

4 answers:

Answer 0 (score: 1)

Why not count how many times each value occurs?

First get the distinct records:

for (int i = 0; i < arrayofLine.Length; i++)
{
    //Your original logic here
}

//This is the additional code (requires using System.Linq;):
Frequency = Frequency.GroupBy(s => s.Value)
                     .Select(g => g.First())
                     .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);  //Get only the distinct records.

StringBuilder sbFreq = new StringBuilder();
foreach (var freq in Frequency)
{
    sbFreq.AppendLine(string.Format("{0} has been replaced with {1} {2} time(s) ",
        freq.Value, Replaced[freq.Value],
        arrayofLine.Where(x => x.Contains(freq.Value)).Count())); //Here is the modified part
}

This produces the desired output.
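For reference, here is a self-contained sketch of the same idea (find the distinct keys first, then count the lines that contain each one) without the intermediate Frequency dictionary. The names DistinctThenCount and Report are hypothetical, and it assumes data.txt has already been read into a string array.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating "distinct keys, then count lines".
public static class DistinctThenCount
{
    public static List<string> Report(string[] lines, Dictionary<string, string> replaced)
    {
        // Keys that occur at least once, taken in the order the lines are read.
        var seen = lines
            .SelectMany(line => replaced.Keys.Where(key => line.Contains(key)))
            .Distinct()
            .ToList();

        // For each distinct key, count the lines that contain it.
        return seen
            .Select(key =>
            {
                int count = lines.Count(l => l.Contains(key));
                return $"{key} has been replaced with {replaced[key]} {count} time(s)";
            })
            .ToList();
    }
}
```

Keys that never occur in the file are skipped entirely, because they never make it into the distinct list.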

Answer 1 (score: 1)

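The simplest fix is to move the declaration of countr inside the loop and swap the two loops (Mukesh's answer forgot to move the declaration of countr):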

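foreach (var replacement in Replaced.Keys)
{
    //countr will only count occurrences PER INDIVIDUAL REPLACEMENT
    int countr = 0;

    for (int i = 0; i < arrayofLine.Length; i++)
    {
        if (arrayofLine[i].Contains(replacement)) countr++;
    }

    //Key by name, not by count: with the question's Dictionary<int, string>, two
    //replacements with the same count (e.g. 1) make Add throw an ArgumentException.
    //So declare Frequency as Dictionary<string, int> (and swap Key/Value in the output loop).
    Frequency.Add(replacement, countr);
}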

This is the "simplest" solution to your problem, i.e. it solves the problem with the fewest code changes.
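If you prefer a single pass over the lines, a common pattern is to key the counter by the original name and increment it with TryGetValue, which is roughly what the commented-out code in the question was attempting. A minimal sketch, assuming a Dictionary<string, int> is acceptable; the names CounterPerKey and Count are made up for illustration:

```csharp
using System.Collections.Generic;

// Hypothetical helper: one pass over the lines, one counter per key.
public static class CounterPerKey
{
    public static Dictionary<string, int> Count(string[] lines, IEnumerable<string> keys)
    {
        var frequency = new Dictionary<string, int>();
        foreach (var line in lines)
        {
            foreach (var key in keys)
            {
                if (!line.Contains(key)) continue;
                // TryGetValue leaves `current` at 0 when the key has not been seen yet.
                frequency.TryGetValue(key, out int current);
                frequency[key] = current + 1;
            }
        }
        return frequency;
    }
}
```

Keys that never occur are simply absent from the result, so check with ContainsKey or TryGetValue before indexing it.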

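However, I'd like to point out that LINQ offers a better way to solve this. In common iteration scenarios, LINQ can simplify code considerably (mostly by reducing nesting and repeated code).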

With LINQ, I can refactor the whole snippet into a single statement:

Frequency = Replaced.ToDictionary(
    x => x.Key,
    x => arrayofLine.Count(line => line.Contains(x.Key))
);

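Note that I am using Dictionary<string, int> Frequency rather than Dictionary<int, string>. Your version makes no sense, because several replacements can occur the same number of times, which would produce duplicate keys.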
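A self-contained version of the LINQ one-liner, with the declarations spelled out, might look like the sketch below. LinqFrequency, Build and Format are illustrative names only. Because ToDictionary enumerates every entry of Replaced, keys that never occur get a count of 0 instead of being left out.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper built around the answer's ToDictionary call.
public static class LinqFrequency
{
    // Frequency keyed by the original name; every key of `replaced` is present, even with count 0.
    public static Dictionary<string, int> Build(string[] lines, Dictionary<string, string> replaced) =>
        replaced.ToDictionary(
            x => x.Key,
            x => lines.Count(line => line.Contains(x.Key)));

    // One output line per key, in the question's format.
    public static IEnumerable<string> Format(Dictionary<string, int> frequency, Dictionary<string, string> replaced) =>
        frequency.Select(f => $"{f.Key} has been replaced with {replaced[f.Key]} {f.Value} time(s)");
}
```

The results follow the enumeration order of Replaced (in practice its insertion order), not the order in which the systems first appear in data.txt.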

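However, if you also want to perform the actual string replacement in the same iteration, the code has to be more verbose. You can still use LINQ, but you need to iterate manually so you can add the replacement logic at each step.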

Something like this:

foreach (var replacement in Replaced)
{
    //Count how often it occurs
    Frequency.Add(
        replacement.Key,
        arrayofLine.Count(line => line.Contains(replacement.Key))
    );

    //And also replace the occurrences!
    for (int i = 0; i < arrayofLine.Length; i++)
    {
        if (arrayofLine[i].Contains(replacement.Key))
            arrayofLine[i] = arrayofLine[i].Replace(replacement.Key, replacement.Value);
    }
}
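The count-and-replace loop can be wrapped in a compilable method as sketched below; note that the Contains check must use replacement.Key, since replacement is a KeyValuePair. CountAndReplace and Run are hypothetical names, and the method modifies the passed-in array in place.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: count each key, then replace it in place.
public static class CountAndReplace
{
    public static Dictionary<string, int> Run(string[] lines, Dictionary<string, string> replaced)
    {
        var frequency = new Dictionary<string, int>();
        foreach (var replacement in replaced)
        {
            // Number of lines that contain the key before it is replaced.
            frequency.Add(replacement.Key, lines.Count(line => line.Contains(replacement.Key)));

            for (int i = 0; i < lines.Length; i++)
            {
                if (lines[i].Contains(replacement.Key))
                    lines[i] = lines[i].Replace(replacement.Key, replacement.Value);
            }
        }
        // Caveat: lines change as we go, so a replacement value that contains a
        // later key would also be counted for that later key.
        return frequency;
    }
}
```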

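There is also a potential bug: if a single line can contain the same replacement value more than once, you should probably count the occurrences rather than the number of lines that contain at least one occurrence.
If a replacement value never occurs twice on the same line, this is not an issue.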

If it is an issue and you run into trouble, I suggest you investigate it and post a new question.
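If lines can contain a key more than once, one way to count every occurrence is to step through each line with IndexOf. This is a sketch under that assumption: it counts non-overlapping, ordinal (case-sensitive) matches, the same comparison string.Contains(string) uses, and the names OccurrenceCount, CountOccurrences and Build are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: count occurrences instead of matching lines.
public static class OccurrenceCount
{
    // Counts non-overlapping ordinal occurrences of `value` in `text`.
    public static int CountOccurrences(string text, string value)
    {
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("value must be non-empty", nameof(value));

        int count = 0;
        int index = 0;
        while ((index = text.IndexOf(value, index, StringComparison.Ordinal)) >= 0)
        {
            count++;
            index += value.Length; // continue after this match so matches do not overlap
        }
        return count;
    }

    // Total occurrences per key across all lines.
    public static Dictionary<string, int> Build(string[] lines, IEnumerable<string> keys) =>
        keys.ToDictionary(key => key, key => lines.Sum(line => CountOccurrences(line, key)));
}
```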

Answer 2 (score: 0)

Just swap the loops and it will work:

foreach (var replacement in Replaced.Keys)
{
    for (int i = 0; i < arrayofLine.Length; i++)
    {
        if (arrayofLine[i].Contains(replacement))
        {
            //countr is still the single counter declared before the loops,
            //so it is never reset per key
            countr++;
            //if (Frequency.ContainsKey(countr))
            //{
            //    Frequency[countr] = Frequency[countr] + "|" + replacement;
            //}
            //else
            //{
            //    Frequency.Add(countr, replacement);
            //}
            Frequency.Add(countr, Convert.ToString(replacement));
        }
    }
}

Answer 3 (score: 0)

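As written, the code doesn't really use the dictionary as a dictionary. Perhaps the initial values don't matter and all line x: system -> machine combinations should simply be parsed? (In other words: can the file contain systems that are not in the replacement list?)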

Another approach is to use a regular expression to extract all the combinations:

//arrayofLine = File.ReadAllLines("data.txt");
//requires: using System.Linq; using System.Text.RegularExpressions;
var rx = new Regex(@"(?:.*:\s*)(\w+)(?:\s*->\s*)(\w+)");
string sFreq = string.Join(Environment.NewLine, from l in arrayofLine
    let m = rx.Match(l)
    where m.Success
    group l by new {From = m.Groups[1].Value, To = m.Groups[2].Value} into g
    select $"{g.Key.From} has been replaced with {g.Key.To} {g.Count()} time(s)"
);

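For the sample input, sFreq will contain the desired result. Note that this groups all unique from -> to combinations, but in the sample code each 'from' (system) always seems to map to the same 'to' (machine). If only the systems need to be checked, the code (the grouping) can be simplified.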
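If only the systems need to be checked, the grouping can key on the first capture group alone. A sketch reusing the same regular expression; SystemOnlyReport and Build are hypothetical names, and it assumes each system always maps to the same machine, so it reports the machine from the first matching line of each group.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: group by the 'from' system only.
public static class SystemOnlyReport
{
    // Same pattern as the answer: "<anything>: <from> -> <to>".
    static readonly Regex Rx = new Regex(@"(?:.*:\s*)(\w+)(?:\s*->\s*)(\w+)");

    public static string Build(string[] lines) =>
        string.Join(Environment.NewLine,
            from l in lines
            let m = Rx.Match(l)
            where m.Success
            group m by m.Groups[1].Value into g
            select $"{g.Key} has been replaced with {g.First().Groups[2].Value} {g.Count()} time(s)");
}
```

GroupBy yields groups in the order their keys first appear, so the output follows the order of the file, as in the expected output in the question.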