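Finding duplicate entries per day in a file

Time: 2014-03-04 21:20:12

Tags: c#

I want to write C# code that reads my file, which is formatted as shown below, and prints all the duplicate entries for each date along with the number of occurrences.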

Example.txt:

March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz

Output:

March 03 2014 abcd 2
March 04 2014 xyz 2

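Can someone help me with this?

I am thinking of using a dictionary where the event is the key, and incrementing the value for each repeated event. But I am not sure how to group the results by day.

6 answers:

Answer 0: (score: 3)

This might be a good use case for the power of LINQ: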

var input = "March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz";
var format = "MMMM dd yyyy";

var results = input.Split(' ')
                   .Select((v, i) => new { v, i })
                   .GroupBy(x => x.i / 4, x => x.v, (k, g) => g.ToList())
                   .Select(g => new
                   {
                       Date = DateTime.ParseExact(String.Join(" ", g.Take(3)), format, CultureInfo.InvariantCulture),
                       Event = g[3]
                   })
                   .GroupBy(x => x)
                   .Where(g => g.Count() > 1)
                   .Select(g => new
                   {
                       Item = g.Key,
                       Count = g.Count()
                   });

foreach (var i in results)
    Console.WriteLine("{0} {1} {2}", i.Item.Date.ToString(format), i.Item.Event, i.Count.ToString());

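This prints exactly what you need.

Answer 1: (score: 0)

This code is based on your original description of the problem and the sample data, so it may need some adjustment. You could also do this with LINQ.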

        List<String> outputStringList = new List<string>();

        IEnumerable<String> stringEnumerable = System.IO.File.ReadLines(@"c:\tmp\test.txt");

        System.Collections.Generic.HashSet<String> uniqueHashSet = new System.Collections.Generic.HashSet<String>();

        foreach (String line in stringEnumerable) { uniqueHashSet.Add(line); }

        foreach (String output in uniqueHashSet)
        {
            Int32 count = stringEnumerable.Count(element => element == output);

            if (count > 1) { outputStringList.Add(output + " " + count); }
            //if (count > 1) { System.Diagnostics.Debug.WriteLine(output + " " + count); }
        }
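
The LINQ alternative mentioned above could look roughly like the sketch below. It assumes the original layout of one entry per line; the class name `LineDuplicates` and the method name `Find` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LineDuplicates
{
    // Returns "<line> <count>" for every line that occurs more than once.
    // Assumes one entry per line (hypothetical helper, not part of any library).
    public static List<string> Find(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line)) // skip blank lines
            .GroupBy(line => line.Trim())                    // identical lines share a group
            .Where(group => group.Count() > 1)               // keep duplicates only
            .Select(group => group.Key + " " + group.Count())
            .ToList();
    }
}
```

It could be called as `LineDuplicates.Find(System.IO.File.ReadLines(@"c:\tmp\test.txt"))`. Unlike the loop above, this enumerates the file only once.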

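I see that you changed the format of the data while I was writing my answer. Please ignore this, as the solution will no longer work.

Answer 2: (score: 0)

Note: I have written this to be easy to read, with comments explaining the process.

If you are also the one writing this file, separate each "file" with the record separator character, which has the value 30 in the ASCII table. If that is not the case and you have to use the file format given in the OP, let me know and I can add a case for it.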
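
As a rough sketch of what writing and reading with the record separator might look like (the class `RecordSeparatorFormat` and its method names are hypothetical, chosen only for this example):

```csharp
using System;
using System.Collections.Generic;

public static class RecordSeparatorFormat
{
    // ASCII record separator (RS), decimal value 30.
    public const char RecordSeparator = (char)30;

    // Joins entries into one string with RS between them, ready to be written to a file.
    public static string Join(IEnumerable<string> entries)
    {
        return string.Join(RecordSeparator.ToString(), entries);
    }

    // Splits text read from such a file back into its entries.
    public static string[] Split(string text)
    {
        return text.Split(new[] { RecordSeparator }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Because the separator cannot appear in ordinary text, entries may then contain spaces without making the split ambiguous.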

// Path to the input file; "Example.txt" is the file name used in the question.
string filePath = "Example.txt";

// Reads the entire file into one string variable.
string allTheText = File.ReadAllText(filePath);

// Splits each "file" into a string of its own.
string[] files = allTheText.Split((char)30);

// Use this instead if you have a newline between each "file" rather than a record separator.
// string[] files = File.ReadAllLines(filePath);

// Make a Dictionary<string, string> to hold all these (you could use DateTime but I opted not to).
Dictionary<string, string> entries = new Dictionary<string, string>();

foreach (string file in files)
{
    // Now let's get the date of this "file".
    // We need the index of the 3rd space.
    var offset = file.IndexOf(' ');
    offset = file.IndexOf(' ', offset + 1);
    offset = file.IndexOf(' ', offset + 1);

    // Skip empty or malformed entries that contain no space at all.
    if (offset < 0) continue;

    // Now split up the string at this offset.
    string date = file.Substring(0, offset);
    string filecont = file.Substring(offset + 1);

    // Only add if it isn't already in there.
    if (!entries.ContainsKey(date))
        entries.Add(date, filecont);
}

// Print them out.
foreach (string key in entries.Keys)
{
    Console.WriteLine(key + " " + entries[key]);
}
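
Note that the dictionary above keeps only the first entry per date and does not count anything. To get per-day counts as the question asks, one option is a dictionary of dictionaries: the outer key is the date, the inner key is the event. This is only a sketch. It assumes every entry is exactly four space-separated tokens, as in Example.txt, and the names `DailyDuplicates`, `Count` and `Report` are invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class DailyDuplicates
{
    // Counts events per day. Assumes each entry is four whitespace-separated
    // tokens: month, day, year, event (the layout shown in Example.txt).
    public static Dictionary<string, Dictionary<string, int>> Count(string text)
    {
        var tokens = text.Split(new[] { ' ', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        var perDay = new Dictionary<string, Dictionary<string, int>>();

        for (int i = 0; i + 3 < tokens.Length; i += 4)
        {
            string date = tokens[i] + " " + tokens[i + 1] + " " + tokens[i + 2];
            string evt = tokens[i + 3];

            Dictionary<string, int> events;
            if (!perDay.TryGetValue(date, out events))
            {
                events = new Dictionary<string, int>();
                perDay[date] = events;
            }

            int seen;
            events.TryGetValue(evt, out seen); // seen stays 0 when the event is new
            events[evt] = seen + 1;
        }
        return perDay;
    }

    // Formats "<date> <event> <count>" for every event seen more than once on a day.
    public static List<string> Report(Dictionary<string, Dictionary<string, int>> perDay)
    {
        var lines = new List<string>();
        foreach (var day in perDay)
            foreach (var evt in day.Value)
                if (evt.Value > 1)
                    lines.Add(day.Key + " " + evt.Key + " " + evt.Value);
        return lines;
    }
}
```

Calling `DailyDuplicates.Report(DailyDuplicates.Count(File.ReadAllText("Example.txt")))` yields one line per duplicated event. `Dictionary` does not guarantee enumeration order, so sort the lines if the order matters.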

Answer 3: (score: 0)

You can use a regular expression to split the text.

public IEnumerable<KeyValuePair<String, Int32>> SearchDuplicates(string path){
    // The parameter is named "path" so it does not clash with the local variable "file".
    var file = File.ReadLines(path);
    var pattern = new Regex("[A-Za-z]* [0-9]{2} [0-9]{4} [A-Za-z]*");
    var results = new Dictionary<string, int>();

    foreach(var line in file) {
        foreach(Match match in pattern.Matches(line)) {
            if(!results.ContainsKey(match.Value))
                results.Add(match.Value, 0);
            results[match.Value]++;
        }
    }

    return results.Where(v => v.Value > 1);
}
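
As a variation on the same idea, named groups can separate the date from the event, so the matches can be grouped by day first and by event second. This is a sketch only. It assumes two-digit days, four-digit years and single-word alphabetic events, and the names `RegexDailyDuplicates` and `Find` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexDailyDuplicates
{
    // Month name, two-digit day, four-digit year, then a one-word event.
    private static readonly Regex EntryPattern =
        new Regex(@"(?<date>[A-Za-z]+ [0-9]{2} [0-9]{4}) (?<event>[A-Za-z]+)");

    // Returns "<date> <event> <count>" for each event seen more than once on the same date.
    public static List<string> Find(string text)
    {
        return EntryPattern.Matches(text)
            .Cast<Match>()                                   // MatchCollection to IEnumerable<Match>
            .GroupBy(m => m.Groups["date"].Value)            // one group per day
            .SelectMany(day => day
                .GroupBy(m => m.Groups["event"].Value)       // one subgroup per event on that day
                .Where(evt => evt.Count() > 1)               // keep duplicates only
                .Select(evt => day.Key + " " + evt.Key + " " + evt.Count()))
            .ToList();
    }
}
```

An event that happens to be spelled like a month name would confuse this pattern, so treat it as a starting point rather than a robust parser.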

Answer 4: (score: 0)

A simple solution using regular expressions

string input = "March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz";

List<string> dates = new List<string>();
string[] splitted = input.Split(' ');
for (int i = 0; i < splitted.Length; i = i + 4)
{
    string strDate = splitted[i] + " " + splitted[i + 1] + " " + splitted[i + 2] + " " + splitted[i + 3];

    if (!dates.Contains(strDate))
    {
        dates.Add(strDate);
        // Escape the entry so it is matched literally, and count the matches only once.
        int count = Regex.Matches(input, Regex.Escape(strDate)).Count;
        if (count > 1)
            Console.WriteLine(strDate + " " + count);
    }
}

Answer 5: (score: 0)

If you want, you can tokenize it using the month names as separators

public static void Main (string[] args)
{
    var str = "March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz";

    var rawResults = tokenize (str).GroupBy(i => i);

    foreach (var item in rawResults) {
        Console.WriteLine ("Item {0} happened {1} times", item.Key, item.Count());
    }
}

static List<String> tokenize (string str)
{
    var months = new[]{ "March", "April", "May" }; //etc

    var strTokens = str.Split (new []{ ' ' }, StringSplitOptions.RemoveEmptyEntries);

    var results = new List<string> ();

    var current = "";
    foreach (var token in strTokens) {
        if (months.Contains(token)) {

            if (current != null && current != "") {
                results.Add (current);
            }

            current = token + " ";
        } else {
            current += token + " ";
        }
    }           

    results.Add (current);

    return results;
}

Better yet, use a parser combinator to do it.