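I want to write C# code that reads my file, which is formatted as shown below, and prints every duplicate entry for each date along with the number of times it occurs.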
Example.txt:
March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz
Output:
March 03 2014 abcd 2
March 04 2014 xyz 2
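Can someone help me with this?
I am thinking of using a dictionary where the event is the key, and incrementing the value each time the event repeats. But I am not sure how to group the results by day.
Answer 0 (score: 3)
This is a good use case for LINQ: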
var input = "March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz";
var format = "MMMM dd yyyy";
var results = input.Split(' ')
.Select((v, i) => new { v, i })
.GroupBy(x => x.i / 4, x => x.v, (k, g) => g.ToList())
.Select(g => new
{
Date = DateTime.ParseExact(String.Join(" ", g.Take(3)), format, CultureInfo.InvariantCulture),
Event = g[3]
})
.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(g => new
{
Item = g.Key,
Count = g.Count()
});
foreach (var i in results)
Console.WriteLine("{0} {1} {2}", i.Item.Date.ToString(format), i.Item.Event, i.Count.ToString());
It prints exactly what you need.
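For comparison, the dictionary idea from the question can also be made to work by keying on the date and the event together. This is only a minimal sketch: the names `DuplicateCounter` and `CountEntries` are made up for illustration, and it assumes that every entry is exactly four space-separated tokens (month, day, year, one-word event) and that the compiler supports tuples (C# 7 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateCounter
{
    // Counts every (date, event) pair in a space-separated string.
    // Assumption: each entry is exactly four tokens: month, day, year, event.
    public static Dictionary<(string Date, string Event), int> CountEntries(string input)
    {
        var counts = new Dictionary<(string Date, string Event), int>();
        string[] tokens = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Step through complete groups of four tokens; a trailing partial group is ignored.
        for (int i = 0; i + 3 < tokens.Length; i += 4)
        {
            var key = (Date: string.Join(" ", tokens, i, 3), Event: tokens[i + 3]);
            counts.TryGetValue(key, out int current); // current is 0 when the key is new
            counts[key] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        string input = "March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz";

        // Keep only the pairs that occur more than once.
        foreach (var pair in CountEntries(input).Where(p => p.Value > 1))
            Console.WriteLine($"{pair.Key.Date} {pair.Key.Event} {pair.Value}");
    }
}
```

Because the key already contains the date, the same event on two different days is counted separately, which seems to be the per-day grouping the question asks about.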
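Answer 1 (score: 0)
This code is based on your original description of the problem and the sample data, so it may need some adjustment. You could also do this with LINQ.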
List<String> outputStringList = new List<string>();
IEnumerable<String> stringEnumerable = System.IO.File.ReadLines(@"c:\tmp\test.txt");
System.Collections.Generic.HashSet<String> uniqueHashSet = new System.Collections.Generic.HashSet<String>();
foreach (String line in stringEnumerable) { uniqueHashSet.Add(line); }
foreach (String output in uniqueHashSet)
{
Int32 count = stringEnumerable.Count(element => element == output);
if (count > 1) { outputStringList.Add(output + " " + count); }
//if (count > 1) { System.Diagnostics.Debug.WriteLine(output + " " + count); }
}
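I see that you changed the format of the data while I was writing my answer. Please disregard this, since the solution will no longer work.
Answer 2 (score: 0)
Note: I have written this to be easy to read, with comments explaining the process.
If you are also the one writing this file, separate each "file" with a record separator, which has the value 30 in the ASCII table. If that is not possible and you must use the file format given in the OP, let me know and I can add a case for it.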
// Reads the entire file into one string variable (filePath holds the path to your file).
string allTheText = File.ReadAllText(filePath);
// Splits each "file" into a string of its own.
string[] files = allTheText.Split((char)30);
// Use this instead if there is a newline between each "file" rather than just spaces.
// string[] files = File.ReadAllLines(filePath);
// Make a Dictionary<string, string> to hold all these (you could use DateTime but I opted to not).
Dictionary<string, string> entries = new Dictionary<string, string>();
foreach(string file in files)
{
// Now lets get the Date of this "file".
// We need the index of the 3rd space
var offset = file.IndexOf(' ');
offset = file.IndexOf(' ', offset+1);
offset = file.IndexOf(' ', offset+1);
// Now split up the string at this offset (the date ends just before the 3rd space)
string date = file.Substring(0, offset);
string filecont = file.Substring(offset + 1);
// Only add if it isn't already in there
if(!entries.ContainsKey(date))
entries.Add(date, filecont);
}
// Print them out
foreach(string key in entries.Keys)
{
Console.WriteLine(key + " " + entries[key]);
}
Answer 3 (score: 0)
You can split the text with a regular expression.
public IEnumerable<KeyValuePair<String, Int32>> SearchDuplicates(string path){
var file = File.ReadLines(path);
var pattern = new Regex("[A-Za-z]+ [0-9]{2} [0-9]{4} [A-Za-z]+");
var results = new Dictionary<string, int>();
foreach(var line in file) {
foreach(Match match in pattern.Matches(line)) {
if(!results.ContainsKey(match.Value))
results.Add(match.Value, 0);
results[match.Value]++;
}
}
return results.Where(v => v.Value > 1);
}
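If the date and the event are needed separately, for example to print the duplicates grouped by day, named capture groups are one option. The following is a rough sketch rather than a drop-in replacement: `RegexDuplicates`, `FindDuplicates` and the group names `date` and `evt` are invented for this example, and the pattern assumes a month name, a two-digit day, a four-digit year and a one-word event.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexDuplicates
{
    // Assumed entry shape: month name, two-digit day, four-digit year, one-word event.
    private static readonly Regex EntryPattern =
        new Regex(@"(?<date>[A-Za-z]+ [0-9]{2} [0-9]{4}) (?<evt>\S+)");

    // Returns (date, event, count) for every entry that occurs more than once,
    // in order of first appearance.
    public static List<(string Date, string Event, int Count)> FindDuplicates(string text)
    {
        return EntryPattern.Matches(text)
            .Cast<Match>()
            .GroupBy(m => (Date: m.Groups["date"].Value, Event: m.Groups["evt"].Value))
            .Where(g => g.Count() > 1)
            .Select(g => (Date: g.Key.Date, Event: g.Key.Event, Count: g.Count()))
            .ToList();
    }

    public static void Main()
    {
        string input = "March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz";

        // Second grouping: one heading per day, followed by that day's duplicates.
        foreach (var day in FindDuplicates(input).GroupBy(d => d.Date))
        {
            Console.WriteLine(day.Key);
            foreach (var d in day)
                Console.WriteLine($"  {d.Event} {d.Count}");
        }
    }
}
```

To read from a file instead, the text could be passed in with `File.ReadAllText`, as long as an entry never spans a line break.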
Answer 4 (score: 0)
A simple solution using regular expressions
string input = "March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz";
List<string> dates = new List<string>();
string[] splitted = input.Split(' ');
for (int i = 0; i + 3 < splitted.Length; i = i + 4)
{
string strDate = splitted[i] + " " + splitted[i + 1] + " " + splitted[i + 2] + " " + splitted[i + 3];
if (!dates.Contains(strDate))
{
dates.Add(strDate);
int count = Regex.Matches(input, Regex.Escape(strDate)).Count;
if (count > 1)
Console.WriteLine(strDate + " " + count);
}
}
Answer 5 (score: 0)
If needed, you can tokenize it using the month names as delimiters
public static void Main (string[] args)
{
var str = "March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz";
var rawResults = tokenize (str).GroupBy(i => i);
foreach (var item in rawResults) {
Console.WriteLine ("Item {0} happened {1} times", item.Key, item.Count());
}
}
static List<String> tokenize (string str)
{
var months = new[]{ "March", "April", "May" }; //etc
var strTokens = str.Split (new []{ ' ' }, StringSplitOptions.RemoveEmptyEntries);
var results = new List<string> ();
var current = "";
foreach (var token in strTokens) {
if (months.Contains(token)) {
if (current != null && current != "") {
results.Add (current);
}
current = token + " ";
} else {
current += token + " ";
}
}
results.Add (current);
return results;
}
Better yet, use a parser combinator to do it.
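To give an idea of what that could look like, here is a very small hand-rolled sketch of the parser combinator style. It does not use any existing parser library; `Parser<T>`, `MiniParsers`, `EntryGrammar` and all of their members are hypothetical names written for this example. It assumes that an entry is `month day year event`, that month and event consist of letters only, and that the compiler supports tuples (C# 7 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A parser reads `input` starting at `pos`. On success it returns the parsed
// value and the position just after it; on failure it returns null.
public delegate (T Value, int Next)? Parser<T>(string input, int pos);

public static class MiniParsers
{
    // One or more consecutive characters that satisfy the predicate.
    public static Parser<string> Many1(Func<char, bool> predicate) => (input, pos) =>
    {
        int end = pos;
        while (end < input.Length && predicate(input[end])) end++;
        if (end == pos) return null;
        return (input.Substring(pos, end - pos), end);
    };

    // Skips leading whitespace, then runs the given parser.
    public static Parser<T> Token<T>(this Parser<T> parser) => (input, pos) =>
    {
        while (pos < input.Length && char.IsWhiteSpace(input[pos])) pos++;
        return parser(input, pos);
    };

    // Runs `first`, then `second` where `first` stopped, and combines both values.
    public static Parser<TResult> Then<TFirst, TSecond, TResult>(
        this Parser<TFirst> first, Parser<TSecond> second,
        Func<TFirst, TSecond, TResult> combine) => (input, pos) =>
    {
        var a = first(input, pos);
        if (a == null) return null;
        var b = second(input, a.Value.Next);
        if (b == null) return null;
        return (combine(a.Value.Value, b.Value.Value), b.Value.Next);
    };

    // Applies the parser repeatedly until it fails and collects the values.
    public static Parser<List<T>> Many<T>(this Parser<T> parser) => (input, pos) =>
    {
        var items = new List<T>();
        while (true)
        {
            var r = parser(input, pos);
            if (r == null || r.Value.Next == pos) break; // stop on failure or no progress
            items.Add(r.Value.Value);
            pos = r.Value.Next;
        }
        return (items, pos);
    };
}

public static class EntryGrammar
{
    static readonly Parser<string> Word = MiniParsers.Many1(char.IsLetter).Token();
    static readonly Parser<string> Number = MiniParsers.Many1(char.IsDigit).Token();

    // entry := month day year event
    static readonly Parser<(string Date, string Event)> Entry =
        Word.Then(Number, (month, day) => month + " " + day)
            .Then(Number, (date, year) => date + " " + year)
            .Then(Word, (date, evt) => (Date: date, Event: evt));

    // Parses as many entries as possible from the start of the input.
    public static List<(string Date, string Event)> ParseAll(string input)
    {
        return Entry.Many()(input, 0).Value.Value;
    }

    public static void Main()
    {
        string input = "March 03 2014 abcd March 03 2014 def March 03 2014 abcd March 04 2014 xyz March 04 2014 xyz";
        foreach (var g in ParseAll(input).GroupBy(e => e).Where(g => g.Count() > 1))
            Console.WriteLine($"{g.Key.Date} {g.Key.Event} {g.Count()}");
    }
}
```

The grammar is spelled out in one place (`Entry`), so supporting a different layout would mostly mean changing that definition rather than the counting code. For anything beyond a toy like this, an established parser library is probably the better choice.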