I have 2 text files that look like this (the big numbers such as 1466786391 are unique timestamps):
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 20.917/70.216/147.258 ms
1466786342
PING 10.0.0.6 (10.0.0.6): 56 data bytes
....
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 50 packets received, 0% packet loss
round-trip min/avg/max = 29.535/65.768/126.983 ms
1466786391
and this:
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 20.917/70.216/147.258 ms
1466786342
PING 10.0.0.6 (10.0.0.6): 56 data bytes
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 50 packets received, 0% packet loss
round-trip min/avg/max = 29.535/65.768/126.983 ms
1466786391
PING 10.0.0.6 (10.0.0.6): 56 data bytes
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 44 packets received, 12% packet loss
round-trip min/avg/max = 30.238/62.772/102.959 ms
1466786442
PING 10.0.0.6 (10.0.0.6): 56 data bytes
....
So the first file ends with timestamp 1466786391, while the second file contains that same block of data somewhere in the middle, followed by more data; everything before that particular timestamp is exactly the same as in the first file.
So the output I want is:
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 20.917/70.216/147.258 ms
1466786342
PING 10.0.0.6 (10.0.0.6): 56 data bytes
....
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 50 packets received, 0% packet loss
round-trip min/avg/max = 29.535/65.768/126.983 ms
1466786391
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 44 packets received, 12% packet loss
round-trip min/avg/max = 30.238/62.772/102.959 ms
1466786442
PING 10.0.0.6 (10.0.0.6): 56 data bytes
....
That is, concatenate the two files and create a third file, removing the duplicates from the second file (the text blocks that already exist in the first file). Here is my code:
public static void UnionFiles()
{
    string folderPath = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location), "http");
    string outputFilePath = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location), "http\\union.dat");
    var union = Enumerable.Empty<string>();

    foreach (string filePath in Directory
        .EnumerateFiles(folderPath, "*.txt")
        .OrderBy(x => Path.GetFileNameWithoutExtension(x)))
    {
        union = union.Union(File.ReadAllLines(filePath));
    }

    File.WriteAllLines(outputFilePath, union);
}
This is the wrong output I get (the file structure is destroyed):
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 49 packets received, 2% packet loss
round-trip min/avg/max = 20.917/70.216/147.258 ms
1466786342
PING 10.0.0.6 (10.0.0.6): 56 data bytes
--- 10.0.0.6 ping statistics ---
50 packets transmitted, 50 packets received, 0% packet loss
round-trip min/avg/max = 29.535/65.768/126.983 ms
1466786391
round-trip min/avg/max = 30.238/62.772/102.959 ms
1466786442
round-trip min/avg/max = 5.475/40.986/96.964 ms
1466786492
round-trip min/avg/max = 5.276/61.309/112.530 ms
EDIT: This code was written to handle multiple files, but I would be happy even if it only worked correctly for 2 files.
However, it does not remove the duplicate text blocks as it should; it removes several useful lines and makes the output completely useless. I am stuck.
How can I achieve this? Thanks.
Answer 0 (score: 3)
I think you want to compare blocks, not individual lines.
Something like this should work:
public static void UnionFiles()
{
    var firstFilePath = "log1.txt";
    var secondFilePath = "log2.txt";

    var firstLogBlocks = ReadFileAsLogBlocks(firstFilePath);
    var secondLogBlocks = ReadFileAsLogBlocks(secondFilePath);
    var cleanLogBlock = firstLogBlocks.Union(secondLogBlocks);

    var cleanLog = new StringBuilder();
    foreach (var block in cleanLogBlock)
    {
        cleanLog.Append(block);
    }

    File.WriteAllText("cleanLog.txt", cleanLog.ToString());
}
private static List<LogBlock> ReadFileAsLogBlocks(string filePath)
{
    var allLinesLog = File.ReadAllLines(filePath);
    var logBlocks = new List<LogBlock>();
    var currentBlock = new List<string>();
    var i = 0;

    foreach (var line in allLinesLog)
    {
        if (!string.IsNullOrEmpty(line))
        {
            currentBlock.Add(line);
            if (i == 4)
            {
                // Every 5 non-empty lines form one log block.
                logBlocks.Add(new LogBlock(currentBlock.ToArray()));
                currentBlock.Clear();
                i = 0;
            }
            else
            {
                i++;
            }
        }
    }

    return logBlocks;
}
with the log block defined as follows:
public class LogBlock
{
    private readonly string[] _logs;

    public LogBlock(string[] logs)
    {
        _logs = logs;
    }

    public override string ToString()
    {
        var logBlock = new StringBuilder();
        foreach (var log in _logs)
        {
            logBlock.AppendLine(log);
        }
        return logBlock.ToString();
    }

    public override bool Equals(object obj)
    {
        return obj is LogBlock && Equals((LogBlock)obj);
    }

    private bool Equals(LogBlock other)
    {
        return _logs.SequenceEqual(other._logs);
    }

    public override int GetHashCode()
    {
        var hashCode = 0;
        foreach (var log in _logs)
        {
            hashCode += log.GetHashCode();
        }
        return hashCode;
    }
}
Note the override of Equals in LogBlock and the consistent GetHashCode implementation, since Union relies on both of them, as described here.
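As a minimal sketch of why this matters (the two-line sample blocks below are just placeholders, not real log data), Union collapses blocks with identical lines into one:
// Requires: using System; using System.Linq;
var a = new LogBlock(new[] { "--- 10.0.0.6 ping statistics ---", "1466786391" });
var b = new LogBlock(new[] { "--- 10.0.0.6 ping statistics ---", "1466786391" });
var c = new LogBlock(new[] { "--- 10.0.0.6 ping statistics ---", "1466786442" });
// a and b have the same lines, so Equals/GetHashCode make Union keep only one of them.
var merged = new[] { a, b }.Union(new[] { c }).ToList();
Console.WriteLine(merged.Count); // prints 2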
Answer 1 (score: 1)
A rather hacky solution using regular expressions:
var logBlockPattern = new Regex(@"(^---.*ping statistics ---$)\s+"
                              + @"(^.+packets transmitted.+packets received.+packet loss$)\s+"
                              + @"(^round-trip min/avg/max.+$)\s+"
                              + @"(^\d+$)\s*"
                              + @"(^PING.+$)?",
                                RegexOptions.Multiline);

var logBlocks1 = logBlockPattern.Matches(FileContent1).Cast<Match>().ToList();
var logBlocks2 = logBlockPattern.Matches(FileContent2).Cast<Match>().ToList();

var mergedLogBlocks = logBlocks1.Concat(logBlocks2.Where(lb2 =>
    logBlocks1.All(lb1 => lb1.Groups[4].Value != lb2.Groups[4].Value)));

var mergedLogContents = string.Join("\n\n", mergedLogBlocks);
The Groups collection of each regex Match contains the individual lines of a log block (because in the pattern every line is wrapped in parentheses ()), starting at index 1. So the match group with index 4 is the timestamp, which we can use to compare the log blocks.
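For completeness, a rough sketch of the surrounding code, assuming FileContent1 and FileContent2 are simply the raw contents of the two files and that the merged result goes to a third file (all file names here are placeholders):
// Assumed setup for the snippet above; the file names are placeholders.
var FileContent1 = File.ReadAllText("log1.txt");
var FileContent2 = File.ReadAllText("log2.txt");
// ... build mergedLogContents with the regex code above ...
File.WriteAllText("merged.txt", mergedLogContents);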
Answer 2 (score: -2)
There is a problem in how the unique records are found. Could you check the code below?
public static void UnionFiles()
{
    string folderPath = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location), "http");
    string outputFilePath = Path.Combine(Path.GetDirectoryName(Assembly.GetEntryAssembly().Location), "http\\union.dat");
    var union = new List<string>();

    foreach (string filePath in Directory
        .EnumerateFiles(folderPath, "*.txt")
        .OrderBy(x => Path.GetFileNameWithoutExtension(x)))
    {
        var filter = File.ReadAllLines(filePath).Where(x => !union.Contains(x)).ToList();
        union.AddRange(filter);
    }

    File.WriteAllLines(outputFilePath, union);
}