Question

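I'm parsing a log file and trying to match error statements. The part of the line I match on, "error CS", appears on many lines. Some of those lines are duplicates and some are not. Is there a way to avoid returning the duplicates? I'm using the Java flavor of regex.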

Example: my simple regex returns

Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'
Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context
Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'
Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context

I'd like it to return:

Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'
Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context

Answer 1

Technically, this isn't possible with a regular expression. You need something more powerful.

Regular expressions are meant for matching regular languages. The pattern you are trying to match is not regular.

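You need the expression to remember some "state", namely the errors it has already matched, and regular expressions are not designed for that kind of computation. A Turing machine can hold state, which is closer to what you need. (Java fits the bill nicely.)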

This is easy to solve by adding a little extra logic to your log parser after it has found all the error lines.
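
As a minimal sketch of that extra logic in Java (the class and method names are hypothetical, and the pattern `error CS\d+` is an assumption based on the sample log), the matching step stays a plain regex and the duplicates are removed afterwards:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ErrorLineFilter {
    // Assumed pattern: "error CS" followed by the numeric compiler code, as in the sample log.
    private static final Pattern ERROR_CS = Pattern.compile("error CS\\d+");

    // Step 1: plain regex matching; every matching line is kept, repeats included.
    static List<String> findErrorLines(List<String> logLines) {
        List<String> matches = new ArrayList<>();
        for (String line : logLines) {
            if (ERROR_CS.matcher(line).find()) {
                matches.add(line);
            }
        }
        return matches;
    }

    // Step 2: the extra logic after matching. distinct() drops repeats and,
    // on an ordered stream, keeps the first occurrence of each line.
    static List<String> removeDuplicates(List<String> errorLines) {
        return errorLines.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'",
            "Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context",
            "Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'",
            "Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context");
        removeDuplicates(findErrorLines(log)).forEach(System.out::println);
    }
}
```

Running `main` prints each of the two sample errors once. `distinct()` compares with `String.equals`, so two lines only count as duplicates if they are identical character for character.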

Answer 2

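One solution is to use the regex for matching and then put each matching line into a data structure such as a set, which removes the duplicates for you. When parsing is finished, just print the contents of the set.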
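
A minimal Java sketch of the set approach, assuming the log has already been split into lines. `UniqueErrors`, `collectUniqueErrors`, and the `error CS\d+` pattern are illustrative assumptions, not part of the original answer:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class UniqueErrors {
    // Assumed pattern for C# compiler errors such as "error CS0117"; adjust for your log format.
    private static final Pattern ERROR_CS = Pattern.compile("error CS\\d+");

    // Adds every matching line to a Set; Set.add ignores a line that is already present.
    static Set<String> collectUniqueErrors(Iterable<String> logLines) {
        Set<String> unique = new HashSet<>();
        for (String line : logLines) {
            if (ERROR_CS.matcher(line).find()) {
                unique.add(line);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'",
            "Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context",
            "Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'");
        // Print the set once parsing is done; HashSet iteration order is unspecified.
        for (String line : collectUniqueErrors(log)) {
            System.out.println(line);
        }
    }
}
```

`Set.add` returns `false` for a line that is already in the set, so no explicit duplicate check is needed. A `HashSet` does not remember the order in which lines were seen.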

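If you care about order, you could instead add each line to some kind of map, with the line as the key and its line number as the value (checking for an existing entry before inserting). If you then sort by value, you get a list of the first occurrence of each line.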
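
A sketch of the map variant under the same assumptions (hypothetical names, assumed `error CS\d+` pattern, log lines numbered from 1). `putIfAbsent` performs the check before inserting, and sorting the entries by value restores first-occurrence order:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class OrderedUniqueErrors {
    // Assumed pattern, as in the sample log lines.
    private static final Pattern ERROR_CS = Pattern.compile("error CS\\d+");

    static List<String> firstOccurrences(List<String> logLines) {
        // Key: the error line; value: the (1-based) line number where it first appeared.
        Map<String, Integer> firstSeen = new HashMap<>();
        for (int i = 0; i < logLines.size(); i++) {
            String line = logLines.get(i);
            if (ERROR_CS.matcher(line).find()) {
                // Only inserts if the line is not already a key, so the first line number wins.
                firstSeen.putIfAbsent(line, i + 1);
            }
        }
        // Sort the entries by value (line number) to get the lines in first-seen order.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(firstSeen.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "Build started",
            "Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'",
            "Class1.cs(34,20): error CS0103: The name 'thiswworked' does not exist in the current context",
            "Class1.cs(16,27): error CS0117: 'string' does not contain a definition for 'empty'");
        firstOccurrences(log).forEach(System.out::println);
    }
}
```

If you only need the order and not the line numbers, a `LinkedHashSet` gives the same result directly, because it iterates in insertion order.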

Regex: return unique lines when the pattern matches
