Best way to get the information I need from a log file

Asked: 2011-04-25 07:59:41

Tags: c# regex parsing

I have the following lines in my log file:

14:40:21.581 In stopmachine. rtfBoxStop=false,secsGemStop=false,ht=false,error=1007, multiBoxStop=false stoptext=null
14:40:24.144 In stopmachine. rtfBoxStop=false,secsGemStop=false,ht=false,error=1007, multiBoxStop=false stoptext=null
14:40:25.175 MC: <DataContainer>
<EquipmentHeartbeat dateTime="2011-04-06T14:43:21.00+01:00" interval="300" recipeId="ES-AD0109071F-3C-PS.ASP"/>
</DataContainer>
14:40:26.675 In stopmachine. rtfBoxStop=false,secsGemStop=false,ht=false,error=1007, multiBoxStop=false stoptext=null
14:40:29.206 In stopmachine. rtfBoxStop=false,secsGemStop=false,ht=false,error=1007, multiBoxStop=false stoptext=null
14:40:29.675 INFO MobileConnection: Creating new GlobalLLWData completed, Milliseconds used: 0
14:40:30.769 SMDMachine.keepAlive() frequency:18 pr second.
14:40:31.612 INFO McDevicePosition.ReelMethods.GetSplicingChainInformation(): LazyFetch on position 2.2.1 (current ReelID=??)
14:40:31.612 INFO McDevicePosition.ReelMethods.GetSplicingChainInformation(): LazyFetch on position 2.5.1 (current ReelID=??)
14:40:31.612 INFO McDevicePosition.ReelMethods.GetSplicingChainInformation(): LazyFetch on position 2.11.1 (current ReelID=??)
14:40:31.612 INFO McDevicePosition.ReelMethods.GetSplicingChainInformation(): LazyFetch on position 2.13.1 (current ReelID=??)
14:40:31.612 INFO McDevicePosition.ReelMethods.GetSplicingChainInformation(): LazyFetch on position 2.14.1 (current ReelID=??)
14:40:31.612 INFO McDevicePosition.ReelMethods.GetSplicingChainInformation(): LazyFetch on position 2.15.1 (current ReelID=??)
14:40:31.612 INFO McDevicePosition.ReelMethods.GetSplicingChainInformation(): LazyFetch on position 2.17.1 (current ReelID=??)
14:40:31.612 INFO McDevicePosition.ReelMethods.GetSplicingChainInformation(): LazyFetch on position 2.18.1 (current ReelID=??)
14:40:31.737 In stopmachine. rtfBoxStop=false,secsGemStop=false,ht=false,error=1007, multiBoxStop=false stoptext=null
14:40:34.269 In stopmachine. rtfBoxStop=false,secsGemStop=false,ht=false,error=1007, multiBoxStop=false stoptext=null
14:40:36.800 In stopmachine. rtfBoxStop=false,secsGemStop=false,ht=false,error=1007, multiBoxStop=false stoptext=null
14:40:39.326 In stopmachine. rtfBoxStop=false,secsGemStop=false,ht=false,error=1007, multiBoxStop=false stoptext=null
14:40:42.029 In stopmachine. rtfBoxStop=false,secsGemStop=false,ht=false,error=1007, multiBoxStop=false stoptext=null
14:40:44.545 In stopmachine. rtfBoxStop=false,secsGemStop=false,ht=false,error=1007, multiBoxStop=false stoptext=null
14:40:45.764 MC: <DataContainer>
<EquipmentInformation dateTime="2011-04-06T14:43:42.31+01:00" laneList="1" zoneList="1-3" informationId="ReportPPM"><Extensions><ReportPPM dateTimeStart="2011-04-06T14:43:05.31+01:00" dateTimeEnd="2011-04-06T14:43:42.31+01:00" PcbID="1302014560"><Nozzle headId="1-0-PHHA1" numberOfPickAttempts="8" numberOfPlacements="8"/><Nozzle headId="2-0-PHHA1" numberOfPickAttempts="16" numberOfPlacements="16"/></ReportPPM></Extensions></EquipmentInformation>
<EquipmentInformation dateTime="2011-04-06T14:43:42.37+01:00" laneList="1" zoneList="1-3" informationId="ReportPPM"><Extensions><ReportPPM dateTimeStart="2011-04-06T14:43:05.37+01:00" dateTimeEnd="2011-04-06T14:43:42.37+01:00" PcbID="1302014560"><MaterialHandler materialSupplyArea="02" trackId="06" feederDivision="001" materialHandlerId="0206.001" materialHandlerIdAlt="2.6.1" feederType="ITF2_24" numberOfPickAttempts="4" numberOfPlacements="4"/><MaterialHandler materialSupplyArea="02" trackId="09" feederDivision="001" materialHandlerId="0209.001" materialHandlerIdAlt="2.9.1" feederType="ITF2_16" numberOfPickAttempts="4" numberOfPlacements="4"/><MaterialHandler materialSupplyArea="02" trackId="11" feederDivision="001" materialHandlerId="0211.001" materialHandlerIdAlt="2.11.1" feederType="ITF2_12" numberOfPickAttempts="4" numberOfPlacements="4"/><MaterialHandler materialSupplyArea="02" trackId="13" feederDivision="001" materialHandlerId="0213.001" materialHandlerIdAlt="2.13.1" feederType="ITF2_12" numberOfPickAttempts="4" numberOfPlacements="4"/><MaterialHandler materialSupplyArea="02" trackId="14" feederDivision="001" materialHandlerId="0214.001" materialHandlerIdAlt="2.14.1" feederType="ITF2_12" numberOfPickAttempts="4" numberOfPlacements="4"/><MaterialHandler materialSupplyArea="02" trackId="16" feederDivision="001" materialHandlerId="0216.001" materialHandlerIdAlt="2.16.1" feederType="ITF2_16" numberOfPickAttempts="4" numberOfPlacements="4"/></ReportPPM></Extensions></EquipmentInformation>
<EquipmentInformation dateTime="2011-04-06T14:43:42.37+01:00" laneList="1" zoneList="1-3" informationId="ReportPPM"><Extensions><ReportPPM dateTimeStart="2011-04-06T14:43:05.37+01:00" dateTimeEnd="2011-04-06T14:43:42.37+01:00" PcbID="1302014560"><Component componentId="IC0102667" partId="IC0102667F" lotId="" numberOfPickAttempts="4" numberOfPlacements="4"/><Component componentId="IC0102669" partId="IC0102669F" lotId="" numberOfPickAttempts="4" numberOfPlacements="4"/><Component componentId="IC0102665" partId="IC0102665F" lotId="" numberOfPickAttempts="4" numberOfPlacements="4"/><Component componentId="IC0102958" partId="IC0102958F" lotId="" numberOfPickAttempts="4" numberOfPlacements="4"/><Component componentId="IC0102671" partId="IC0102671F" lotId="" numberOfPickAttempts="4" numberOfPlacements="4"/><Component componentId="XT0100253" partId="XT0100253F" lotId="" numberOfPickAttempts="4" numberOfPlacements="4"/></ReportPPM></Extensions></EquipmentInformation>
<EquipmentBlocked dateTime="2011-04-06T14:43:42.37+01:00"/>
<EquipmentChangeState dateTime="2011-04-06T14:43:42.37+01:00" currentState="READY-IDLE-BLOCKED" previousState="READY-PROCESSING-EXECUTING" eventId="EquipmentBlocked"><Extensions currentSEMI-State="SBY/No product/Blocked" previousSEMI-State="PRD/Regular production/Process product"/></EquipmentChangeState>
<EquipmentErrorsCleared dateTime="2011-04-06T14:43:42.37+01:00"/>
<EquipmentChangeState dateTime="2011-04-06T14:43:42.37+01:00" currentState="READY-IDLE-BLOCKED" previousState="READY-IDLE-BLOCKED" eventId="EquipmentErrorsCleared"><Extensions currentSEMI-State="SBY/No product/Blocked" previousSEMI-State="PRD/Regular production/Process product"/></EquipmentChangeState>
<EquipmentAlarmsCleared dateTime="2011-04-06T14:43:42.37+01:00"/>
<EquipmentChangeState dateTime="2011-04-06T14:43:42.37+01:00" currentState="READY-IDLE-BLOCKED" previousState="READY-IDLE-BLOCKED" eventId="EquipmentAlarmsCleared"><Extensions currentSEMI-State="SBY/No product/Blocked" previousSEMI-State="PRD/Regular production/Process product"/></EquipmentChangeState>
<ItemTransferIn dateTime="2011-04-06T14:43:05.91+01:00" itemInstanceId="1302014560" laneId="1"/>
<ItemTransferZone dateTime="2011-04-06T14:43:05.91+01:00" itemInstanceId="1302014560" fromZoneId="1" toZoneId="2" laneId="1"/>
<ItemWorkStart dateTime="2011-04-06T14:43:05.91+01:00" itemInstanceId="1302014560" laneId="1" zoneId="1-3" recipeId="ES-AD0109071F-3C-PS.ASP" orderId="asad"/>
<ItemWorkAbort dateTime="2011-04-06T14:43:42.77+01:00" itemInstanceId="1302014560" laneId="1" zoneId="2" abortId="Incomplete" cycleTime="35922" recipeId="ES-AD0109071F-3C-PS.ASP" orderId="asad"/>
<ItemTransferZone dateTime="2011-04-06T14:43:42.77+01:00" itemInstanceId="1302014560" fromZoneId="2" toZoneId="3" laneId="1"/>
<ItemTransferOut dateTime="2011-04-06T14:43:42.77+01:00" itemInstanceId="1302014560" laneId="1"><Extensions><itemInfo itemTransferInTime="2011-04-06T14:43:05.91+01:00" itemTransferOutTime="2011-04-06T14:43:42.77+01:00" cycleTime="35922" recipeId="ES-AD0109071F-3C-PS.ASP" orderId="asad" itemInstanceId="1302014560" statusId="Incomplete"/></Extensions><Extensions><machineConfig flowlineName="AX-1" machineName="1-1-AX-1_AX201" machineManufacturer="Assembleon" machineModel="AX-201" machineSerial="DC606" machineVersion="n.a." machineSoftwareVersion="3.10_930_26"/></Extensions><Extensions><PlacementSummary totalPlaced="" totalAttempts=""/></Extensions></ItemTransferOut>
<EquipmentStarved dateTime="2011-04-06T14:43:42.84+01:00"/>
<EquipmentChangeState dateTime="2011-04-06T14:43:42.84+01:00" currentState="READY-IDLE-STARVED" previousState="READY-IDLE-BLOCKED" eventId="EquipmentStarved"><Extensions currentSEMI-State="SBY/No product/Starved" previousSEMI-State="SBY/No product/Blocked"/></EquipmentChangeState>
<EquipmentErrorsCleared dateTime="2011-04-06T14:43:42.84+01:00"/>
<EquipmentChangeState dateTime="2011-04-06T14:43:42.84+01:00" currentState="READY-IDLE-STARVED" previousState="READY-IDLE-STARVED" eventId="EquipmentErrorsCleared"><Extensions currentSEMI-State="SBY/No product/Starved" previousSEMI-State="SBY/No product/Blocked"/></EquipmentChangeState>
<EquipmentAlarmsCleared dateTime="2011-04-06T14:43:42.84+01:00"/>
<EquipmentChangeState dateTime="2011-04-06T14:43:42.84+01:00" currentState="READY-IDLE-STARVED" previousState="READY-IDLE-STARVED" eventId="EquipmentAlarmsCleared"><Extensions currentSEMI-State="SBY/No product/Starved" previousSEMI-State="SBY/No product/Blocked"/></EquipmentChangeState>
</DataContainer>
14:40:45.764 Curr State=READY-IDLE-STARVED; Curr SemiState=SBY/No product/Starved; Curr Event Id=EquipmentAlarmsCleared
14:40:45.764 INFO Changing status to WaitBoard

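What is the best approach here for extracting the information stored between <DataContainer> and </DataContainer> in the log file, together with the timestamp of each occurrence? There are many of these reports in the log, and I need to capture all of them.

I thought about doing this with the IndexOf string function, but that gets too complicated. Maybe a regular expression would be a good idea here? (The problem is that I know nothing about regular expressions.)

2 answers:

Answer 0 (score: 2)

You could do it like this: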

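using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Group 1: the timestamp; group 2: the whole <DataContainer> element.
// Multiline lets ^ match at the start of every line; Singleline lets . span newlines.
static readonly Regex DataContainerRegex =
    new Regex(@"^(\d\d:\d\d:\d\d\.\d\d\d) MC: (<DataContainer>.*?</DataContainer>)",
              RegexOptions.Singleline | RegexOptions.Multiline);

static IEnumerable<Tuple<DateTime, XDocument>> Parse(string data)
{
    var matches = DataContainerRegex.Matches(data);

    // The log lines carry no date, so DateTime.Parse fills in today's date.
    return from Match match in matches
           let date = DateTime.Parse(match.Groups[1].Value,
                                     CultureInfo.InvariantCulture)
           let doc = XDocument.Parse(match.Groups[2].Value)
           select Tuple.Create(date, doc);
}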

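If you really want the text between the <DataContainer> tags rather than an XDocument, just move the parentheses so that they sit directly around the .*? and use the value of the second group as a string instead of passing it to XDocument.Parse.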
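A minimal sketch of that variant, assuming the whole log has already been read into a string. The names innerTextRegex and ParseInnerText and the shortened sample are illustrative and not part of the answer; the sketch is written as a top-level program (C# 9 or later), so the regex is a local variable rather than a static field.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as in the answer, but group 2 now captures only what lies
// between the <DataContainer> tags.
var innerTextRegex = new Regex(
    @"^(\d\d:\d\d:\d\d\.\d\d\d) MC: <DataContainer>(.*?)</DataContainer>",
    RegexOptions.Singleline | RegexOptions.Multiline);

// Returns (timestamp, inner text) pairs; the text is trimmed but not parsed as XML.
IEnumerable<Tuple<DateTime, string>> ParseInnerText(string data)
{
    return from Match match in innerTextRegex.Matches(data)
           let date = DateTime.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture)
           select Tuple.Create(date, match.Groups[2].Value.Trim());
}

// Shortened, hypothetical sample in the same shape as the log in the question.
var sample =
    "14:40:24.144 In stopmachine. error=1007\n" +
    "14:40:25.175 MC: <DataContainer>\n" +
    "<EquipmentHeartbeat interval=\"300\"/>\n" +
    "</DataContainer>\n" +
    "14:40:26.675 In stopmachine. error=1007\n";

var entries = ParseInnerText(sample).ToList();
foreach (var entry in entries)
    Console.WriteLine("{0:HH:mm:ss.fff} -> {1}", entry.Item1, entry.Item2);
```

Because the inner text is several sibling elements rather than one root element, it would have to be wrapped in a root element again before it could be parsed as XML.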

Answer 1 (score: 0)

Here is my attempt at parsing the sample log file:

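// Requires: System, System.Collections.Generic, System.Globalization,
// System.IO, System.Text, System.Text.RegularExpressions
static IEnumerable<Tuple<DateTime, string>> GetLogEntries(string path)
{
    // Group 1: the timestamp (HH:mm:ss.fff); group 2: the rest of the line.
    var regex = new Regex(@"^(\d{2}:\d{2}:\d{2}\.\d{3})\s(.*)", RegexOptions.Compiled);

    var logLineBuilder = new StringBuilder();
    Match currentMatch = null;

    foreach (var line in File.ReadLines(path))
    {
        var match = regex.Match(line);

        if (match.Success)
        {
            // A new timestamp starts a new entry, so emit the one collected so far.
            // Lines that precede the first timestamp have no entry and are dropped.
            if (currentMatch != null)
                yield return Tuple.Create(
                    DateTime.Parse(currentMatch.Groups[1].Value, CultureInfo.InvariantCulture),
                    logLineBuilder.ToString());

            logLineBuilder.Clear();
            currentMatch = match;
        }

        // Lines without a timestamp continue the current entry.
        logLineBuilder.AppendLine(match.Success ? match.Groups[2].Value : line);
    }

    // Emit the last entry; an empty file yields nothing instead of throwing.
    if (currentMatch != null)
        yield return Tuple.Create(
            DateTime.Parse(currentMatch.Groups[1].Value, CultureInfo.InvariantCulture),
            logLineBuilder.ToString());
}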

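I assumed the log file might be large, so I used File.ReadLines, which reads the log file one line at a time. This keeps the memory footprint down because the whole file never has to be loaded into memory, which could be a problem with particularly large log files. You should then be able to filter on any other criteria with a LINQ expression, for example: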

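var entries = GetLogEntries("logfile.txt");

// Keep only the entries that start with the MC: prefix, then strip that
// prefix ("MC: " is 4 characters long) before parsing the XML.
var dataContainers = from e in entries
                     where e.Item2.StartsWith("MC: <DataContainer>", StringComparison.Ordinal)
                     select XDocument.Parse(e.Item2.Substring(4));

Console.WriteLine(dataContainers.Count()); // prints "2" for the sample log above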
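Once an entry has been parsed into an XDocument, LINQ to XML can pull out individual values. The sketch below lists the state transitions recorded in EquipmentChangeState elements. The element and attribute names are taken from the sample log in the question; the shortened XML and the variable names are illustrative assumptions, and the code is written as a top-level program (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Shortened, hypothetical container in the same shape as the sample log.
var xml =
    "<DataContainer>" +
    "<EquipmentBlocked dateTime=\"2011-04-06T14:43:42.37+01:00\"/>" +
    "<EquipmentChangeState dateTime=\"2011-04-06T14:43:42.37+01:00\" " +
    "currentState=\"READY-IDLE-BLOCKED\" previousState=\"READY-PROCESSING-EXECUTING\" " +
    "eventId=\"EquipmentBlocked\"/>" +
    "<EquipmentChangeState dateTime=\"2011-04-06T14:43:42.84+01:00\" " +
    "currentState=\"READY-IDLE-STARVED\" previousState=\"READY-IDLE-BLOCKED\" " +
    "eventId=\"EquipmentStarved\"/>" +
    "</DataContainer>";

var doc = XDocument.Parse(xml);

// One "previous -> current" string per EquipmentChangeState element,
// in document order.
var transitions = doc.Root
    .Elements("EquipmentChangeState")
    .Select(e => (string)e.Attribute("previousState") + " -> " + (string)e.Attribute("currentState"))
    .ToList();

foreach (var t in transitions)
    Console.WriteLine(t);
```

The same query could be applied to each document in dataContainers, for example with SelectMany, to collect the transitions from the whole log.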