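I am using Regex to parse large log files and extract the relevant data. My parser's run time grows exponentially with the size of the file. The Visual Studio profiler shows that most of the time is spent in the Regex.Match function. How can I make my Regex patterns more efficient? Is there a more efficient alternative to Regex?
I have already tried stripping the whitespace from each line before matching it, so that no computation is spent on matching whitespace, but this gave no improvement.
These are the patterns I currently match lines against: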
cmdStartPattern = new Regex(@"(\d+\.\d+):\s+ufshcd_command:\s+(\w+)\.ufshc:\s+(\w+)_send:\s+tag:\s+(\d+)\s+cmd:\s+(\w+)\s+lba:\s+(\d+)\s+size:\s+(\d+)\s+DB:\s+(\w+)", RegexOptions.Compiled);
cmdDonePattern = new Regex(@"(\d+\.\d+):\s+ufshcd_command:\s+(\w+)\.ufshc:\s+(\w+)_cmpl_*\d*:\s+tag:\s+(\d+)\s+cmd:\s+(\w+)\s+lba:\s+(\d+)\s+size:\s+(\d+)", RegexOptions.Compiled);
cmdBlockPattern = new Regex(@"(\d+\.\d+):\s+block_rq_issue:\s+(\d+),(\d+)\s+(\w+)\s+\d+\s+\((.*)\)\s+(\d+)\s+\+\s+(\d+)", RegexOptions.Compiled);
getCurrTimePattern = new Regex(@"(\d+\.\d+):", RegexOptions.Compiled);
Here are a few sample lines from the log file I am trying to extract data from:
<idle>-0 [001] d.h2 228795.291923: ufshcd_command: 1d84000.ufshc: scsi_cmpl: tag: 0 cmd: 0x2a lba: 19733048 size: 4096 DB: 0x0 IS: 0x0
<idle>-0 [001] d.h2 228795.291928: ufshcd_clk_gating: 1d84000.ufshc: state changed to REQ_CLKS_OFF
<idle>-0 [001] ..s1 228795.291950: block_rq_complete: 8,0 WAS () 19733048 + 8 [0]
sh-7199 [002] d..1 228795.318053: block_rq_issue: 8,0 RA 0 () 19692680 + 8 [sh]
sh-7199 [002] d..1 228795.318088: ufshcd_clk_gating: 1d84000.ufshc: state changed to CLKS_ON
sh-7199 [002] d..1 228795.318149: ufshcd_command: 1d84000.ufshc: scsi_send: tag: 0 cmd: 0x28 lba: 19692680 size: 4096 DB: 0x1 IS: 0x0
<idle>-0 [001] d.h2 228795.318822: ufshcd_command: 1d84000.ufshc: scsi_cmpl: tag: 0 cmd: 0x28 lba: 19692680 size: 4096 DB: 0x0 IS: 0x0
<idle>-0 [001] d.h2 228795.318836: ufshcd_clk_gating: 1d84000.ufshc: state changed to REQ_CLKS_OFF
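As you can see, some of the lines are ones I do not need, and the start of every line, up to the decimal timestamp, is variable, so I cannot include it in the Regex pattern.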
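To make the setup easier to reproduce, here is a minimal sketch of how a pattern like the ones above might be applied line by line. It is not taken from the actual parser: the class name `LogParseSketch`, the method `ExtractSends`, and the output format are hypothetical, and the pattern is the send pattern with its literal dots escaped, which is an assumption about the intent.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical harness (not the original parser): applies the "send" pattern
// to each line and collects timestamp, tag, and lba from the lines that match.
public static class LogParseSketch
{
    // The send pattern from the question, with literal dots escaped (assumed intent).
    private static readonly Regex CmdStartPattern = new Regex(
        @"(\d+\.\d+):\s+ufshcd_command:\s+(\w+)\.ufshc:\s+(\w+)_send:\s+tag:\s+(\d+)\s+cmd:\s+(\w+)\s+lba:\s+(\d+)\s+size:\s+(\d+)\s+DB:\s+(\w+)",
        RegexOptions.Compiled);

    public static List<string> ExtractSends(IEnumerable<string> lines)
    {
        var results = new List<string>();
        foreach (string line in lines)
        {
            Match m = CmdStartPattern.Match(line);
            if (!m.Success)
            {
                continue; // lines of other event types are skipped
            }
            // Groups: 1 = timestamp, 4 = tag, 6 = lba
            results.Add($"{m.Groups[1].Value} tag={m.Groups[4].Value} lba={m.Groups[6].Value}");
        }
        return results;
    }

    public static void Main()
    {
        // Two of the sample lines from the log excerpt above.
        string[] sample =
        {
            "sh-7199 [002] d..1 228795.318088: ufshcd_clk_gating: 1d84000.ufshc: state changed to CLKS_ON",
            "sh-7199 [002] d..1 228795.318149: ufshcd_command: 1d84000.ufshc: scsi_send: tag: 0 cmd: 0x28 lba: 19692680 size: 4096 DB: 0x1 IS: 0x0",
        };

        foreach (string entry in ExtractSends(sample))
        {
            Console.WriteLine(entry);
        }
    }
}
```

Only the second sample line is a `scsi_send` event, so it is the only one the sketch reports. The other patterns would be applied in the same way.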