Question

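I'm trying to extract the lines of a file that fall within a date range, but the dates are formatted in a non-standard way. Can a regular expression match them? The log file looks like this: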

Jan  5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb  1 01:14:00 log messages here
Feb 10 16:32:00 more messages
Mar  7 16:32:00 more messages
Apr 21 16:32:00 more messages

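For example, I want to match the lines between January 1 and February 10, but I haven't been able to get a regex to respect the order of the months, because the month names are not numbers.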

Answer 1

The following shell pipeline may do the trick. Assuming you want the 41 days from January 1 through February 10, you can pipe a day counter through date into grep:

seq 0 40 \
  | xargs -I{} date -d "2018-01-01 + {} days" +"^%b %e " \
  | grep -f - <logfile>

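I believe this is the fastest option. The idea is to build the set of possible dates (the first two lines of the pipeline) and then search for all of them with a single grep.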
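If you would rather give a start and end date than count the days by hand, the same idea can be wrapped in a small loop. This is a sketch assuming bash, GNU date (for -d) and dates written as YYYY-MM-DD; date_patterns and filter_range are hypothetical helper names. Each generated pattern is anchored with ^ so a date that happens to appear inside a message is not matched.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes bash, GNU date (for -d) and dates given as YYYY-MM-DD.
# date_patterns and filter_range are hypothetical helper names.

# date_patterns START END: print one anchored regex per calendar day,
# e.g. "^Jan  5 ", matching the "%b %e" prefix of each log line.
date_patterns() {
  local day=$1 end=$2
  # ISO dates sort correctly as strings, so loop until we pass END.
  while [[ ! "$day" > "$end" ]]; do
    LC_ALL=C date -u -d "$day" +'^%b %e '
    day=$(date -u -d "$day + 1 day" +%F)
  done
}

# filter_range START END FILE: print the lines of FILE dated within [START, END].
filter_range() {
  date_patterns "$1" "$2" | grep -f - "$3"
}
```

For the range in the question this would be called as filter_range 2018-01-01 2018-02-10 logfile. The year still matters, because it decides whether Feb 29 is generated.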

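Using awk on a sorted log file:

When the log file is sorted, you can skip lines before the start date and exit as soon as you pass the end date, so only the part you need is processed.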

awk -v tstart="Jan  1" -v tend="Feb 10" '
   BEGIN{ month["Jan"]=1; month["Feb"]=2; month["Mar"]=3
          month["Apr"]=4; month["May"]=5; month["Jun"]=6
          month["Jul"]=7; month["Aug"]=8; month["Sep"]=9
          month["Oct"]=10;month["Nov"]=11;month["Dec"]=12
          $0=tstart; ms=$1; ds=$2+0   # split start date into month and day
          $0=tend  ; me=$1; de=$2+0   # split end date into month and day
         }
  (month[$1]<month[ms])             { next }   # before the start month: skip
  (month[$1]==month[ms]) && ($2<ds) { next }   # start month, before start day: skip
  (month[$1]==month[me]) && ($2>de) { exit }   # end month, past end day: stop
  (month[$1]>month[me])             { exit }   # past the end month: stop
  1' <logfile>
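The four rules above can also be collapsed by turning each date into a single number, month*100 + day (so "Feb 10" becomes 210), which keeps the same skip-then-exit behaviour with one comparison per side. A sketch, assuming a sorted log that stays within one calendar year and English month abbreviations; filter_sorted is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Sketch: same early-exit idea, with each date reduced to month*100 + day.
# Assumes a sorted log within one calendar year and English month names.
# filter_sorted is a hypothetical name. Usage: filter_sorted "Jan  1" "Feb 10" FILE
filter_sorted() {
  awk -v tstart="$1" -v tend="$2" '
    BEGIN {
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= 12; i++) month[names[i]] = i
      split(tstart, s, " "); split(tend, e, " ")
      lo = month[s[1]] * 100 + s[2]   # "Jan  1" -> 101
      hi = month[e[1]] * 100 + e[2]   # "Feb 10" -> 210
    }
    { k = month[$1] * 100 + $2 }      # key of the current line
    k < lo { next }                   # before the range: skip
    k > hi { exit }                   # past the range: sorted, so stop reading
    { print }
  ' "$3"
}
```

On the sample log, filter_sorted "Jan  1" "Feb 10" logfile should print the same four lines as the commands in this answer.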

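Using awk on an unsorted log file:

When the log file is unsorted, every line has to be compared against the range, and awk cannot stop early. This obviously takes more time, since the whole file is always read.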

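awk -v tstart="Jan  1" -v tend="Feb 10" '
   BEGIN{ month["Jan"]=1; month["Feb"]=2; month["Mar"]=3
          month["Apr"]=4; month["May"]=5; month["Jun"]=6
          month["Jul"]=7; month["Aug"]=8; month["Sep"]=9
          month["Oct"]=10;month["Nov"]=11;month["Dec"]=12
          $0=tstart; ms=$1; ds=$2+0   # split start date into month and day
          $0=tend  ; me=$1; de=$2+0   # split end date into month and day
         }
   # start and end in the same month: only that day range qualifies
   (ms == me)             { if (($1 == ms) && (ds<=$2) && ($2<=de)) print; next }
   ($1 == ms) && (ds<=$2) { print; next }   # start month, on or after start day
   ($1 == me) && ($2<=de) { print; next }   # end month, on or before end day
   (month[ms]<month[$1]) && (month[$1]<month[me])   # whole months in between
   ' <logfile>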

All of the commands above return:

Jan  5 11:34:00 log messages here
Jan 13 16:21:00 log messages here
Feb  1 01:14:00 log messages here
Feb 10 16:32:00 more messages

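Note: date ranges that cross December 31 (for example, Dec 20 to Jan 10) can give wrong results with the awk versions, because they compare month numbers within a single year.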
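One way to handle a range that wraps past December 31 in awk is to turn each date into the number month*100 + day and, when the start key is larger than the end key, accept a line if it falls on or after the start or on or before the end. A sketch, assuming an unsorted log, English month abbreviations, and start/end given as "Mon D" strings; filter_range_wrap is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch, not a drop-in replacement: assumes English month abbreviations and
# start/end given as "Mon D" strings. filter_range_wrap is a hypothetical name.
# Usage: filter_range_wrap "Dec 20" "Jan 10" FILE
filter_range_wrap() {
  awk -v tstart="$1" -v tend="$2" '
    BEGIN {
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= 12; i++) month[names[i]] = i
      split(tstart, s, " "); split(tend, e, " ")
      lo = month[s[1]] * 100 + s[2]   # "Dec 20" -> 1220
      hi = month[e[1]] * 100 + e[2]   # "Jan 10" -> 110
    }
    ($1 in month) {                   # ignore lines without a month prefix
      k = month[$1] * 100 + $2
      if (lo <= hi)
        inside = (k >= lo && k <= hi) # ordinary range within one year
      else
        inside = (k >= lo || k <= hi) # range wraps past December 31
      if (inside) print
    }
  ' "$3"
}
```

Since only month and day are compared, this still cannot tell two different years apart; it just avoids the wrap-around failure.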
