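I have a program (sorry, changing it is not an option) that writes a log file of more than 500k lines.
I am trying to group the lines in the log file by a substring of each line, and then sort those groups.
For example, my lines look like this: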
SELECT something WHERE TIM BETWEEN '*' AND '*' AND something;
What I want to group on is TIM BETWEEN '*' AND '*', where the values in place of * match across lines. For example:
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
would be grouped in the output as:
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
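Each group should also be sorted on the whole string, so that lines with similar "something" parts end up next to each other.
I have been trying to put together a shell script that produces this output from the log file, but without any success!
Edit: I should also mention that "something" can be more than one word, for example: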
SELECT blah1, blah2 or SELECT blah1, blah2, blah3
Answer 0 (score: 1)
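You should be able to do this with sort:
sort -o outputfile -k2,2 -k6,6 -k8,8 inputfile
Here -k2,2 selects the "something" column, -k6,6 the first date column, and -k8,8 the last date column, counting whitespace-separated fields from 1. This assumes "something" is a single word; with more words the date columns shift to the right.
(PS: untested)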
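Because whitespace-based sort keys depend on the exact word count, it can help to print the fields in question before trusting a key specification. The following is a small sketch, not part of the original answer; `show_keys` is a hypothetical helper name, and the sample lines are taken from the question (the second one uses the multi-word "something" from the edit).

```shell
# Hypothetical helper: print fields 2, 6 and 8 of each input line,
# counting whitespace-separated fields from 1 as sort -k does.
show_keys() {
    awk '{ print "k2=" $2, "k6=" $6, "k8=" $8 }'
}

# Single-word "something": fields 6 and 8 are the two dates.
printf '%s\n' "SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;" | show_keys

# Multi-word "something": the same positions now hold keywords, not dates.
printf '%s\n' "SELECT blah1, blah2 WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;" | show_keys
```

For the second line, fields 6 and 8 are BETWEEN and AND, so a key based on fixed whitespace positions no longer groups by date range.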
Answer 1 (score: 0)
You have to pre-filter the data and convert it into something that sort can work with.
# Replace the first BETWEEN and the first AND with "|" so the dates become separate fields
awk '{sub(/BETWEEN/, "|"); sub(/AND/, "|"); print}' logFile \
| sort -t"|" -k2,2 -k3,3 \
| sed 's/|/BETWEEN/;s/|/AND/'
Output:
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
I hope this helps.
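A variation on the same pre-filter idea, sketched here under assumptions rather than taken from the answer: instead of rewriting the line, prefix each line with the matched BETWEEN '…' AND '…' text and a tab, sort on that prefix and then on the rest of the line, and cut the prefix off again. `group_by_range` is a hypothetical function name. It assumes the log lines contain no tab before the date range matters for ordering and that the dates are ISO-formatted, so plain byte order matches chronological order.

```shell
# Hypothetical helper: group lines by their BETWEEN '...' AND '...' text,
# then order each group by the whole line.
group_by_range() {
    # Decorate: emit "<key><TAB><original line>". The single quote is
    # passed in as q so the awk program itself can stay single-quoted.
    awk -v q="'" '
        BEGIN { re = "BETWEEN " q "[^" q "]*" q " AND " q "[^" q "]*" q }
        {
            key = ""
            if (match($0, re)) key = substr($0, RSTART, RLENGTH)
            print key "\t" $0
        }' "$@" |
    # Sort on the key first, then on everything after the first tab.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2 |
    # Undecorate: drop the key column, keep the original line.
    cut -f2-
}

# Small demonstration on lines shaped like those in the question.
printf '%s\n' \
    "SELECT b WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND x;" \
    "SELECT blah1, blah2 WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND x;" \
    "SELECT a WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND x;" |
    group_by_range
```

On a real log this would be called as `group_by_range logFile > sortedFile`. Since the key is extracted by pattern rather than by position, the number of words in "something" does not affect the grouping; lines without a date range get an empty key and sort first.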