Shell: script to group strings by a substring

Time: 2011-05-24 09:06:53

Tags: regex shell scripting string-matching

I have a program (sorry, changing it is not an option) that outputs more than 500k lines.

I am trying to group the lines of the log file by a substring within each line (and then sort those groups).

For example, my lines look similar to this:

SELECT something WHERE TIM BETWEEN '*' AND '*' AND something;

What I want to group on is TIM BETWEEN '*' AND '*', where the * values match between lines. For example:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;

would be grouped in the output as:

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;

Could each group also be sorted on the whole string, so that similar "something" values end up next to each other?

I have been trying to put together a shell script that outputs what I want from the log file, but without any success!

Edit: I should also mention that 'something' can be multiple words, for example:

SELECT blah1, blah2 or SELECT blah1, blah2, blah3

2 answers:

Answer 0: (score: 1)

You should be able to use sort:

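sort -o outputfile -k2,2 -k6,6 -k8,8 inputfile

where -k2,2 selects the "something" column, -k6,6 the first date column, and -k8,8 the last date column. Fields are whitespace-separated and numbered from 1, so these positions assume "something" is a single word.

(PS! Untested)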
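Because 'something' can span several words, fixed field numbers shift from line to line. One way around that, sketched here under assumptions, is to prefix each line with its two dates as a tab-separated sort key, sort on that key, and strip the key off again. The function name group_by_range is hypothetical. The sketch assumes every line of interest contains TIM BETWEEN followed by two single-quoted values, that the values sort correctly as plain text (as ISO dates do), and that byte-wise LC_ALL=C ordering is acceptable.

```shell
#!/bin/sh
# group_by_range: hypothetical helper, not part of the original answer.
# Reads log lines on stdin and writes them grouped by date range on stdout.
group_by_range() {
  # Decorate: emit "first-date<TAB>last-date<TAB>original line".
  # "\047" is a single quote, written in octal so it can sit inside the
  # single-quoted awk program. Lines without "TIM BETWEEN " get empty keys
  # and therefore sort ahead of every dated group.
  awk '{
    key = $0
    if (sub(/^.*TIM BETWEEN /, "", key)) {
      split(key, q, "\047")
      print q[2] "\t" q[4] "\t" $0
    } else {
      print "\t\t" $0
    }
  }' |
  # Sort by first date, then last date, then the whole original line.
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 -k3 |
  # Undecorate: drop the two key fields, keeping the original line.
  cut -f3-
}
```

Invoked as group_by_range < inputfile > outputfile, this keeps lines with the same date range together and orders each group by the full line, regardless of how many words 'something' contains.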

Answer 1: (score: 0)

You have to pre-filter the data and convert it into something that sort can work with.

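# Replace the first BETWEEN and the first AND with "|" so sort can split on it,
# sort on the two date fields, then restore the keywords.
awk '{ sub(/BETWEEN/, "|"); sub(/AND/, "|"); print }' logFile \
| sort -t"|" -k2,2 -k3,3 \
| sed 's/|/BETWEEN/;s/|/AND/'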

Output

SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2010-03-04' AND '2010-03-10' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
SELECT something WHERE TIM BETWEEN '2011-01-28' AND '2011-02-05' AND something;
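To see how the lines fall into groups before sorting a 500k-line log, a small sketch like the following can list each distinct date range with its line count. The name count_ranges is hypothetical and not part of the answer. It assumes the same TIM BETWEEN '...' AND '...' shape as the sample lines and silently skips lines that do not match.

```shell
#!/bin/sh
# count_ranges: hypothetical helper, not part of the original answer.
# Reads log lines on stdin and prints "count range" pairs, largest group first.
count_ranges() {
  # Keep only the "TIM BETWEEN '...' AND '...'" part of each matching line.
  sed -n "s/.*\(TIM BETWEEN '[^']*' AND '[^']*'\).*/\1/p" |
  # Identical ranges must be adjacent before uniq -c can count them.
  LC_ALL=C sort | uniq -c |
  # Order by count (descending), then by the range text for equal counts.
  LC_ALL=C sort -k1,1nr -k2
}
```

Running count_ranges < logFile gives a quick overview of how many groups there are and how large each one is.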

I hope this helps.