Split a file by extracting the lines between two keywords

Time: 2015-08-04 16:45:12

Tags: bash awk csplit

I have a file with the following lines:

string
string
string
MODEL 1
.
.
.
TER
string
string
string
MODEL 2
.
.
.
TER

There are 5000 such MODEL sections. I want to split this file so that each section starting with MODEL X and ending with TER (shown with dots) is saved to its own file, and everything else is discarded. How can I do this? Perhaps with awk or csplit?

I have checked several other similar questions but could not apply their answers to my case.

Also note that I am on Mac OS X.

2 answers:

Answer 0 (score: 2)

You can use this awk:

awk '/^MODEL/{file="model" $2} file{print > file} /^TER/{close(file); file=""}' file

How it works:

/^MODEL/               # match lines starting with MODEL
file="model" $2        # make variable file as model + model_no from column 2
file{...}              # execute if the file variable is set
{print>file}           # print each record to file
/^TER/                 # match lines starting with TER
{close(file); file=""} # close file and reset file to ""

Then verify with:

cat model1
MODEL 1
.
.
.
TER

cat model2
MODEL 2
.
.
.
TER
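
To try the awk command end to end, here is a minimal sketch that builds a tiny sample input and runs the same program against it. The directory name split_demo, the file name input.txt, and the sample lines are illustrative assumptions, not taken from the question.

```shell
# Scratch directory and file names are illustrative, not from the question.
mkdir -p split_demo
printf '%s\n' \
    'string' 'string' \
    'MODEL 1' 'a' 'b' 'TER' \
    'string' \
    'MODEL 2' 'c' 'TER' > split_demo/input.txt

(
    # Run inside the scratch directory so the output files land there.
    cd split_demo || exit 1
    # Same program as the one-liner, spread over several lines.
    awk '
        /^MODEL/ { file = "model" $2 }        # start of a section: pick the output name
        file     { print > file }             # inside a section: copy the line
        /^TER/   { close(file); file = "" }   # end of a section: close and reset
    ' input.txt
)

cat split_demo/model1
```

The close(file) call is worth keeping with 5000 sections, since many awk implementations cap the number of files that can be open at once.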

Answer 1 (score: 1)

This works even with dash:
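# Reads the input on stdin, e.g. `dash script.sh < file` (script name is illustrative).
# Writes the lines between each MODEL line and the next TER line to MODEL_<id>;
# the MODEL and TER lines themselves are not written.
out=
while IFS= read -r line; do
    case $line in
        MODEL\ *)
            out="MODEL_${line#MODEL }"   # e.g. MODEL_1
            : > "$out"                   # create or truncate the output file
            ;;
        TER*)
            out=                         # section finished
            ;;
        *)
            # inside a section: append the line verbatim
            if [ -n "$out" ]; then
                printf '%s\n' "$line" >> "$out"
            fi
            ;;
    esac
done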
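
The loop above writes only the lines between the markers. If the MODEL and TER lines should be kept in each output file, as the question's wording suggests, a variant could look like the sketch below. The function name split_models, the directory loop_demo, and the sample lines are hypothetical and chosen for illustration; the code sticks to POSIX sh features so it should also run under dash.

```shell
# split_models is a hypothetical helper: it reads the input on stdin and
# writes one MODEL_<id> file per section into the current directory,
# keeping the MODEL and TER marker lines.
split_models() {
    out=
    while IFS= read -r line; do
        case $line in
            MODEL\ *)
                out="MODEL_${line#MODEL }"   # e.g. MODEL_1
                : > "$out"                   # create or truncate the output file
                ;;
        esac
        # Inside a section (marker lines included): append the line verbatim.
        if [ -n "$out" ]; then
            printf '%s\n' "$line" >> "$out"
        fi
        case $line in
            TER*) out= ;;                    # end of section
        esac
    done
}

# Illustrative input; the backslash and percent sign are kept literally.
mkdir -p loop_demo
printf '%s\n' 'string' 'MODEL 1' 'a\tb' '%s' 'TER' \
              'string' 'MODEL 2' 'c' 'TER' > loop_demo/input.txt

( cd loop_demo && split_models < input.txt )
cat loop_demo/MODEL_1
```

Using read -r and printf '%s\n' means input lines are never treated as format strings, so backslashes and percent signs in the data pass through unchanged.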