Transforming text using 'sed' or 'awk'

Time: 2012-03-07 21:55:55

Tags: sed awk transform

I have a very large input set that looks like this:

Label: foo, Other text: text description...
   <insert label> Item: item description...
   <insert label> Item: item description...
Label: bar, Other text:...
   <insert label> Item:...
Label: baz, Other text:...
   <insert label> Item:...
   <insert label> Item:...
   <insert label> Item:...
...

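I want to transform it by pulling out the label name (e.g. "foo") from each "Label:" line and replacing the token "<insert label>" on the lines that follow with that actual label: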

Label: foo, Other text: text description...
   foo Item: item description...
   foo Item: item description...
Label: bar, Other text:...
   bar Item:...
Label: baz, Other text:...
   baz Item:...
   baz Item:...
   baz Item:...
...

Can this be done with sed, awk, or another Unix tool? If so, how would I do it?

3 answers:

Answer 0 (score: 5)

Here is my label.awk file:

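# On a header line, remember the label (second field, minus its trailing comma).
/^Label:/ {
    label = $2
    sub(/,$/, "", label)
}

# On an item line, replace the placeholder with the remembered label.
/<insert label>/ {
    sub(/<insert label>/, label)
}

# Print every line, modified or not.
1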

To invoke it:

awk -f label.awk data.txt
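
The script above takes the label from `$2`, which works for the single-word labels shown in the question. If labels could contain spaces (an assumption; the question does not show such input), one possible variation is to cut the label out of the whole line instead. The sample text and the variable names `input` and `result` below are made up for illustration:

```shell
# Hypothetical sample with a multi-word label (not from the question).
input='Label: foo bar, Other text: text description...
   <insert label> Item: item description...'

result=$(printf '%s\n' "$input" | awk '
/^Label:/ {
    label = $0
    sub(/^Label: */, "", label)   # drop the "Label:" prefix and following spaces
    sub(/,.*/, "", label)         # drop everything from the first comma onward
}
/<insert label>/ {
    sub(/<insert label>/, label)  # replace the placeholder in the current line
}
1                                 # print every line
')

printf '%s\n' "$result"
```

This sketch still assumes that the label itself never contains a comma.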

Answer 1 (score: 2)

You can use awk like this:

awk '$1 == "Label:" { label = $2; sub(/,$/, "", label) }
     $1 == "<insert" && $2 == "label>" { $1 = " "; $2 = label }
     { print }' file
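
One thing to keep in mind with this approach: assigning to `$1` or `$2` makes awk rebuild the whole record from its fields, joined by the output field separator (a single space by default), so leading indentation and any runs of spaces later in the line are not preserved exactly. The small comparison below illustrates this; the sample line and the variable names `line`, `rebuilt` and `kept` are invented for the demonstration:

```shell
# Hypothetical line with extra spacing, to make the difference visible.
line='   <insert label> Item:   spaced   text'

# Assigning to fields: awk rebuilds the record, collapsing the original spacing.
rebuilt=$(printf '%s\n' "$line" | awk -v label=foo '{ $1 = " "; $2 = label; print }')

# Editing the record with sub(): the rest of the line is left untouched.
kept=$(printf '%s\n' "$line" | awk -v label=foo '{ sub(/<insert label>/, label); print }')

# Brackets make leading and embedded spaces easy to see.
printf '[%s]\n[%s]\n' "$rebuilt" "$kept"
```

If exact spacing matters, replacing the placeholder with `sub()` is probably the safer choice.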

Answer 2 (score: 2)

One solution using sed:

Contents of script.sed:

## When the line begins with the 'Label' string.
/^Label/ {
    ## Save content to 'hold space'.
    h   

    ## Keep only the text between the first space and the first comma
    ## (the label, including its leading space).
    s/^[^ ]*\([^,]*\).*$/\1/

    ## Save it in 'hold space' and get the original content
    ## of the line (exchange contents).
    x   

    ## Print and read next line.
    b   
}
###--- Commented this wrong behaviour ---###    
#--- G
#--- s/<[^>]*>\(.*\)\n\(.*\)$/\2\1/

###--- And fixed with this ---###
## When the line contains '<insert label>'
/<insert label>/ {
    ## Append the label name to the line.
    G   

    ## And substitute the '<insert label>' string with it.
    s/<insert label>\(.*\)\n\(.*\)$/\2\1/
}

Contents of infile:

Label: foo, Other text: text description...
   <insert label> Item: item description...
   <insert label> Item: item description...
Label: bar, Other text:...
   <insert label> Item:...
Label: baz, Other text:...
   <insert label> Item:...
   <insert label> Item:...
   <insert label> Item:...

Run it like this:

sed -f script.sed infile

Result:

Label: foo, Other text: text description...
    foo Item: item description...
    foo Item: item description...
Label: bar, Other text:...
    bar Item:...
Label: baz, Other text:...
    baz Item:...
    baz Item:...
    baz Item:...
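
Note that the item lines in this result are indented by four spaces rather than the three in the input, because the captured label keeps the space that follows "Label:". A possible variation that leaves that space out of the capture is sketched below. It assumes every header starts with exactly "Label: " (one space), and the variable names `input` and `result` are chosen only for this example:

```shell
# Shortened sample taken from the question.
input='Label: foo, Other text: text description...
   <insert label> Item: item description...
   <insert label> Item: item description...
Label: bar, Other text:...
   <insert label> Item:...'

result=$(printf '%s\n' "$input" | sed '
/^Label: / {
    # Copy the line to the hold space, reduce the pattern space to the bare
    # label, then swap: the label is held and the original line is printed.
    h
    s/^Label: \([^,]*\).*$/\1/
    x
}
/<insert label>/ {
    # Append the held label after a newline, then move it into place.
    G
    s/<insert label>\(.*\)\n\(.*\)$/\2\1/
}
')

printf '%s\n' "$result"
```

With the sample above, the item lines keep their original three-space indentation.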