Formatting a file with awk

Date: 2016-03-17 14:00:20

Tags: regex awk sed

I have a file like this:

adaptable adapt:stem<>able:suffix
addiction addict:stem<>ion:suffix
adornment adorn:stem<>ment:suffix
advertisement advertise:stem<>ment:suffix
aggravation aggravate:stem<>ion:suffix
aggregation aggregate:stem<>ion:suffix
agreeable agree:stem<>able:suffix

I need to convert it to the following format:

(adaptable ((adapt:stem)able:suffix))
(addiction ((addict:stem)ion:suffix))
(adornment ((adorn:stem)ment:suffix))
(advertisement ((advertise:stem)ment:suffix))
(aggravation ((aggravate:stem)ion:suffix))
(aggregation ((aggregate:stem)ion:suffix))
(agreeable ((agree:stem)able:suffix))
where the most complex ones look like this:
(imperialistic (((imperialism:stem)ist:suffix)ic:suffix))

I tried to do it with awk. Executing awk '{print $0")"}' restof120.txt added a ")" at the end of every line, and this wraps each whole line in parentheses:

awk '{print "("$0")"}'

My question: is there a way to convert the format automatically, using any package?

There are also complex cases, for example:

 indecipherable in:prefix<>decipher:stem<>able:suffix
 (indecipherable (((in:prefix)decipher:stem)able:suffix))

Update: some other patterns I have seen:

 inactive in:prefix<>active:stem
 (inactive ((in:prefix)active:stem))

4 answers:

Answer 0 (score: 2)

After your edit with the complex case, I would modify my sed command to use a loop:

sed -r -e ':loop' -e 's/([^ ]+)<>/(\1)/' -e 't loop' -e 's/([^ ]+) (.*)/(\1 (\2))/'

Because [^ ]+ is greedy, each substitution wraps everything up to the last remaining <>, so the replacements proceed from the right. The loop keeps going until the substitution no longer matches anything; the final substitution then splits off the first word and adds the outer parentheses. For the "indecipherable" test case the replacements go as follows:

indecipherable in:prefix<>decipher:stem<>able:suffix     # original text
indecipherable (in:prefix<>decipher:stem)able:suffix     # after 1st iteration
indecipherable ((in:prefix)decipher:stem)able:suffix     # after 2nd iteration
(indecipherable (((in:prefix)decipher:stem)able:suffix)) # after loop: add the outer parentheses


Test run:

$ echo "adaptable adapt:stem<>able:suffix
addiction addict:stem<>ion:suffix
adornment adorn:stem<>ment:suffix
advertisement advertise:stem<>ment:suffix
aggravation aggravate:stem<>ion:suffix
aggregation aggregate:stem<>ion:suffix
agreeable agree:stem<>able:suffix
indecipherable in:prefix<>decipher:stem<>able:suffix" | sed -r -e ':loop' -e 's/([^ ]+)<>/(\1)/' -e 't loop' -e 's/([^ ]+) (.*)/(\1 (\2))/'
(adaptable ((adapt:stem)able:suffix))
(addiction ((addict:stem)ion:suffix))
(adornment ((adorn:stem)ment:suffix))
(advertisement ((advertise:stem)ment:suffix))
(aggravation ((aggravate:stem)ion:suffix))
(aggregation ((aggregate:stem)ion:suffix))
(agreeable ((agree:stem)able:suffix))
(indecipherable (((in:prefix)decipher:stem)able:suffix))
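To watch the loop at work, the same commands can be instrumented to print the pattern space after every step. This is a sketch assuming GNU sed (for -r); the function name trace_wrap is hypothetical, and -n plus explicit p commands stand in for the automatic printing.

```shell
# trace_wrap: hypothetical helper (assumes GNU sed) that prints every
# intermediate step of the loop-based command.
#   -n       turn off automatic printing
#   p        print the original line, then each intermediate step
#   t show   jump to "show" while a <> was just replaced
#   otherwise add the outer parentheses, print the result, and end the cycle with d
trace_wrap() {
  sed -r -n \
    -e 'p' \
    -e ':loop' \
    -e 's/([^ ]+)<>/(\1)/' \
    -e 't show' \
    -e 's/([^ ]+) (.*)/(\1 (\2))/p' \
    -e 'd' \
    -e ':show' \
    -e 'p' \
    -e 'b loop'
}

echo 'indecipherable in:prefix<>decipher:stem<>able:suffix' | trace_wrap
```

The four printed lines correspond to the four commented steps shown above for the "indecipherable" case.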

For the simple stem<>suffix cases only, I would use the following sed command:

sed -r 's/(\w+) (\w+:stem)<>(\w+:suffix)/(\1 ((\2)\3))/'
  

Example:

$ echo "adaptable adapt:stem<>able:suffix
addiction addict:stem<>ion:suffix
adornment adorn:stem<>ment:suffix
advertisement advertise:stem<>ment:suffix
aggravation aggravate:stem<>ion:suffix
aggregation aggregate:stem<>ion:suffix
agreeable agree:stem<>able:suffix" | sed -r 's/(\w+) (\w+:stem)<>(\w+:suffix)/(\1 ((\2)\3))/'
(adaptable ((adapt:stem)able:suffix))
(addiction ((addict:stem)ion:suffix))
(adornment ((adorn:stem)ment:suffix))
(advertisement ((advertise:stem)ment:suffix))
(aggravation ((aggravate:stem)ion:suffix))
(aggregation ((aggregate:stem)ion:suffix))
(agreeable ((agree:stem)able:suffix))
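Because this regex only matches a single stem<>suffix pair, lines that contain a prefix are passed through unchanged, which is why the loop version is needed for the complex cases. A quick check, assuming GNU sed (\w and -r are GNU extensions); simple_wrap is a hypothetical name:

```shell
# simple_wrap: hypothetical wrapper around the stem<>suffix-only command (GNU sed).
simple_wrap() {
  sed -r 's/(\w+) (\w+:stem)<>(\w+:suffix)/(\1 ((\2)\3))/'
}

echo 'adaptable adapt:stem<>able:suffix' | simple_wrap                      # converted
echo 'indecipherable in:prefix<>decipher:stem<>able:suffix' | simple_wrap   # no match, printed as-is
```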

Answer 1 (score: 2)

awk to the rescue!

$ awk -F'[ <>]' '{print "(" $1, "((" $2 ")" $4 "))" }' file

(adaptable ((adapt:stem)able:suffix))
(addiction ((addict:stem)ion:suffix))
(adornment ((adorn:stem)ment:suffix))
(advertisement ((advertise:stem)ment:suffix))
(aggravation ((aggravate:stem)ion:suffix))
(aggregation ((aggregate:stem)ion:suffix))
(agreeable ((agree:stem)able:suffix))
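The command uses $2 and $4 because the field separator [ <>] treats < and > as separate single-character delimiters, so every <> leaves an empty field between its two neighbours. A small sketch (show_fields is a hypothetical helper) that prints each field of one line makes this visible:

```shell
# show_fields: hypothetical helper that prints every field produced by -F'[ <>]'.
show_fields() {
  awk -F'[ <>]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }'
}

echo 'adaptable adapt:stem<>able:suffix' | show_fields
# $1=[adaptable]
# $2=[adapt:stem]
# $3=[]
# $4=[able:suffix]
```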

For the extra case, it is better to delegate to a function than to place the parentheses by hand:

$ awk -F'[ <>]' 'function wrap(a) {return "("a")"}; 
       {w=wrap(wrap($2)$4)} 
   NF>5{w=wrap(w$6)} 
       {print wrap($1" "w)}' file_with_complex_case

(adaptable ((adapt:stem)able:suffix))
(addiction ((addict:stem)ion:suffix))
(adornment ((adorn:stem)ment:suffix))
(advertisement ((advertise:stem)ment:suffix))
(aggravation ((aggravate:stem)ion:suffix))
(aggregation ((aggregate:stem)ion:suffix))
(agreeable ((agree:stem)able:suffix))
(indecipherable (((in:prefix)decipher:stem)able:suffix))
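The same wrap() idea extends to any number of parts with a loop instead of the NF>5 special case. This is a sketch, not part of the answer: wrap_parts is a hypothetical name, and it skips empty fields, which covers both the gap each <> leaves and any trailing spaces in the input.

```shell
# wrap_parts: hypothetical generalisation of the wrap() answer to any number of parts.
wrap_parts() {
  awk -F'[ <>]' '
    function wrap(a) { return "(" a ")" }
    {
      w = ""
      # the parts live in $2, $4, $6, ...; skip the empty fields in between
      for (i = 2; i <= NF; i++)
        if ($i != "") w = wrap(w $i)
      print wrap($1 " " w)
    }'
}

printf '%s\n' 'adaptable adapt:stem<>able:suffix' \
              'inactive in:prefix<>active:stem' \
              'indecipherable in:prefix<>decipher:stem<>able:suffix' | wrap_parts
```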

Answer 2 (score: 1)

This might be what you're looking for:

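$ cat tst.awk
{
    # replace every "<>" and the end of $2 with ")"; n = number of replacements
    n = gsub(/<>|$/,")",$2)
    # build a string of n spaces, then turn each space into "("
    s = sprintf("%*s",n,"")
    gsub(/ /,"(",s)
    print "(" $1, s $2 ")"
}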

$ awk -f tst.awk file
(adaptable ((adapt:stem)able:suffix))
(addiction ((addict:stem)ion:suffix))
(adornment ((adorn:stem)ment:suffix))
(advertisement ((advertise:stem)ment:suffix))
(aggravation ((aggravate:stem)ion:suffix))
(aggregation ((aggregate:stem)ion:suffix))
(agreeable ((agree:stem)able:suffix))
(indecipherable (((in:prefix)decipher:stem)able:suffix))
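If you would rather not rely on printf's * width or on gsub matching the empty string at the end of $2, the run of opening parentheses can also be built with a plain loop. This variant is a sketch, not the answer's code (wrap_count is a hypothetical name): it counts only the <> separators and adds the parentheses for the last part by hand.

```shell
# wrap_count: hypothetical loop-based variant of tst.awk.
wrap_count() {
  awk '{
    # k = number of "<>" separators, each replaced by ")"
    k = gsub(/<>/, ")", $2)
    # one "(" per separator plus one for the last part
    s = ""
    for (i = 0; i <= k; i++) s = s "("
    # close the last part and the whole expression
    print "(" $1, s $2 "))"
  }'
}

echo 'indecipherable in:prefix<>decipher:stem<>able:suffix' | wrap_count
```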

Answer 3 (score: 1)

Try this:

awk -F ' |<>' '{
    parts = ""
    for (i=2; i<=NF; i++) parts = "(" parts $i ")"
    print "(" $1, parts ")"
}' <<END
adaptable adapt:stem<>able:suffix
indecipherable in:prefix<>decipher:stem<>able:suffix
END
(adaptable ((adapt:stem)able:suffix))
(indecipherable (((in:prefix)decipher:stem)able:suffix))

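It uses either a space or the string <> as the field separator (this may require GNU awk). It accumulates the parts, wrapping the result in a new pair of parentheses as each part is appended.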
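To see the accumulation step by step, here is a sketch that prints parts after each field is added (trace_parts is a hypothetical helper using the same ' |<>' separator):

```shell
# trace_parts: hypothetical helper that shows how "parts" grows field by field.
trace_parts() {
  awk -F ' |<>' '{
    parts = ""
    for (i = 2; i <= NF; i++) {
      parts = "(" parts $i ")"
      printf "after $%d: %s\n", i, parts
    }
  }'
}

echo 'indecipherable in:prefix<>decipher:stem<>able:suffix' | trace_parts
# after $2: (in:prefix)
# after $3: ((in:prefix)decipher:stem)
# after $4: (((in:prefix)decipher:stem)able:suffix)
```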