I have a file in the following format:
adaptable adapt:stem<>able:suffix
addiction addict:stem<>ion:suffix
adornment adorn:stem<>ment:suffix
advertisement advertise:stem<>ment:suffix
aggravation aggravate:stem<>ion:suffix
aggregation aggregate:stem<>ion:suffix
agreeable agree:stem<>able:suffix
I need to convert it to the following format:
(adaptable ((adapt:stem)able:suffix))
(addiction ((addict:stem)ion:suffix))
(adornment ((adorn:stem)ment:suffix))
(advertisement ((advertise:stem)ment:suffix))
(aggravation ((aggravate:stem)ion:suffix))
(aggregation ((aggregate:stem)ion:suffix))
(agreeable ((agree:stem)able:suffix))
The most complex ones look like this:
(imperialistic (((imperialism:stem)ist:suffix)ic:suffix))
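I tried to do this with awk.
This is what I have so far. Running awk '{print $0")"}' restof120.txt appended `)` to the end of every line, and the following wraps each whole line in parentheses:
awk '{print "("$0")"}'
My question: is there a way to convert the format automatically? Any tool or package is fine.
There are also more complex cases, for example: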
indecipherable in:prefix<>decipher:stem<>able:suffix
(indecipherable (((in:prefix)decipher:stem)able:suffix))
Update: some other patterns I have seen:
inactive in:prefix<>active:stem
(inactive ((in:prefix)active:stem))
Answer 0 (score: 2)
After your edit adding the complex case, I would modify my sed command to use a loop:
sed -r -e ':loop' -e 's/([^ ]+)<>/(\1)/' -e 't loop' -e 's/(.*) (.*)/(\1 (\2))/'
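It substitutes starting from the rightmost <> and keeps looping until the substitution no longer matches anything, so the "indecipherable" test case is rewritten as follows: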
indecipherable in:prefix<>decipher:stem<>able:suffix # original text
indecipherable (in:prefix<>decipher:stem)able:suffix # after 1st iteration
indecipherable ((in:prefix)decipher:stem)able:suffix # after 2nd iteration
(indecipherable (((in:prefix)decipher:stem)able:suffix)) # after loop: add the outer parentheses
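The intermediate states listed above can be reproduced directly. This is a minimal sketch, assuming GNU sed (where `-E` is the same extended-regex switch as `-r`); the function name `trace_loop` is made up for illustration. It keeps only the loop part of the command and adds `-n` plus an explicit `p`, so the pattern space is printed at the top of every pass:

```shell
# Hypothetical helper: print the line before each pass of the substitution loop.
trace_loop() {
  # -n turns off automatic printing; p prints the pattern space on each pass.
  # t jumps back to :loop only if the preceding s/// actually replaced something.
  sed -E -n -e ':loop' -e 'p' -e 's/([^ ]+)<>/(\1)/' -e 't loop'
}

printf '%s\n' 'indecipherable in:prefix<>decipher:stem<>able:suffix' | trace_loop
```

Because `[^ ]+` is greedy, each pass consumes everything up to the last remaining `<>`, which is why the parentheses grow from the right.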
Test run:
$ echo """adaptable adapt:stem<>able:suffix
addiction addict:stem<>ion:suffix
adornment adorn:stem<>ment:suffix
advertisement advertise:stem<>ment:suffix
aggravation aggravate:stem<>ion:suffix
aggregation aggregate:stem<>ion:suffix
agreeable agree:stem<>able:suffix
indecipherable in:prefix<>decipher:stem<>able:suffix""" | sed -r -e ':loop' -e 's/([^ ]+)<>/(\1)/' -e 't loop' -e 's/(.*) (.*)/(\1 (\2))/'
(adaptable ((adapt:stem)able:suffix))
(addiction ((addict:stem)ion:suffix))
(adornment ((adorn:stem)ment:suffix))
(advertisement ((advertise:stem)ment:suffix))
(aggravation ((aggravate:stem)ion:suffix))
(aggregation ((aggregate:stem)ion:suffix))
(agreeable ((agree:stem)able:suffix))
(indecipherable (((in:prefix)decipher:stem)able:suffix))
My original command, which handles only the simple stem<>suffix case, was:
sed -r 's/(\w+) (\w+:stem)<>(\w+:suffix)/(\1 ((\2)\3))/'
Example:
$ echo """adaptable adapt:stem<>able:suffix
addiction addict:stem<>ion:suffix
adornment adorn:stem<>ment:suffix
advertisement advertise:stem<>ment:suffix
aggravation aggravate:stem<>ion:suffix
aggregation aggregate:stem<>ion:suffix
agreeable agree:stem<>able:suffix""" | sed -r 's/(\w+) (\w+:stem)<>(\w+:suffix)/(\1 ((\2)\3))/'
(adaptable ((adapt:stem)able:suffix))
(addiction ((addict:stem)ion:suffix))
(adornment ((adorn:stem)ment:suffix))
(advertisement ((advertise:stem)ment:suffix))
(aggravation ((aggravate:stem)ion:suffix))
(aggregation ((aggregate:stem)ion:suffix))
(agreeable ((agree:stem)able:suffix))
Answer 1 (score: 2)
awk to the rescue!
$ awk -F'[ <>]' '{print "(" $1, "((" $2 ")" $4 "))" }' file
(adaptable ((adapt:stem)able:suffix))
(addiction ((addict:stem)ion:suffix))
(adornment ((adorn:stem)ment:suffix))
(advertisement ((advertise:stem)ment:suffix))
(aggravation ((aggravate:stem)ion:suffix))
(aggregation ((aggregate:stem)ion:suffix))
(agreeable ((agree:stem)able:suffix))
For the additional cases, it is better to delegate to a function than to place the parentheses by hand:
$ awk -F'[ <>]' 'function wrap(a) {return "("a")"};
{w=wrap(wrap($2)$4)}
NF>5{w=wrap(w$6)}
{print wrap($1" "w)}' file_with_complex_case
(adaptable ((adapt:stem)able:suffix))
(addiction ((addict:stem)ion:suffix))
(adornment ((adorn:stem)ment:suffix))
(advertisement ((advertise:stem)ment:suffix))
(aggravation ((aggravate:stem)ion:suffix))
(aggregation ((aggregate:stem)ion:suffix))
(agreeable ((agree:stem)able:suffix))
(indecipherable (((in:prefix)decipher:stem)able:suffix))
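The `NF>5` rule above covers one extra morpheme. If longer chains turn up, the same `wrap` function could be applied in a loop instead. This is only a sketch of that idea, not part of the original answer; the function name `nest` and the four-part sample line are made up for illustration. With `-F'[ <>]'` every `<>` produces an empty field, so the morphemes sit in fields 2, 4, 6, and so on:

```shell
# Hypothetical helper: nest any number of <>-separated parts.
nest() {
  awk -F'[ <>]' '
    function wrap(a) { return "(" a ")" }
    {
      w = ""
      # step by 2 to skip the empty field that sits between "<" and ">"
      for (i = 2; i <= NF; i += 2) w = wrap(w $i)
      print wrap($1 " " w)
    }'
}

# The second line is an invented four-part example, not from the original data.
printf '%s\n' \
  'adaptable adapt:stem<>able:suffix' \
  'unbelievably un:prefix<>believe:stem<>able:suffix<>ly:suffix' | nest
```

The design choice is just to let the field count drive the nesting depth, so no per-length rule is needed.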
Answer 2 (score: 1)
This may be what you are looking for:
$ cat tst.awk
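{
    # turn every "<>" and the end of $2 into ")"; n counts the ")" added
    n = gsub(/<>|$/,")",$2)
    # build a string of n spaces, then turn each space into "("
    s = sprintf("%*s",n,"")
    gsub(/ /,"(",s)
    print "(" $1, s $2 ")"
}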
$ awk -f tst.awk file
(adaptable ((adapt:stem)able:suffix))
(addiction ((addict:stem)ion:suffix))
(adornment ((adorn:stem)ment:suffix))
(advertisement ((advertise:stem)ment:suffix))
(aggravation ((aggravate:stem)ion:suffix))
(aggregation ((aggregate:stem)ion:suffix))
(agreeable ((agree:stem)able:suffix))
(indecipherable (((in:prefix)decipher:stem)able:suffix))
Answer 3 (score: 1)
Try this:
awk -F ' |<>' '{
parts = ""
for (i=2; i<=NF; i++) parts = "(" parts $i ")"
print "(" $1, parts ")"
}' <<END
adaptable adapt:stem<>able:suffix
indecipherable in:prefix<>decipher:stem<>able:suffix
END
(adaptable ((adapt:stem)able:suffix))
(indecipherable (((in:prefix)decipher:stem)able:suffix))
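It uses either a space or the string <> as the field separator (this may require GNU awk). It accumulates the parts, wrapping the result in another pair of parentheses each time a part is added.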
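To see the accumulation step by step, the loop can print `parts` after each field is absorbed. This is a small sketch built on the command above; the function name `trace_parts` and the "field N:" labels are made up for illustration:

```shell
# Hypothetical helper: show how parts grows as each field is wrapped in.
trace_parts() {
  awk -F ' |<>' '{
    parts = ""
    for (i = 2; i <= NF; i++) {
      parts = "(" parts $i ")"
      print "field " i ": " parts   # value of parts after absorbing field i
    }
  }'
}

printf '%s\n' 'indecipherable in:prefix<>decipher:stem<>able:suffix' | trace_parts
```

Each pass wraps everything collected so far together with the next part, which is what produces the left-nested parentheses.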