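I have a file, and I'm trying to use awk to remove the text before the (), but keep the text inside the (). I'm also trying to remove the whitespace and text after the _#, and then output the whole line. Maybe sed is a better choice, but I'm not sure how to do it.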
File
chr4 100009839 100009851 426_1201_128(ADH5)_1 0 -
chr4 100006265 100006367 426_1202_128(ADH5)_2 0 -
chr4 100003125 100003267 426_1203_128(ADH5)_3 0 -
Desired output
chr4 100009839 100009851 ADH5_1
chr4 100006265 100006367 ADH5_2
chr4 100003125 100003267 ADH5_3
AWK
awk -F'()_*' '{print $1,$2,$3,$4}' file
Answer 0 (score: 1)
awk -F'[\t()]' -v OFS='\t' '{print $1, $2, $3, $5 $6}' file
Output:
chr4 100009839 100009851 ADH5_1
chr4 100006265 100006367 ADH5_2
chr4 100003125 100003267 ADH5_3
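To see why $5 $6 yields the gene name plus its suffix, it can help to list the fields that this separator set produces. The sketch below assumes the file is tab-separated, which the listing in the question does not make certain; show_fields is a hypothetical helper name, not part of the answer.

```shell
# show_fields: hypothetical helper that prints every field awk sees
# when splitting on a tab, "(" or ")". Assumes tab-separated input.
show_fields() {
  awk -F'[\t()]' '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
}

# First sample line from the question, with tabs between columns.
printf 'chr4\t100009839\t100009851\t426_1201_128(ADH5)_1\t0\t-\n' | show_fields
```

With that input, $4 is the prefix 426_1201_128, $5 is ADH5, and $6 is _1, so $5 $6 concatenates to ADH5_1. If the columns are separated by spaces rather than tabs, the field numbers shift and the command would need a different separator set.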
Answer 1 (score: 1)
Using sed with a substitution:
$ sed 's/[^ ]*(\([^)]*\))\(_[^ ]*\).*$/\1\2/' infile
chr4 100009839 100009851 ADH5_1
chr4 100006265 100006367 ADH5_2
chr4 100003125 100003267 ADH5_3
Breaking down the regular expression:
[^ ]*( # Non-spaces up to and including opening parenthesis
\( # Start first capture group
[^)]* # Content between parentheses: everything but a closing parenthesis
\) # End of first capture group
) # Closing parenthesis, not captured
\( # Start second capture group
_[^ ]* # Underscore and non-spaces, '_1' etc.
\) # End of second capture group
.*$ # Rest of line, not captured
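The pattern above treats a literal space as the column separator. If the file is actually tab-separated, [^ ]* would also match across the leading columns and remove them. As a hedged variation, not the answerer's command, the POSIX character class [[:space:]] can stand in for the space so the same substitution tolerates either separator; strip_label is a hypothetical function name.

```shell
# strip_label: hypothetical wrapper around the same substitution, with
# [^[:space:]] in place of [^ ] so tabs and spaces both end a column.
strip_label() {
  sed 's/[^[:space:]]*(\([^)]*\))\(_[^[:space:]]*\).*$/\1\2/' "$@"
}

# Space-separated and tab-separated versions of the first sample line.
printf 'chr4 100009839 100009851 426_1201_128(ADH5)_1 0 -\n' | strip_label
printf 'chr4\t100009839\t100009851\t426_1201_128(ADH5)_1\t0\t-\n' | strip_label
```

Called as strip_label infile, it reads the file directly; with no arguments it reads standard input, as in the two sample calls.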