Question

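I have a file, and I'm trying to use awk to remove the text before the parentheses while keeping the text inside them. I'm also trying to remove the whitespace and text after the _# suffix (e.g. _1) and then print the whole line. Maybe sed is a better choice, but I'm not sure how to do it.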

File

chr4    100009839   100009851   426_1201_128(ADH5)_1    0   -
chr4    100006265   100006367   426_1202_128(ADH5)_2    0   -
chr4    100003125   100003267   426_1203_128(ADH5)_3    0   -

Desired output

chr4    100009839   100009851   ADH5_1
chr4    100006265   100006367   ADH5_2
chr4    100003125   100003267   ADH5_3

AWK

awk -F'()_*' '{print $1,$2,$3,$4}' file

Answer 1

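awk -F'[\t()]' 'BEGIN { OFS = "\t" } { print $1, $2, $3, $5 $6 }' file

Splitting on tabs and parentheses turns the fourth column into three fields (426_1201_128, ADH5 and _1), so $5 $6 rebuilds ADH5_1. This relies on the columns being tab-separated.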

Output:

chr4    100009839       100009851       ADH5_1
chr4    100006265       100006367       ADH5_2
chr4    100003125       100003267       ADH5_3

Answer 2

Using sed with a substitution:

$ sed 's/[^ ]*(\([^)]*\))\(_[^ ]*\).*$/\1\2/' infile
chr4    100009839   100009851   ADH5_1
chr4    100006265   100006367   ADH5_2
chr4    100003125   100003267   ADH5_3
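The [^ ]* in this answer means "anything but a space", which includes tab characters. On a tab-separated file the leading [^ ]*( would therefore swallow the first three columns as well. A sketch of a variant that stops at any whitespace, written as an extended regex for sed -E, could look like this; the helper name strip_gene_sed is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: same substitution as the answer, written as an ERE
# (sed -E) and using [^[:space:]] so that tabs also end a column.
strip_gene_sed() {
    sed -E 's/[^[:space:]]*\(([^)]*)\)(_[^[:space:]]*).*$/\1\2/' "$@"
}

# Demo on a tab-separated copy of the first sample line; run: strip_gene_sed file
printf 'chr4\t100009839\t100009851\t426_1201_128(ADH5)_1\t0\t-\n' | strip_gene_sed
```

In ERE syntax the roles of the parentheses are swapped compared with the BRE above: \( matches a literal parenthesis and a bare ( starts a capture group. The separators between the first three columns are kept exactly as they appear in the input.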

Breaking down the regex:

[^ ]*(       # Non-spaces up to and including opening parenthesis
\(           # Start first capture group
    [^)]*    # Content between parentheses: everything but a closing parenthesis
\)           # End of first capture group
)            # Closing parenthesis, not captured
\(           # Start second capture group
    _[^ ]*   # Underscore and non-spaces, '_1' etc.
\)           # End of second capture group
.*$          # Rest of line, not captured
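One way to check the breakdown is to keep the same regex but print the two capture groups in brackets instead of joining them. The helper name show_groups is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: run the answer's regex unchanged, but wrap each
# capture group in brackets to show what \1 and \2 hold.
show_groups() {
    sed 's/[^ ]*(\([^)]*\))\(_[^ ]*\).*$/[\1] [\2]/' "$@"
}

printf '%s\n' 'chr4    100009839   100009851   426_1201_128(ADH5)_1    0   -' | show_groups
```

For this sample line it prints chr4    100009839   100009851   [ADH5] [_1], i.e. \1 is the gene name and \2 is the _1 suffix, while 426_1201_128( and the trailing "0   -" are matched but dropped.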

awk or sed: delete text in a file before a character, then delete text after a character
