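How to print only the lines that contain a third column (Linux)

Date: 2018-03-25 18:34:34

Tags: linux grep

I have the following lines in a text file, but I want to select/print only those that have a third column: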

SUBSCRIBERIDENTIFIER|234908743|
SUBSCRIBERIDENTIFIER|234909544|
SUBSCRIBERIDENTIFIER|234809956|5008596|   
SUBSCRIBERIDENTIFIER|234809201|
SUBSCRIBERIDENTIFIER|234908513|
SUBSCRIBERIDENTIFIER|234818667|2000010|
SUBSCRIBERIDENTIFIER|234817353|
SUBSCRIBERIDENTIFIER|234817553|
SUBSCRIBERIDENTIFIER|234818966|5008611|   
SUBSCRIBERIDENTIFIER|234817611|2000010|   
SUBSCRIBERIDENTIFIER|234817511|
SUBSCRIBERIDENTIFIER|234909292|

The desired output is:

SUBSCRIBERIDENTIFIER|234809956|5008596|   
SUBSCRIBERIDENTIFIER|234818667|2000010|
SUBSCRIBERIDENTIFIER|234818966|5008611|   
SUBSCRIBERIDENTIFIER|234817611|2000010|

I tried this command, but it did not produce the expected result:

cat DEF01_resultBB.txt | grep "SUBSCRIBERIDENTIFIER"|$3 

3 answers:

Answer 0 (score: 2)

Try this:

$ grep -E '^([^\|]+\|){3} *$' DEF01_resultBB.txt
SUBSCRIBERIDENTIFIER|234809956|5008596|
SUBSCRIBERIDENTIFIER|234818667|2000010|
SUBSCRIBERIDENTIFIER|234818966|5008611|
SUBSCRIBERIDENTIFIER|234817611|2000010|

Regular expressions are very powerful; you can try this one out here: https://regex101.com/r/NZB5GZ/1

Note that some of your lines have extra whitespace at the end, hence the <space>* at the end of the expression.


grep -E tells grep to interpret the pattern as an extended regular expression, which is what we have here. If you have GNU grep, you can also spell the option --extended-regexp.

A breakdown of the regex, as requested:

  • [^\|] matches any character apart from what is listed inside the square brackets, so excluding |
    • [...] matches any character inside
    • [^...] matches any character that is not inside
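    • | has special meaning in some situations in regex, so it is safer to escape it whenever you mean a literal |. Inside square brackets the escape is unnecessary; strictly speaking, POSIX treats a backslash inside brackets as a literal character, so [^\|] also excludes \, which is harmless for this data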
  • [^\|]+ matches the above one-or-more times
  • [^\|]+\| matches a run of one or more non-pipe characters followed by a pipe
  • ([^\|]+\|) produces a match group of the above - important for the next step
  • ([^\|]+\|){3} matches the above exactly-three times
  • ([^\|]+\|){3} * matches the above followed by zero-or-more spaces
    • important as some of your lines have extra spaces on the end
  • ^([^\|]+\|){3} *$ uses the ^ and $ anchors, which tie the expression to the beginning and end of the line respectively
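To see why the trailing " *" matters, here is a minimal sketch that runs the pattern with and without it. The temporary file and the variable names are made up for illustration, and only a few of the question's lines are reproduced. The bracket expression is written as [^|], since the pipe needs no escape inside brackets.

```shell
# Hypothetical sample: two lines without a third column, one with trailing
# spaces after the last pipe, and one without trailing spaces.
sample=$(mktemp)
printf '%s\n' \
  'SUBSCRIBERIDENTIFIER|234908743|' \
  'SUBSCRIBERIDENTIFIER|234809956|5008596|   ' \
  'SUBSCRIBERIDENTIFIER|234818667|2000010|' \
  'SUBSCRIBERIDENTIFIER|234817353|' > "$sample"

# Without " *": the line that ends in spaces is not matched.
strict=$(grep -cE '^([^|]+\|){3}$' "$sample")

# With " *": spaces after the third pipe are tolerated.
lenient=$(grep -cE '^([^|]+\|){3} *$' "$sample")

echo "without ' *': $strict line(s), with ' *': $lenient line(s)"
rm -f "$sample"
```

On this sample the first count should be 1 and the second 2.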

Answer 1 (score: 2)

Set the field separator to | and output only the rows containing four fields (because of the trailing |, the fourth field is empty or contains only spaces).

awk -F '|' 'NF==4' file

Output:

SUBSCRIBERIDENTIFIER|234809956|5008596|   
SUBSCRIBERIDENTIFIER|234818667|2000010|
SUBSCRIBERIDENTIFIER|234818966|5008611|   
SUBSCRIBERIDENTIFIER|234817611|2000010|
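If it is not obvious why the count is four rather than three, a quick way to check is to print NF in front of each line. This is only a sketch on a hypothetical two-line sample file:

```shell
# Hypothetical sample: one line without and one line with a third column.
sample=$(mktemp)
printf '%s\n' \
  'SUBSCRIBERIDENTIFIER|234908743|' \
  'SUBSCRIBERIDENTIFIER|234818667|2000010|' > "$sample"

# The trailing "|" ends the line with an empty field, so NF is one more
# than the number of visible columns.
counts=$(awk -F '|' '{ print NF ": " $0 }' "$sample")
printf '%s\n' "$counts"
rm -f "$sample"
```

The first line should be reported with 3 fields and the second with 4.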

Answer 2 (score: 1)

You can do this with e.g. awk:

awk -F '|' '/SUBSCRIBERIDENTIFIER/ && $3' DEF01_resultBB.txt

Or grep:

grep 'SUBSCRIBERIDENTIFIER|.*|.*|' DEF01_resultBB.txt

From what you've shown of the input, filtering for SUBSCRIBERIDENTIFIER is redundant because it appears in all lines, so you could shorten the above to

awk -F '|' '$3' DEF01_resultBB.txt

and

grep '|.*|.*|' DEF01_resultBB.txt

respectively.
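One caveat with using $3 as a bare condition: awk treats a field that looks like the number 0 as false, so such a line would be dropped. The input shown has no such value, but if it could occur, comparing against the empty string is safer. A small sketch follows; the sample file and its third line with a 0 in the third column are invented for illustration:

```shell
# Hypothetical sample; the last line (third column "0") is NOT from the
# question and is only here to show the difference.
sample=$(mktemp)
printf '%s\n' \
  'SUBSCRIBERIDENTIFIER|234908743|' \
  'SUBSCRIBERIDENTIFIER|234818667|2000010|' \
  'SUBSCRIBERIDENTIFIER|234800000|0|' > "$sample"

# Bare $3: true only if the field is non-empty and not numerically zero.
truthy=$(awk -F '|' '$3 { n++ } END { print n + 0 }' "$sample")

# Explicit string comparison: true for any non-empty field, including "0".
nonempty=$(awk -F '|' '$3 != "" { n++ } END { print n + 0 }' "$sample")

echo "bare \$3 keeps $truthy line(s), \$3 != \"\" keeps $nonempty line(s)"
rm -f "$sample"
```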

Or you could count | characters and only output lines that have 3 of them:

perl -ne 'print if tr/|// == 3' DEF01_resultBB.txt
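If perl is not available, the same counting idea can be sketched in awk, relying on the fact that gsub returns the number of replacements it made. The sample file and variable names below are hypothetical:

```shell
# Hypothetical sample: one line with two pipes and two lines with three.
sample=$(mktemp)
printf '%s\n' \
  'SUBSCRIBERIDENTIFIER|234908743|' \
  'SUBSCRIBERIDENTIFIER|234809956|5008596|   ' \
  'SUBSCRIBERIDENTIFIER|234818667|2000010|' > "$sample"

# Count the pipes on a copy of the line so that $0 is printed unchanged.
kept=$(awk '{ line = $0 } gsub(/\|/, "", line) == 3' "$sample")
printf '%s\n' "$kept"
rm -f "$sample"
```

This should print only the two lines that contain three | characters.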