I have the following lines in a text file, but I want to select/print only those that have a third column:
SUBSCRIBERIDENTIFIER|234908743|
SUBSCRIBERIDENTIFIER|234909544|
SUBSCRIBERIDENTIFIER|234809956|5008596|
SUBSCRIBERIDENTIFIER|234809201|
SUBSCRIBERIDENTIFIER|234908513|
SUBSCRIBERIDENTIFIER|234818667|2000010|
SUBSCRIBERIDENTIFIER|234817353|
SUBSCRIBERIDENTIFIER|234817553|
SUBSCRIBERIDENTIFIER|234818966|5008611|
SUBSCRIBERIDENTIFIER|234817611|2000010|
SUBSCRIBERIDENTIFIER|234817511|
SUBSCRIBERIDENTIFIER|234909292|
The desired output is:
SUBSCRIBERIDENTIFIER|234809956|5008596|
SUBSCRIBERIDENTIFIER|234818667|2000010|
SUBSCRIBERIDENTIFIER|234818966|5008611|
SUBSCRIBERIDENTIFIER|234817611|2000010|
I tried this command, but it did not produce the expected result:
cat DEF01_resultBB.txt | grep "SUBSCRIBERIDENTIFIER"|$3
Answer 0 (score: 2)
Try this:
$ grep -E '^([^\|]+\|){3} *$' DEF01_resultBB.txt
SUBSCRIBERIDENTIFIER|234809956|5008596|
SUBSCRIBERIDENTIFIER|234818667|2000010|
SUBSCRIBERIDENTIFIER|234818966|5008611|
SUBSCRIBERIDENTIFIER|234817611|2000010|
Regular expressions are very powerful; you can try this one out here: https://regex101.com/r/NZB5GZ/1
Note that some of your lines have extra whitespace at the end, hence the <space>* at the end of the expression.
grep -E means to interpret the pattern as an extended regular expression, which is what we have here. If you have GNU grep, you can also use --extended-regexp instead.
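The effect of the trailing <space>* can be checked on a few made-up lines. This is only a sketch: the temporary file stands in for DEF01_resultBB.txt, and the positions of the trailing spaces are assumptions, not taken from the real data.

```shell
# Hypothetical sample standing in for DEF01_resultBB.txt.
sample=$(mktemp)
printf '%s\n' \
  'SUBSCRIBERIDENTIFIER|234908743|' \
  'SUBSCRIBERIDENTIFIER|234809956|5008596|' \
  'SUBSCRIBERIDENTIFIER|234818667|2000010|  ' \
  'SUBSCRIBERIDENTIFIER|234817353|  ' > "$sample"

# Without " *": a three-column line with trailing spaces is missed.
strict=$(grep -cE '^([^\|]+\|){3}$' "$sample")

# With " *": trailing spaces after the third pipe are tolerated.
lenient=$(grep -cE '^([^\|]+\|){3} *$' "$sample")

echo "strict=$strict lenient=$lenient"   # prints: strict=1 lenient=2
rm -f "$sample"
```

grep -c prints the number of matching lines, which makes the difference between the two patterns easy to compare.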
A build-up of the regex, as requested:
- [^\|] matches any character apart from those listed inside the square brackets, so anything except |
  - [...] matches any character inside the brackets
  - [^...] matches any character that is not inside the brackets
  - | has special meaning in some situations in regex, so it's safer to always escape it if you mean a literal |; technically, in this situation (inside square brackets), the escape is unnecessary
- [^\|]+ matches the above one or more times
- [^\|]+\| matches any string that does not contain a pipe, but ends with a pipe
- ([^\|]+\|) produces a match group of the above, which is important for the next step
- ([^\|]+\|){3} matches the above exactly three times
- ([^\|]+\|){3} * matches the above followed by zero or more spaces
- ^([^\|]+\|){3} *$ uses the ^ and $ anchors, which tie the expression to the beginning and end of the line respectively

Answer 1 (score: 2)
Set the field separator to | and output only the rows containing four columns (the fourth column is empty).
awk -F '|' 'NF==4' file
Output:
SUBSCRIBERIDENTIFIER|234809956|5008596|
SUBSCRIBERIDENTIFIER|234818667|2000010|
SUBSCRIBERIDENTIFIER|234818966|5008611|
SUBSCRIBERIDENTIFIER|234817611|2000010|
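To see why the wanted rows count as four columns, it may help to print NF in front of each line. A minimal sketch on two made-up lines (the temporary file is a stand-in for the real input):

```shell
# Hypothetical two-line sample standing in for the real file.
sample=$(mktemp)
printf '%s\n' \
  'SUBSCRIBERIDENTIFIER|234908743|' \
  'SUBSCRIBERIDENTIFIER|234809956|5008596|' > "$sample"

# Prefix every line with the number of |-separated fields awk sees.
counts=$(awk -F '|' '{ print NF ": " $0 }' "$sample")
printf '%s\n' "$counts"
# prints:
# 3: SUBSCRIBERIDENTIFIER|234908743|
# 4: SUBSCRIBERIDENTIFIER|234809956|5008596|
rm -f "$sample"
```

The trailing | ends the last real column and starts one more (empty) field, so a line with three columns of data has NF equal to 4.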
Answer 2 (score: 1)
You can do this with e.g. awk:
awk -F '|' '/SUBSCRIBERIDENTIFIER/ && $3' DEF01_resultBB.txt
Or grep:
grep 'SUBSCRIBERIDENTIFIER|.*|.*|' DEF01_resultBB.txt
From what you've shown of the input, filtering for SUBSCRIBERIDENTIFIER is redundant because it appears in all lines, so you could shorten the above to
awk -F '|' '$3' DEF01_resultBB.txt
and
grep '|.*|.*|' DEF01_resultBB.txt
respectively.
Or you could count | characters and only output lines that have 3 of them:
perl -ne 'print if tr/|// == 3' DEF01_resultBB.txt
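If some lines carry stray whitespace after the second pipe, a third field made only of blanks would exist without holding any data. Whether the real file contains such lines is not known from the question, so the following is a hedged sketch on made-up data: it keeps a line only when the third field has at least one character that is not a space or tab.

```shell
# Hypothetical sample; the blank-only third field on line 2 is an assumption.
sample=$(mktemp)
printf '%s\n' \
  'SUBSCRIBERIDENTIFIER|234908743|' \
  'SUBSCRIBERIDENTIFIER|234817353|  ' \
  'SUBSCRIBERIDENTIFIER|234809956|5008596|' \
  'SUBSCRIBERIDENTIFIER|234818667|2000010|  ' > "$sample"

# Keep lines whose third field contains a non-blank character.
nonblank=$(awk -F '|' '$3 ~ /[^ \t]/' "$sample")
printf '%s\n' "$nonblank"
rm -f "$sample"
```

On this sample only the lines ending in 5008596| and 2000010| are printed; the line whose third field holds nothing but spaces is dropped.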