Ignore the data from the second underscore onward, then sort the result and remove duplicates.
I tried: awk -F_ '{print $2}' file1 >> file2; sort file1 | uniq
****** FROM ********
GGGGGGG DDDDD --> header
XYSER_YURTZ SUMOT_2_058A
XYSER_YURTZ SUMOT_2_058B
XYSER_YURTZ HJRIT_6_51A
XYSER_YURTZ HJRIT_6_51B
XYSER_YURTZ HJRIT_6_51C
XYSER_YURTZ HJRIT_6_51D
XYSER_YURTZ HJRIT_6_51E
XYSER_YURTZ HJRIT_6_51F
XYSER_YURTZ HJRIT_6_520
XYSER_YURTZ HJRIT_6_521
XYSER_GFRE SUMOT_2_16C3
XYSER_GFRE SUMOT_2_16C4
XYSER_GFRE SUMOT_2_16C5
XYSER_GFRE SUMOT_2_16C6
XYSER_GFRE SUMOT_2_16C7
XYSER_GFRE SUMOT_2_16C8
XYSER_GFRE SUMOT_2_16C9
XYSER_GFRE SUMOT_2_16CA
XYSER_GFRE SUMOT_2_16CB
XYSER_GFRE SUMOT_2_16CC
XYSER_GFRE SUMOT_2_16CD
XYSER_GFRE SUMOT_2_16CE
XYSER_GFRE SUMOT_2_16CF
XYSER_GFRE SUMOT_2_16D0
XYSER_GFRE SUMOT_2_16D1
XYSER_GFRE SUMOT_2_16D2
XYSER_GFRE SUMOT_2_16D3
XYSER_GFRE SUMOT_2_16D4
XYSER_GFRE HJRIT_6_12E1
XYSER_GFRE HJRIT_6_12E2
XYSER_GFRE HJRIT_6_12E3
XYSER_GFRE HJRIT_6_12E4
XYSER_GFRE HJRIT_6_12E5
XYSER_GFRE HJRIT_6_12E6
XYSER_GFRE HJRIT_6_12E7
XYSER_GFRE HJRIT_6_12E8
XYSER_GFRE HJRIT_6_12E9
XYSER_GFRE HJRIT_6_12EA
XYSER_GFRE HJRIT_6_12EB
XYSER_GFRE HJRIT_6_12EC
XYSER_GFRE HJRIT_6_12ED
XYSER_ALY1 XYSER_ALY1_0000
XYSER_ALY SUMOT_2_0497
XYSER_ALY SUMOT_2_0498
XYSER_BAP01 SUMOT_2_020E
TO
**************** OUTPUT1 **************
GGGGGGG DDDDD
XYSER_YURTZ SUMOT_2
XYSER_YURTZ HJRIT_6
XYSER_GFRE SUMOT_2
XYSER_GFRE HJRIT_6
XYSER_ALY1 XYSER_ALY1
XYSER_ALY SUMOT_2
XYSER_BAP01 SUMOT_2
XYSER_BAP02 SUMOT_2
************** OUTPUT2 **************
DDDDD GGGGGGG
SUMOT_2 XYSER_YURTZ
SUMOT_2 XYSER_GFRE
SUMOT_2 XYSER_ALY
SUMOT_2 XYSER_BAP01
SUMOT_2 XYSER_BAP02
HJRIT_6 XYSER_YURTZ
HJRIT_6 XYSER_GFRE
XYSER_ALY1 XYSER_ALY1
Answer 0 (score: 0)
With your example input, you can use
sed 's/_[^_]*$//' inputfile|sort|uniq
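This removes the last underscore and all characters after it.
Note: the sort command may place the header line between other lines, because it sorts the entire data alphanumerically. In your example this is not a problem, because the header line GGGGGGG... sorts before XYSER_....
If you know that similar lines are already grouped in the input file, you can omit the sort and use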
sed 's/_[^_]*$//' inputfile|uniq
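If the header must stay on top regardless of how it sorts, one option is to handle the first line separately from the rest. The following is only a sketch under some assumptions: the file names inputfile, output1 and output2 are placeholders, the sample data is a shortened copy of the question's input, and the first line is assumed to be the header without the "--> header" annotation. It also swaps the columns with awk to approximate OUTPUT2. The rows come out in sorted order, not in the order of first appearance shown in the question.

```shell
#!/bin/sh
# Shortened sample input (assumption: first line is the header).
cat > inputfile <<'EOF'
GGGGGGG DDDDD
XYSER_YURTZ SUMOT_2_058A
XYSER_YURTZ SUMOT_2_058B
XYSER_YURTZ HJRIT_6_51A
XYSER_ALY1 XYSER_ALY1_0000
XYSER_ALY SUMOT_2_0497
EOF

# OUTPUT1: print the header unchanged, then trim the last
# underscore-delimited part of each remaining line, sort, and de-duplicate.
# LC_ALL=C makes the sort order byte-wise and therefore predictable.
{
  head -n 1 inputfile
  tail -n +2 inputfile | sed 's/_[^_]*$//' | LC_ALL=C sort -u
} > output1

# OUTPUT2: same idea, but with the two columns swapped before sorting.
{
  head -n 1 inputfile | awk '{print $2, $1}'
  tail -n +2 inputfile | sed 's/_[^_]*$//' | awk '{print $2, $1}' | LC_ALL=C sort -u
} > output2

cat output1
cat output2
```

Here sort -u does the job of sort|uniq in a single step. If the original line order matters more than sorting, awk '!seen[$0]++' could replace the sort -u stage, since it drops repeated lines while keeping the first occurrence in place.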