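Check the first 4 or last 4 characters of a string for a match

Time: 2017-07-26 15:25:24

Tags: bash awk sed

I would like to know whether this can be done in bash with awk or sed.

I have the following sample file: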

HISEQ:272:CB0A0ANXX:3:1112:15781:21284_1:N:0:CATCAC 0   ITR3p_deleted   84279   41  35= *   0   0   TTAAGGAGGCTTCCTTTTCTAAACGATTGGGTGAG JJJ0JIIIIJJJJJJJJJJJJJJJJIJJJIHJJJJ NM:i:0  AM:i:41
HISEQ:272:CB0A0ANXX:3:1115:13546:24638_1:N:0:CATCAC 16  ITR3p_deleted   84279   39  15= *   0   0   TTAAGGAGGCTTCCT BB/FFFF//FBBBBB NM:i:0  AM:i:39
HISEQ:272:CB0A0ANXX:3:1114:4292:31240_1:N:0:CATCAC  16  ITR3p_deleted   83635   45  179=    *   0   0   AGATCCTATTAGATACATAGATCCTCGTCGCGATATCGCATTTTCTAACGTGATGGATATATTAA   BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJIJJIJJJJJJJJ8JJJJJFFFFFFFFFFFFFFFFFFFFBFFFFFF<FFFFFFFFFFFFFFFFB<<FB<//<< NM:i:0  AM:i:45
HISEQ:272:CB0A0ANXX:3:2104:14047:17929_1:N:0:CATCAC 16  ITR3p_deleted   84274   33  5X120=  *   0   0   TAAGGTTAAGGAGGCTTCCTTTTCTAATAATGATATGTATCAATCGGTGTGTAGAAAGTGTTACATCGACTCATAATATTATATTT  F7/FFFFBF77///F/7FF/<</</FBF</<<F</B//<//FFFFFFB/F/FBFBF//</F/F</F<<FBBFFFFFFFFFFFF<FFFBFFFFBFF<F<FFFB/F/FBFFFFFFFFFFBFB/</<<   NM:i:5  AM:i:33

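I want to check the string in column 10. If it starts with TTAA, as in the first two examples, I want to extract those records into file-1. If it ends with TTAA, as in the third example, I want to extract it into file-2. The fourth record should be ignored.

I can't seem to find a way to match the string with awk.

Thanks.

3 Answers:

Answer 0: (score: 3)

Try the following.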

awk '$10 ~ /^TTAA/{print > "file-1";next} $10 ~ /TTAA$/{print > "file-2"}'  Input_file
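
To see how the two patterns route records, here is a small self-contained check of the one-liner above. The four shortened records (r1 to r4) are made up for illustration; only column 10 matters. Because of `next`, a record whose column 10 both starts and ends with TTAA (r4) is written to file-1 only.

```shell
# Run in an empty scratch directory: creates Input_file, file-1 and file-2.
# Hypothetical, shortened records; column 10 is the sequence.
printf '%s\n' \
  'r1 0 ref 84279 41 10= * 0 0 TTAAGGAGGC JJJ0JIIIIJ NM:i:0' \
  'r2 16 ref 83635 45 10= * 0 0 AGATCCTTAA BBBBBFFFFF NM:i:0' \
  'r3 16 ref 84274 33 10= * 0 0 TAAGGTATTT F7/FFFFBF7 NM:i:5' \
  'r4 0 ref 84279 41 10= * 0 0 TTAACCTTAA JJJJJJJJJJ NM:i:0' > Input_file

# Same program as in the answer, split over two lines for readability.
awk '$10 ~ /^TTAA/ {print > "file-1"; next}
     $10 ~ /TTAA$/ {print > "file-2"}' Input_file

cut -d' ' -f1 file-1   # prints r1 and r4
cut -d' ' -f1 file-2   # prints r2
```

If records matching both patterns should land in both files, dropping `next` should be enough.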

Answer 1: (score: 1)

This should do the trick:

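# Read the file line by line; -r keeps backslashes in the quality string intact.
while IFS= read -r line; do
  # Column 10 starts with TTAA
  if printf '%s\n' "$line" | awk '{print $10}' | grep -q '^TTAA'; then
    printf '%s\n' "$line" >> file-1.txt
  fi
  # Column 10 ends with TTAA (a line passing both tests is appended to both files)
  if printf '%s\n' "$line" | awk '{print $10}' | grep -q 'TTAA$'; then
    printf '%s\n' "$line" >> file-2.txt
  fi
done < samplefile.txt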
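
If spawning awk and grep for every line turns out to be slow on large files, the same test can be sketched with shell builtins only. This is a variant under stated assumptions, not the answerer's code: it assumes whitespace-separated fields, and it turns off globbing while splitting because the records contain a literal `*`. The sample records (r1 to r4) are hypothetical.

```shell
# Run in an empty scratch directory: creates samplefile.txt, file-1.txt, file-2.txt.
printf '%s\n' \
  'r1 0 ref 84279 41 10= * 0 0 TTAAGGAGGC JJJ0JIIIIJ NM:i:0' \
  'r2 16 ref 83635 45 10= * 0 0 AGATCCTTAA BBBBBFFFFF NM:i:0' \
  'r3 16 ref 84274 33 10= * 0 0 TAAGGTATTT F7/FFFFBF7 NM:i:5' \
  'r4 0 ref 84279 41 10= * 0 0 TTAACCTTAA JJJJJJJJJJ NM:i:0' > samplefile.txt

rm -f file-1.txt file-2.txt   # the loop appends, so start clean

while IFS= read -r line; do
  set -f            # no filename expansion: field 7 is a literal '*'
  set -- $line      # split the record into positional parameters
  set +f
  seq=${10}         # column 10
  case $seq in TTAA*) printf '%s\n' "$line" >> file-1.txt ;; esac
  case $seq in *TTAA) printf '%s\n' "$line" >> file-2.txt ;; esac
done < samplefile.txt

cut -d' ' -f1 file-1.txt   # prints r1 and r4
cut -d' ' -f1 file-2.txt   # prints r2 and r4
```

As in the loop above, the two tests are independent, so r4 ends up in both files.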

Answer 2: (score: 0)

This might work for you (GNU sed):

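This invokes sed's grep-like behaviour and writes to separate files depending on the regexp.

N.B. A single line may be written to both output files if both regexps match.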
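
A minimal sketch of what such a GNU sed command could look like, assuming whitespace-separated fields and no leading whitespace on a line. The regexes and the shortened sample records (r1 to r4) are my own illustration, not necessarily the answerer's exact command. `-n` suppresses the default output (the grep-like behaviour), and each `w` command writes matching lines to its own file.

```shell
# Run in an empty scratch directory: creates Input_file, file-1 and file-2.
# Requires GNU sed (for -E together with \S and \s).
printf '%s\n' \
  'r1 0 ref 84279 41 10= * 0 0 TTAAGGAGGC JJJ0JIIIIJ NM:i:0' \
  'r2 16 ref 83635 45 10= * 0 0 AGATCCTTAA BBBBBFFFFF NM:i:0' \
  'r3 16 ref 84274 33 10= * 0 0 TAAGGTATTT F7/FFFFBF7 NM:i:5' \
  'r4 0 ref 84279 41 10= * 0 0 TTAACCTTAA JJJJJJJJJJ NM:i:0' > Input_file

# (\S+\s+){9} skips the first nine fields, so the rest of the pattern
# is anchored at the start of column 10.
sed -n -E \
  -e '/^(\S+\s+){9}TTAA/w file-1' \
  -e '/^(\S+\s+){9}\S*TTAA(\s|$)/w file-2' Input_file

cut -d' ' -f1 file-1   # prints r1 and r4
cut -d' ' -f1 file-2   # prints r2 and r4
```

Here r4 appears in both files, which matches the N.B. above.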