I would like to know whether this can be done in bash with awk or sed.
I have the following sample file:
HISEQ:272:CB0A0ANXX:3:1112:15781:21284_1:N:0:CATCAC 0 ITR3p_deleted 84279 41 35= * 0 0 TTAAGGAGGCTTCCTTTTCTAAACGATTGGGTGAG JJJ0JIIIIJJJJJJJJJJJJJJJJIJJJIHJJJJ NM:i:0 AM:i:41
HISEQ:272:CB0A0ANXX:3:1115:13546:24638_1:N:0:CATCAC 16 ITR3p_deleted 84279 39 15= * 0 0 TTAAGGAGGCTTCCT BB/FFFF//FBBBBB NM:i:0 AM:i:39
HISEQ:272:CB0A0ANXX:3:1114:4292:31240_1:N:0:CATCAC 16 ITR3p_deleted 83635 45 179= * 0 0 AGATCCTATTAGATACATAGATCCTCGTCGCGATATCGCATTTTCTAACGTGATGGATATATTAA BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJIJJIJJJJJJJJ8JJJJJFFFFFFFFFFFFFFFFFFFFBFFFFFF<FFFFFFFFFFFFFFFFB<<FB<//<< NM:i:0 AM:i:45
HISEQ:272:CB0A0ANXX:3:2104:14047:17929_1:N:0:CATCAC 16 ITR3p_deleted 84274 33 5X120= * 0 0 TAAGGTTAAGGAGGCTTCCTTTTCTAATAATGATATGTATCAATCGGTGTGTAGAAAGTGTTACATCGACTCATAATATTATATTT F7/FFFFBF77///F/7FF/<</</FBF</<<F</B//<//FFFFFFB/F/FBFBF//</F/F</F<<FBBFFFFFFFFFFFF<FFFBFFFFBFF<F<FFFB/F/FBFFFFFFFFFFBFB/</<< NM:i:5 AM:i:33
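I want to check the string in column 10. If it starts with TTAA, as in the first two examples, I want to extract those records into file-1. If it ends with TTAA, as in the third example, I want to extract it into file-2. The fourth record should be ignored.
I can't seem to work out how to match the string with awk.
Thanks.
Answer 0: (score: 3)
Try the following.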
awk '$10 ~ /^TTAA/{print > "file-1";next} $10 ~ /TTAA$/{print > "file-2"}' Input_file
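Note that because of the `next`, a record whose column 10 both starts and ends with TTAA is written to file-1 only. If such records should land in both files, a variant without `next` might look like the sketch below. The input name `sample.sam` and the four shortened records are made up for illustration; they are not the data from the question.

```shell
# Made-up, shortened records with the sequence in column 10 (assumption).
cat > sample.sam <<'EOF'
r1 0 ref 100 41 8= * 0 0 TTAAGGAG JJJJJJJJ NM:i:0
r2 16 ref 200 45 8= * 0 0 AGATTTAA BBBBBBBB NM:i:0
r3 16 ref 300 33 8= * 0 0 TAAGGTTT FFFFFFFF NM:i:5
r4 0 ref 400 40 9= * 0 0 TTAACTTAA JJJJJJJJJ NM:i:0
EOF

# Without "next", both patterns are tested for every record,
# so r4 (starts and ends with TTAA) is written to both files.
awk '$10 ~ /^TTAA/ {print > "file-1"}
     $10 ~ /TTAA$/ {print > "file-2"}' sample.sam

# file-1 now holds r1 and r4; file-2 holds r2 and r4; r3 is ignored.
```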
Answer 1: (score: 1)
This should do the trick:
while IFS= read -r line; do
    # Column 10 holds the sequence to test
    seq=$(awk '{print $10}' <<< "$line")
    if [[ $seq == TTAA* ]]; then
        printf '%s\n' "$line" >> file-1.txt
    fi
    if [[ $seq == *TTAA ]]; then
        printf '%s\n' "$line" >> file-2.txt
    fi
done < samplefile.txt
Answer 2: (score: 0)
This might work for you (GNU sed):
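Invoke sed's grep-like behaviour and write to separate files depending on which regular expression matches.
N.B. A single line may be written to both output files if both regular expressions match.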
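As a minimal sketch of that approach: `-n` suppresses sed's default output (the grep-like part) and the `w` command writes matching lines to a file. The regular expressions, the input name `sample.sam`, and the shortened records below are my own assumptions for illustration, not necessarily the command this answer originally gave.

```shell
# Made-up, shortened records with the sequence in column 10 (assumption).
cat > sample.sam <<'EOF'
r1 0 ref 100 41 8= * 0 0 TTAAGGAG JJJJJJJJ NM:i:0
r2 16 ref 200 45 8= * 0 0 AGATTTAA BBBBBBBB NM:i:0
r3 16 ref 300 33 8= * 0 0 TAAGGTTT FFFFFFFF NM:i:5
r4 0 ref 400 40 9= * 0 0 TTAACTTAA JJJJJJJJJ NM:i:0
EOF

# Skip nine whitespace-separated fields, then test field 10.
# First expression: field 10 starts with TTAA  -> file-1
# Second expression: field 10 ends with TTAA   -> file-2
sed -n -E \
    -e '/^([^[:space:]]+[[:space:]]+){9}TTAA/w file-1' \
    -e '/^([^[:space:]]+[[:space:]]+){9}[^[:space:]]*TTAA([[:space:]]|$)/w file-2' \
    sample.sam

# r4 matches both expressions, so it appears in both output files.
```

Each `w` needs its own `-e`, because sed treats everything after `w` up to the end of the expression as the file name.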