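I am trying to run an awk command that selects a few columns of a .txt file, keeping only the rows where one column falls within a given range of values. Column 10 holds distance scores, both positive and negative, and I want to select several columns from the rows where that distance is between -1000 and 1000. I used the command below, but it does not run: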
awk -v OFS='\t' '{if($10 >= -1000 && $10 <= 1000)print $2,$3,$4,$5,$8,$10$16} file.txt > promoter.txt
But it fails.
My file looks like this:
PeakID (cmd=S_13_O1_122_K27me3_macs2_out_broad_peaks.c.bed.uniq hg19) Chr Start End Strand Peak Score Focus Ratio/Region Size Annotation Detailed Annotation Distance to TSS Nearest PromoterID Entrez ID Nearest Unigene Nearest Refseq Nearest Ensembl Gene Name Gene Alias Gene Description Gene Type
MACS_peak_5016 chr13 73353448 73357948 + 7673 NA promoter-TSS (NM_006346) promoter-TSS (NM_006346) -1532 NM_006346 10464 Hs.441926 XM_005266229 ENSG00000083535 PIBF1 C13orf24|CEP90|PIBF|RP11-505F3.1 progesterone immunomodulatory binding factor 1 protein-coding
MACS_peak_9676 chr2 10829010 10830914 + 7640 NA exon (NM_024894, exon 1 of 21) exon (NM_024894, exon 1 of 21) 151 NM_024894 79954 Hs.222494 NM_001261392 ENSG00000115761 NOL10 PQBP5 nucleolar protein 10 protein-coding
MACS_peak_3106 chr11 45938540 45940401 + 6981 NA 5' UTR (NM_004813, exon 1 of 11) 5' UTR (NM_004813, exon 1 of 11) 203 NM_057174 9409 Hs.100915 NM_004813 ENSG00000121680 PEX16 PBD8A|PBD8B peroxisomal biogenesis factor 16 protein-coding
MACS_peak_4282 chr12 57984413 57986062 + 6898 NA exon (NM_024779, exon 1 of 10) exon (NM_024779, exon 1 of 10) 296 NM_001146258 79837 Hs.745011 XM_005269152 ENSG00000166908 PIP4K2C PIP5K2C phosphatidylinositol-5-phosphate 4-kinase, type II, gamma protein-coding
MACS_peak_4962 chr13 48667810 48669433 + 6886 NA intron (NM_001270629, intron 1 of 6) L2c|LINE|L2 655 NM_014166 29079 Hs.741275 NM_001270629 ENSG00000136146 MED4 ARC36|DRIP36|HSPC126|TRAP36|VDRIP mediator complex subunit 4 protein-coding
MACS_peak_6695 chr16 28856397 28858825 + 6773 NA 5' UTR (NM_003321, exon 1 of 10) 5' UTR (NM_003321, exon 1 of 10) 118 NM_003321 7284 Hs.12084 NM_003321 ENSG00000178952 TUFM COXPD4|EF-TuMT|EFTU|P43 Tu translation elongation factor, mitochondrial protein-coding
MACS_peak_1985 chr10 14879403 14881608 + 6694 NA promoter-TSS (NM_001029954) promoter-TSS (NM_001029954) 347 NR_103464 51182 Hs.736996 NM_001037538 ENSG00000187522 HSPA14 HSP70-4|HSP70L1 heat shock 70kDa protein 14 protein-coding
MACS_peak_7035 chr16 84219653 84220691 + 6592 NA intron (NM_001243156, intron 1 of 14) AluSz6|SINE|Alu 1504 NM_139353 9013 Hs.153022 XM_005256226 ENSG00000103168 TAF1C MGC:39976|SL1|TAFI110|TAFI95 TATA box binding protein (TBP)-associated factor, RNA polymerase I, C, 110kDa protein-coding
MACS_peak_87 chr1 6613157 6614770 + 6592 NA intron (NM_024654, intron 1 of 11) intron (NM_024654, intron 1 of 11) 694 NM_024654 79707 Hs.59425 XM_005263493 ENSG00000162408 NOL9 Grc3|NET6 nucleolar protein 9 protein-coding
MACS_peak_6893 chr16 67192557 67195235 + 6527 NA promoter-TSS (NM_003789) promoter-TSS (NM_003789) 5 NM_018378 55336 Hs.710714 NM_018378 ENSG00000135722 FBXL8 FBL8 F-box and leucine-rich repeat protein 8 protein-coding
MACS_peak_11932 chr22 22335769 22337385 + 6527 NA intron (NM_001282112, intron 1 of 17) intron (NM_001282112, intron 1 of 17) 663 NM_001282112 8940 Hs.436401 XM_005261810 ENSG00000100038 TOP3B TOP3B1 topoisomerase (DNA) III beta protein-coding
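The output for the data above should exclude the two rows whose distance falls outside the range -1000 to 1000, but the command does not seem to work. Where am I going wrong?
Answer 0 (score: 0)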
awk 'BEGIN{ FS = OFS = "\t" } $10 >= -1000 && $10 <= 1000 {print $2,$3,$4,$5,$8,$10,$16}' file.txt > promoter.txt
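You only defined OFS (the output field separator) and left the input field separator (FS) at its default, so awk splits each line on runs of whitespace. Because the annotation columns contain spaces, $10 is then no longer the Distance to TSS column. Setting FS to a tab as well fixes that.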
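To see the effect of FS on a small scale, here is a minimal sketch. It assumes the real file is tab-separated; the file names sample.tsv and promoter_sample.txt and the shortened rows are made up for illustration. It also assumes that the closing single quote after the awk program and a comma between $10 and $16 were intended.

```shell
# Build a tiny tab-separated sample (hypothetical file: sample.tsv).
# Rows are simplified stand-ins for the real data: column 8 contains
# spaces, column 10 is the distance, column 16 is the gene name.
printf 'id1\tchr13\t100\t200\t+\t7673\tNA\tpromoter-TSS (NM_006346)\tdetail a\t-1532\tNM_006346\t10464\tu1\tr1\te1\tPIBF1\n' > sample.tsv
printf 'id2\tchr2\t300\t400\t+\t7640\tNA\texon (NM_024894, exon 1 of 21)\tdetail b\t151\tNM_024894\t79954\tu2\tr2\te2\tNOL10\n' >> sample.tsv
printf 'id3\tchr16\t500\t600\t+\t6592\tNA\tintron (NM_001243156, intron 1 of 14)\tdetail c\t1504\tNM_139353\t9013\tu3\tr3\te3\tTAF1C\n' >> sample.tsv

# Default FS (whitespace): field 10 is a fragment of the annotation text,
# not the distance, because the annotation columns contain spaces.
awk '{ print $10 }' sample.tsv

# FS and OFS both set to a tab: field 10 is the distance again.
# Keep rows with -1000 <= distance <= 1000 and print the chosen columns.
awk 'BEGIN { FS = OFS = "\t" }
     $10 >= -1000 && $10 <= 1000 { print $2, $3, $4, $5, $8, $10, $16 }' sample.tsv > promoter_sample.txt

cat promoter_sample.txt
```

With this sample, only the chr2 row (distance 151) should end up in promoter_sample.txt. The rows with distances -1532 and 1504 are filtered out.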