Question

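I am trying to run an awk command that selects several columns from a .txt file, keeping only the rows where a particular column falls within a given range. I am filtering on column 10, which holds distance scores that can be positive or negative, and I want to print several columns for the rows where column 10 is between -1000 and 1000. I used the command below, but it did not run: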

awk -v OFS='\t' '{if($10 >= -1000 && $10 <= 1000)print $2,$3,$4,$5,$8,$10$16} file.txt > promoter.txt

But it failed.

My file looks like this:

PeakID (cmd=S_13_O1_122_K27me3_macs2_out_broad_peaks.c.bed.uniq hg19)   Chr Start   End Strand  Peak Score  Focus Ratio/Region Size Annotation  Detailed Annotation Distance to TSS Nearest PromoterID  Entrez ID   Nearest Unigene Nearest Refseq  Nearest Ensembl Gene Name   Gene Alias  Gene Description    Gene Type
MACS_peak_5016  chr13   73353448    73357948    +   7673    NA  promoter-TSS (NM_006346)    promoter-TSS (NM_006346)    -1532   NM_006346   10464   Hs.441926   XM_005266229    ENSG00000083535 PIBF1   C13orf24|CEP90|PIBF|RP11-505F3.1    progesterone immunomodulatory binding factor 1  protein-coding
MACS_peak_9676  chr2    10829010    10830914    +   7640    NA  exon (NM_024894, exon 1 of 21)  exon (NM_024894, exon 1 of 21)  151 NM_024894   79954   Hs.222494   NM_001261392    ENSG00000115761 NOL10   PQBP5   nucleolar protein 10    protein-coding
MACS_peak_3106  chr11   45938540    45940401    +   6981    NA  5' UTR (NM_004813, exon 1 of 11)    5' UTR (NM_004813, exon 1 of 11)    203 NM_057174   9409    Hs.100915   NM_004813   ENSG00000121680 PEX16   PBD8A|PBD8B peroxisomal biogenesis factor 16    protein-coding
MACS_peak_4282  chr12   57984413    57986062    +   6898    NA  exon (NM_024779, exon 1 of 10)  exon (NM_024779, exon 1 of 10)  296 NM_001146258    79837   Hs.745011   XM_005269152    ENSG00000166908 PIP4K2C PIP5K2C phosphatidylinositol-5-phosphate 4-kinase, type II, gamma   protein-coding
MACS_peak_4962  chr13   48667810    48669433    +   6886    NA  intron (NM_001270629, intron 1 of 6)    L2c|LINE|L2 655 NM_014166   29079   Hs.741275   NM_001270629    ENSG00000136146 MED4    ARC36|DRIP36|HSPC126|TRAP36|VDRIP   mediator complex subunit 4  protein-coding
MACS_peak_6695  chr16   28856397    28858825    +   6773    NA  5' UTR (NM_003321, exon 1 of 10)    5' UTR (NM_003321, exon 1 of 10)    118 NM_003321   7284    Hs.12084    NM_003321   ENSG00000178952 TUFM    COXPD4|EF-TuMT|EFTU|P43 Tu translation elongation factor, mitochondrial protein-coding
MACS_peak_1985  chr10   14879403    14881608    +   6694    NA  promoter-TSS (NM_001029954) promoter-TSS (NM_001029954) 347 NR_103464   51182   Hs.736996   NM_001037538    ENSG00000187522 HSPA14  HSP70-4|HSP70L1 heat shock 70kDa protein 14 protein-coding
MACS_peak_7035  chr16   84219653    84220691    +   6592    NA  intron (NM_001243156, intron 1 of 14)   AluSz6|SINE|Alu 1504    NM_139353   9013    Hs.153022   XM_005256226    ENSG00000103168 TAF1C   MGC:39976|SL1|TAFI110|TAFI95    TATA box binding protein (TBP)-associated factor, RNA polymerase I, C, 110kDa   protein-coding
MACS_peak_87    chr1    6613157 6614770 +   6592    NA  intron (NM_024654, intron 1 of 11)  intron (NM_024654, intron 1 of 11)  694 NM_024654   79707   Hs.59425    XM_005263493    ENSG00000162408 NOL9    Grc3|NET6   nucleolar protein 9 protein-coding
MACS_peak_6893  chr16   67192557    67195235    +   6527    NA  promoter-TSS (NM_003789)    promoter-TSS (NM_003789)    5   NM_018378   55336   Hs.710714   NM_018378   ENSG00000135722 FBXL8   FBL8    F-box and leucine-rich repeat protein 8 protein-coding
MACS_peak_11932 chr22   22335769    22337385    +   6527    NA  intron (NM_001282112, intron 1 of 17)   intron (NM_001282112, intron 1 of 17)   663 NM_001282112    8940    Hs.436401   XM_005261810    ENSG00000100038 TOP3B   TOP3B1  topoisomerase (DNA) III beta    protein-coding

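The output for the data above should exclude the two rows that fall outside the range -1000 to 1000, but the command does not seem to work. Where am I going wrong?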

Answer 1

awk 'BEGIN { FS = OFS = "\t" } $10 >= -1000 && $10 <= 1000 { print $2, $3, $4, $5, $8, $10, $16 }' file.txt > promoter.txt

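You only defined OFS (the output field separator) and left the input field separator (FS) at its default, so awk splits the input on any whitespace. Several columns in your file, such as the annotation, contain spaces, so with the default FS $10 is no longer the distance column. Setting FS to a tab as well fixes that.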
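To illustrate the point about FS, here is a minimal sketch on a made-up three-column sample (the rows, the names peak_a and peak_b, and the column positions are assumptions for illustration, not taken from your file). The second field contains spaces, as the annotation column does in the real data.

```shell
# Hypothetical tab-separated sample; field 2 contains spaces, field 3 is the number.
sample=$(printf 'peak_a\tpromoter-TSS (NM_1)\t-1532\npeak_b\texon (NM_2, exon 1 of 21)\t151\n')

# Default FS: awk splits on any whitespace, so $3 is a piece of the annotation text.
default_split=$(printf '%s\n' "$sample" | awk '{ print $3 }')

# FS set to a tab: $3 is the numeric column, so the range test behaves as intended.
tab_split=$(printf '%s\n' "$sample" |
  awk 'BEGIN { FS = OFS = "\t" } $3 >= -1000 && $3 <= 1000 { print $1, $3 }')

printf 'default FS, third field of each row:\n%s\n' "$default_split"
printf 'tab FS, rows within range:\n%s\n' "$tab_split"
```

With the default separator the third field comes out as fragments of the annotation, while with the tab separator only the peak_b row (value 151) passes the filter.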
