bash: swap values within a row based on an expression

Time: 2015-10-21 21:40:03

Tags: bash

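I think this is a very simple question, but I want to edit my data before importing it into R. I would like to do this in the terminal so that it fits into my pipeline.

For each row in my dataset, if $4 > $5, I want to swap the two values and set $7 = "-".

I was thinking of doing a for loop. In R, it would look something like this: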

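for (i in seq_len(nrow(df))) {
    # One check per row is enough, so use `if` rather than `while`
    if (df[i,4] > df[i,5]) {
        tmp <- df[i,4]
        df[i,4] <- df[i,5]
        df[i,5] <- tmp
        df[i,7] <- "-"   # mark the swapped row as minus strand
    }
}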

So that:

chr1    Cufflinks   exon    1       100     .   +   .
chr1    Cufflinks   exon    300     200     .   +   .   

would be changed to:

chr1    Cufflinks   exon    1       100     .   +   .
chr1    Cufflinks   exon    200     300     .   -   .   

How can I do this in bash?

A sample of my data:

chr1    Cufflinks   exon    11869   12227   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328.2"; nearest_ref "ENST00000456328.2"; class_code "="; tss_id "TSS1";
chr1    Cufflinks   exon    12613   12721   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; gene_name "DDX11L1"; oId "ENST00000456328.2"; nearest_ref "ENST00000456328.2"; class_code "="; tss_id "TSS1";
chr1    Cufflinks   exon    13221   14409   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "3"; gene_name "DDX11L1"; oId "ENST00000456328.2"; nearest_ref "ENST00000456328.2"; class_code "="; tss_id "TSS1";
chr1    Cufflinks   exon    11869   12057   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000005"; exon_number "1"; gene_name "DDX11L1"; oId "CUFF.12.5"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
chr1    Cufflinks   exon    12179   12227   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000005"; exon_number "2"; gene_name "DDX11L1"; oId "CUFF.12.5"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
chr1    Cufflinks   exon    12613   12721   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000005"; exon_number "3"; gene_name "DDX11L1"; oId "CUFF.12.5"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
chr1    Cufflinks   exon    13225   13655   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000005"; exon_number "4"; gene_name "DDX11L1"; oId "CUFF.12.5"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
chr1    Cufflinks   exon    13661   14412   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000005"; exon_number "5"; gene_name "DDX11L1"; oId "CUFF.12.5"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
chr1    Cufflinks   exon    11869   12057   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000004"; exon_number "1"; gene_name "DDX11L1"; oId "CUFF.12.4"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
chr1    Cufflinks   exon    12179   12227   .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000004"; exon_number "2"; gene_name "DDX11L1"; oId "CUFF.12.4"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";

2 answers:

Answer 0: (score: 1)

Try:

awk '{if ($4 > $5) {t=$4; $4=$5; $5=t; $7="-"; print} else {print}}' data

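However, on the rows it modifies, awk rebuilds the line with a single space between fields, so the original column spacing is lost there. Not sure whether that is a problem for you.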
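If the file is tab-delimited (an assumption; GTF files usually are, but the sample in the question does not show it), one way to avoid the spacing problem is to tell awk to split and rejoin on tabs. The sketch below wraps that in a function named fix_coords, a hypothetical name chosen here for illustration; it reads from the files given as arguments, or from stdin when there are none, so it can sit in the middle of a pipeline.

```shell
# Hypothetical helper: swap columns 4 and 5 when $4 > $5 and set column 7 to "-".
# Assumes tab-delimited input; with other delimiters, adjust FS and OFS.
fix_coords() {
    awk 'BEGIN { FS = OFS = "\t" }   # split on tabs and rejoin with tabs
         # "+ 0" forces a numeric comparison of the two coordinates
         $4 + 0 > $5 + 0 { t = $4; $4 = $5; $5 = t; $7 = "-" }
         { print }' "$@"
}

# Example usage on the two rows from the question, fed through stdin
printf 'chr1\tCufflinks\texon\t1\t100\t.\t+\t.\nchr1\tCufflinks\texon\t300\t200\t.\t+\t.\n' | fix_coords
```

Because both FS and OFS are a tab, the attribute column (with its internal spaces) stays a single field and is passed through unchanged.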

Answer 1: (score: 0)

Using awk:

awk '$4 > $5 {tmp=$4; $4=$5; $5=tmp; $7="-"} {print}' dataset.file