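I imagine this is a very simple question, but I want to edit my data before importing it into R. I'd like to do it in the terminal so that it fits into my pipeline.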
For each line in my dataset, if $4 > $5, I want to swap the two values and set $7 = "-".
I was thinking of a for loop. In R, it would look something like this:
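for (i in seq_len(nrow(df))) {
  # if start (column 4) > end (column 5), swap them and set the strand (column 7) to "-"
  if (df[i, 4] > df[i, 5]) {
    tmp <- df[i, 4]
    df[i, 4] <- df[i, 5]
    df[i, 5] <- tmp
    df[i, 7] <- "-"
  }
}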
So that these lines:
chr1 Cufflinks exon 1 100 . + .
chr1 Cufflinks exon 300 200 . + .
would become:
chr1 Cufflinks exon 1 100 . + .
chr1 Cufflinks exon 200 300 . - .
How can I do this in bash?
A sample of my data:
chr1 Cufflinks exon 11869 12227 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "DDX11L1"; oId "ENST00000456328.2"; nearest_ref "ENST00000456328.2"; class_code "="; tss_id "TSS1";
chr1 Cufflinks exon 12613 12721 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; gene_name "DDX11L1"; oId "ENST00000456328.2"; nearest_ref "ENST00000456328.2"; class_code "="; tss_id "TSS1";
chr1 Cufflinks exon 13221 14409 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "3"; gene_name "DDX11L1"; oId "ENST00000456328.2"; nearest_ref "ENST00000456328.2"; class_code "="; tss_id "TSS1";
chr1 Cufflinks exon 11869 12057 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000005"; exon_number "1"; gene_name "DDX11L1"; oId "CUFF.12.5"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
chr1 Cufflinks exon 12179 12227 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000005"; exon_number "2"; gene_name "DDX11L1"; oId "CUFF.12.5"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
chr1 Cufflinks exon 12613 12721 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000005"; exon_number "3"; gene_name "DDX11L1"; oId "CUFF.12.5"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
chr1 Cufflinks exon 13225 13655 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000005"; exon_number "4"; gene_name "DDX11L1"; oId "CUFF.12.5"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
chr1 Cufflinks exon 13661 14412 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000005"; exon_number "5"; gene_name "DDX11L1"; oId "CUFF.12.5"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
chr1 Cufflinks exon 11869 12057 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000004"; exon_number "1"; gene_name "DDX11L1"; oId "CUFF.12.4"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
chr1 Cufflinks exon 12179 12227 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000004"; exon_number "2"; gene_name "DDX11L1"; oId "CUFF.12.4"; nearest_ref "ENST00000450305.2"; class_code "j"; tss_id "TSS1";
Answer 0 (score: 1)
Try:
awk '{if ($4 > $5) {t=$4; $4=$5; $5=t; $7="-"; print} else {print}}' data
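However, it will mess up some of the whitespace between columns: as soon as a field is modified, awk rebuilds the whole line joined by the output field separator, which is a single space by default. Not sure whether that is a problem for you.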
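If the file is tab-delimited, as GTF files normally are, one way to avoid the whitespace problem is to set both the input and output field separators to a tab. awk then splits only on tabs, so the attribute column (column 9), which contains spaces of its own, stays a single field, and modified lines are rejoined with tabs. A minimal sketch, assuming single-tab-separated input; `swap_reversed` is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: reads tab-separated GTF lines from the files given as
# arguments (or stdin if none) and writes the result to stdout.
swap_reversed() {
  awk 'BEGIN { FS = OFS = "\t" }                       # split and rejoin on tabs only
       $4 > $5 { t = $4; $4 = $5; $5 = t; $7 = "-" }   # swap start/end, mark minus strand
       { print }' "$@"
}
```

For example, `swap_reversed data > data.fixed`, or use it as one stage of a pipeline. Lines whose columns 4 and 5 are already in order pass through unchanged.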
Answer 1 (score: 0)
Use awk, something like:
awk '$4 > $5 {tmp=$4; $4=$5; $5=tmp; $7="-"} {print}' dataset.file
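Whichever command you use, a quick sanity check before loading the result into R is to count the lines where column 4 is still greater than column 5; after the fix this should be zero. A minimal sketch, assuming tab-separated input; `count_reversed` is a hypothetical helper name:

```shell
# Hypothetical helper: prints how many lines still have start ($4) > end ($5).
# Reads the files given as arguments, or stdin if none are given.
count_reversed() {
  awk 'BEGIN { FS = "\t" }
       $4 > $5 { n++ }        # numeric comparison when both fields look like numbers
       END { print n + 0 }' "$@"
}
```

For example, `count_reversed dataset.fixed` should print `0` once the swap has been applied. The `n + 0` makes awk print `0` instead of an empty line when no line matched.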