Sort and group matching text in a file

Time: 2016-02-22 20:45:16

Tags: bash

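I have a bash script that runs awk and writes a file like the one shown below. I am trying to group the lines whose $2 values match (comparing the text before the -) and write out the grouped, sorted file. The desired output below is an example of the sorted/grouped result I am after. Thank you :).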

Input

chr9:101906999-101907185 TPM1-1200|gc=63 281.2
chr2:21245693-21245924 APOB-279|gc=49.8 294.0
chr13:32903545-32903664 BRCA2-318|gc=27.7 30.2
chr19:55667932-55668051 TNNI3-2383|gc=55.5 161.8
chr2:21256161-21256400 APOB-288|gc=46 198.7
chr15:63353044-63353163 TPM1-1200|gc=63 481.2

file_match

TPM1
APOB
BRCA2
TNNI3
APOB
TPM1

Current output

chr9:101906999-101907185 TPM1-1200|gc=63 281.2
chr2:21245693-21245924 APOB-279|gc=49.8 294.0
chr13:32903545-32903664 BRCA2-318|gc=27.7 30.2
chr19:55667932-55668051 TNNI3-2383|gc=55.5 161.8
chr2:21256161-21256400 APOB-288|gc=46 198.7
chr15:63353044-63353163 TPM1-1200|gc=63 481.2

Desired output

chr9:101906999-101907185 TPM1-1200|gc=63 281.2
chr15:63353044-63353163 TPM1-1200|gc=63 481.2
chr2:21245693-21245924 APOB-279|gc=49.8 294.0
chr2:21256161-21256400 APOB-288|gc=46 198.7
chr13:32903545-32903664 BRCA2-318|gc=27.7 30.2
chr19:55667932-55668051 TNNI3-2383|gc=55.5 161.8

Bash that produces the current output

logfile=/home/cmccabe/Desktop/NGS/API/2-12-2015/process.log
for f in /home/cmccabe/Desktop/NGS/API/2-12-2015/bedtools/*base_counts.txt ; do  # input files
 echo "Start custom panel creation: $(date) - File: $f"
 bname=$(basename "$f")
 pref=${bname%%.txt}
 awk '
 NR == FNR {input[$0]; next}
 {
 split($5, a, "-")
 if (a[1] in input) {
     key = $4 OFS $5
     n[key]++
     sum[key] += $7
 }
 }
 END {
 for (key in n) 
     printf "%s %.1f\n", key, sum[key]/n[key]
}
' file_match "$f" > "/home/cmccabe/Desktop/NGS/API/2-12-2015/bedtools/${pref}_Incidentalcoverage.bed"
 echo "End custom panel creation: $(date) - File: $f"
done >> "$logfile"

2 answers:

Answer 0 (score: 1)

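Without changing your current script (there is no input file to verify it against), you can pipe its output through the following, which sorts on the prefix of the second field.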

$ ... | awk '{split($2,a,"-"); print a[1] "\t" $0}' | sort | cut -f2-

chr2:21245693-21245924 APOB-279|gc=49.8 294.0
chr2:21256161-21256400 APOB-288|gc=46 198.7
chr13:32903545-32903664 BRCA2-318|gc=27.7 30.2
chr19:55667932-55668051 TNNI3-2383|gc=55.5 161.8
chr15:63353044-63353163 TPM1-1200|gc=63 481.2
chr9:101906999-101907185 TPM1-1200|gc=63 281.2
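
The pipeline above sorts whole decorated lines, so inside a group the rows are ordered by their chromosome text (chr15 before chr9). If the rows should instead keep their original relative order within each group, one option is a stable sort restricted to the helper column. This is only a sketch under assumptions: `group_by_prefix` is a hypothetical helper name, the sample lines are copied from the question, and it relies on `sort -s`, which GNU and BSD sort provide but POSIX does not require.

```shell
# Sample lines copied from the question's "Current output".
sample='chr9:101906999-101907185 TPM1-1200|gc=63 281.2
chr2:21245693-21245924 APOB-279|gc=49.8 294.0
chr13:32903545-32903664 BRCA2-318|gc=27.7 30.2
chr19:55667932-55668051 TNNI3-2383|gc=55.5 161.8
chr2:21256161-21256400 APOB-288|gc=46 198.7
chr15:63353044-63353163 TPM1-1200|gc=63 481.2'

# group_by_prefix is a hypothetical helper, not part of the original script.
group_by_prefix() {
    # 1. decorate: prepend the text before "-" in field 2, tab-separated
    # 2. sort on that helper column only; -s keeps input order within a group
    # 3. undecorate: drop the helper column again
    awk '{split($2, a, "-"); print a[1] "\t" $0}' |
        LC_ALL=C sort -s -t "$(printf '\t')" -k1,1 |
        cut -f2-
}

grouped=$(printf '%s\n' "$sample" | group_by_prefix)
printf '%s\n' "$grouped"
```

With this variant the groups still come out alphabetically (APOB, BRCA2, TNNI3, TPM1), but the chr9 row stays ahead of the chr15 row because that is how they appear in the input.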

Answer 1 (score: 1)

You can pipe your current script into this single sort command, which sorts on the second field in version order:
./script.sh | sort -k2,2V

Output:

chr2:21245693-21245924 APOB-279|gc=49.8 294.0
chr2:21256161-21256400 APOB-288|gc=46 198.7
chr13:32903545-32903664 BRCA2-318|gc=27.7 30.2
chr19:55667932-55668051 TNNI3-2383|gc=55.5 161.8
chr15:63353044-63353163 TPM1-1200|gc=63 481.2
chr9:101906999-101907185 TPM1-1200|gc=63 281.2
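
Both answers order the groups alphabetically, whereas the desired output in the question lists them in the order each prefix first appears (TPM1, APOB, BRCA2, TNNI3). If that order matters, a single awk pass can remember when each prefix was first seen and print the groups in that sequence. The following is a minimal sketch, not the original poster's method: `group_first_seen` is a hypothetical helper name, the sample is copied from the question, and the approach holds every line in memory, so it assumes the file is reasonably small.

```shell
# Sample lines copied from the question's "Current output".
sample='chr9:101906999-101907185 TPM1-1200|gc=63 281.2
chr2:21245693-21245924 APOB-279|gc=49.8 294.0
chr13:32903545-32903664 BRCA2-318|gc=27.7 30.2
chr19:55667932-55668051 TNNI3-2383|gc=55.5 161.8
chr2:21256161-21256400 APOB-288|gc=46 198.7
chr15:63353044-63353163 TPM1-1200|gc=63 481.2'

# group_first_seen is a hypothetical helper, not part of the original script.
group_first_seen() {
    awk '
    {
        split($2, a, "-")              # a[1] is the prefix, e.g. "TPM1"
        gene = a[1]
        if (!(gene in lines))          # first time this prefix appears
            order[++n] = gene          # remember its position
        lines[gene] = lines[gene] $0 "\n"   # append the row to its group
    }
    END {
        # Walk the prefixes in first-seen order instead of "for (key in ...)",
        # whose traversal order awk leaves unspecified.
        for (i = 1; i <= n; i++)
            printf "%s", lines[order[i]]
    }'
}

result=$(printf '%s\n' "$sample" | group_first_seen)
printf '%s\n' "$result"
```

On the sample data this prints the rows in the same order as the question's desired output. It could be applied to the script's result the same way as the sort commands, for example `./script.sh | group_first_seen` once the function is defined in the calling shell.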