I have a bunch of tab-separated text files that look like this:
"gene_id" "Pattern1" "Pattern2" "Pattern3" "Pattern4" "Pattern5" "MAP" "PPDE"
"ENSG00000119771.13" 3.11528786599051e-18 2.52650109640992e-13 6.25109524320237e-09 0.345846257420197 0.654153736328455 "Pattern5" 1
"ENSG00000123700.4" 1.75016991626305e-36 3.98804090894939e-19 0.63423772228367 3.8159144080782e-21 0.36576227771633 "Pattern3" 1
"ENSG00000128567.15" 1.10722918612618e-23 7.62691311068806e-07 5.77031364194955e-06 5.13675840911147e-21 0.999993466995047 "Pattern5" 1
"ENSG00000130182.6" 9.75717082221716e-22 1.27675651077242e-12 0.469972541094369 1.13677117238758e-12 0.530027458903217 "Pattern5" 1
"ENSG00000131914.9" 3.1627489688037e-41 1.00274706758683e-22 0.0578584524816503 6.98718794692175e-22 0.94214154751835 "Pattern5" 1
Now I want to join them into a single file, so that I get
"gene_id" "Pattern5" "Pattern5" "Pattern5" "Pattern5" "Pattern5"
where each Pattern5 column comes from a different file.
I tried a few things with
cut -f 6 <file>
and
paste <file1> <file2> ...
but I could not combine them correctly.
Thanks for your help!
Update: here is a testable example to use as input:
<file1>
gene_id Pattern1 Pattern2 Pattern3 Pattern4 Pattern5
ENSG00000119771 1 2 3 4 5
ENSG00000123700 1 2 3 4 5
ENSG00000128567 1 2 3 4 5
ENSG00000130182 1 2 3 4 5
ENSG00000131914 1 2 3 4 5
<file2>
gene_id Pattern1 Pattern2 Pattern3 Pattern4 Pattern5
ENSG00000119771 6 7 8 9 10
ENSG00000123700 6 7 8 9 10
ENSG00000128567 6 7 8 9 10
ENSG00000130182 6 7 8 9 10
ENSG00000131914 6 7 8 9 10
<file3>
gene_id Pattern1 Pattern2 Pattern3 Pattern4 Pattern5
ENSG00000119771 11 12 13 14 15
ENSG00000123700 11 12 13 14 15
ENSG00000128567 11 12 13 14 15
ENSG00000130182 11 12 13 14 15
ENSG00000131914 11 12 13 14 15
The desired output would be:
gene_id Pattern5_file1 Pattern5_file2 Pattern5_file3
ENSG00000119771 5 10 15
ENSG00000123700 5 10 15
ENSG00000128567 5 10 15
ENSG00000130182 5 10 15
ENSG00000131914 5 10 15
UPDATE2: I tried Ed Morton's approach:
awk '
BEGIN { FS=OFS="\t" } FNR==1{ARGIND++}
{ genes[$1]; val[$1,ARGIND] = $5 }
END {
for (gene in genes) {
printf "%s%s", gene, OFS
for (file=1; file<=ARGIND; file++) {
printf "%s%s", val[gene,file], (file<ARGIND?OFS:ORS)
}
}
} ' $files
But the output is not formatted correctly:
ENSG00000128567 4 9 14
ENSG00000130182 4 9 14
ENSG00000119771 4 9 14
gene_id Pattern4 Pattern4 Pattern4
ENSG00000131914 4 9 14
ENSG00000123700 4 9 14
Answer 0 (score: 2)
Try this:
#!/bin/bash
paste file1 file2 file3 | awk -v patternIdx=6 '
# Print column 1, then every idx-th column; i is declared as a local variable.
function printPattern(idx, isFirstLine,    i) {
    printf "%s", $1
    for (i = idx; i <= NF; i += idx) {
        if (isFirstLine)
            printf "\t%s_file%d", $i, i / idx    # header: tag the name with the file number
        else
            printf "\t%s", $i                    # %s, not %d, so fractional values survive
    }
    printf "\n"
}
{ printPattern(patternIdx, NR == 1) }'
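patternIdx is the column index of Pattern5. The script assumes every file has exactly that many columns, as in the test example.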
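For files with a different number of columns, such as the original ones with the extra MAP and PPDE columns, picking every sixth pasted column would no longer line up. The following is a minimal sketch of one alternative, not part of the answer above: it looks the column up by its header name in each file and joins rows on gene_id. It assumes tab-separated input with a header line in every file; merge_column is a hypothetical helper name.

```shell
#!/bin/bash
# merge_column NAME FILE...   (hypothetical helper, for illustration only)
# Prints gene_id plus the column called NAME from every FILE, tab-separated.
# The body runs in a subshell so that "name" does not leak into the caller.
merge_column() (
    name=$1
    shift
    awk -v name="$name" '
    BEGIN { FS = OFS = "\t" }
    FNR == 1 {                          # header line of each input file
        nfiles++
        col = 0
        for (i = 1; i <= NF; i++) {
            h = $i
            gsub(/"/, "", h)            # tolerate quoted headers such as "Pattern5"
            if (h == name) col = i
        }
        if (col == 0) {
            printf "column %s not found in %s\n", name, FILENAME > "/dev/stderr"
            failed = 1
            exit 1
        }
        header = header OFS name "_file" nfiles
        next
    }
    {
        if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }   # keep first-seen row order
        val[$1, nfiles] = $col
    }
    END {
        if (failed) exit 1
        print "gene_id" header
        for (g = 1; g <= n; g++) {
            line = order[g]
            for (f = 1; f <= nfiles; f++) line = line OFS val[order[g], f]
            print line
        }
    }' "$@"
)
```

It could be called as merge_column Pattern5 file1 file2 file3. Rows are printed in the order in which each gene_id is first seen, so the header stays on the first line; a gene missing from one file gets an empty field in that column.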
Answer 1 (score: 1)
for f in file1 file2 file3; do
    cut -f 6 "$f"
done |
awk '{ if ($1 ~ /Pattern5/) printf("\n%s\t", $1); else printf("%s\t", $1) } END { print "" }' |
tail -n +2
"Pattern5" 0.654153736328455 0.36576227771633 0.999993466995047
"Pattern5" 0.654153736328455 0.36576227771633 0.999993466995047
"Pattern5" 0.654153736328455 0.36576227771633 0.999993466995047
(I just used the same data for file1-3.)
You can also use a glob to specify the input files (if they are named consistently), e.g. for f in myfiles*.
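As a rough sketch of how the glob idea could be combined with cut and paste to get one Pattern5 column per file, side by side: the function below is a hypothetical helper, not part of the answer. It assumes Pattern5 is column 6 in every file and that all files list the genes in the same order.

```shell
#!/bin/bash
# paste_pattern5 FILE...   (hypothetical helper, for illustration only)
# Writes gene_id from the first file followed by column 6 of every file.
# The body runs in a subshell so its variables do not leak into the caller.
paste_pattern5() (
    tmpdir=$(mktemp -d) || exit 1
    cut -f 1 "$1" > "$tmpdir/000000"         # gene_id column, taken from the first file
    n=0
    for f in "$@"; do
        n=$((n + 1))
        # zero-padded names keep the pieces in argument order when globbed
        cut -f 6 "$f" > "$tmpdir/$(printf '%06d' "$n")"
    done
    paste "$tmpdir"/*
    rm -rf "$tmpdir"
)
```

It could be called as paste_pattern5 myfiles*. The header row comes out as gene_id followed by one Pattern5 per file, as in the first example of the question.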