Question

I know similar questions have been asked about sorting a file by a specific column, but none of them seem to answer mine.

My input file looks like this:

OHJ07_1_contig_10   0   500 130 500 500 1.0000000
OHJ07_1_contig_10   500 1000    180 500 500 1.0000000
OHJ07_1_contig_10   1000    1500    171 500 500 1.0000000
OHJ07_1_contig_10   1500    2000    79  380 500 0.7600000
OHJ07_1_contig_10   2000    2500    62  500 500 1.0000000
OHJ07_1_contig_10   2500    3000    96  500 500 1.0000000
OHJ07_1_contig_10   3000    3500    76  500 500 1.0000000
OHJ07_1_contig_10   3500    4000    87  500 500 1.0000000
OHJ07_1_contig_10   4000    4500    60  500 500 1.0000000
OHJ07_1_contig_10   4500    5000    64  500 500 1.0000000
OHJ07_1_contig_10   5000    5468    213 468 468 1.0000000
OHJ07_1_contig_100  0   500 459 500 500 1.0000000
OHJ07_1_contig_100  500 1000    156 500 500 1.0000000
OHJ07_1_contig_100  1000    1314    77  305 314 0.9713376
OHJ07_1_contig_1000 0   500 239 500 500 1.0000000
OHJ07_1_contig_1000 500 1000    226 500 500 1.0000000
OHJ07_1_contig_1000 1000    1500    238 500 500 1.0000000
OHJ07_1_contig_1000 1500    2000    263 500 500 1.0000000

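The program that generated it sorts the lines alphabetically by the name in the first column, but I want to sort them according to a list of names in another file, keeping all the other data. That other file contains additional information, such as the contig length in column 2 (it was generated with samtools faidx):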

OHJ07_1_contig_25270    888266  96530655    60  61
OHJ07_1_contig_36751    583964  120924448   60  61
OHJ07_1_contig_44057    504884  134192571   60  61
OHJ07_1_contig_21721    415942  87354744    60  61
OHJ07_1_contig_46339    411691  143341916   60  61
OHJ07_1_contig_44022    330441  133783765   60  61

Since each name has a different number of entries in the first file, what is the simplest way to do this? Preferably in bash.

I haven't tried anything yet, because I can't figure out how to approach the problem at all.

Answer 1

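I would prefix each line of the file that determines the order (called the index from now on) with its line number. One way to do this is with awk; I used the answer at https://superuser.com/questions/10201/how-can-i-prepend-a-line-number-and-tab-to-each-line-of-a-text-file (assuming your index file is named index and your data file is named data.txt):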

awk '{printf "%d,%s\n", NR, $0}' < index > index-numbered
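To see what the numbered index looks like, here is the same awk command run on a tiny, hypothetical index file that contains only three contig names. A real samtools faidx index would keep its other columns unchanged after the name, since awk prints the whole line ($0).

```shell
# Hypothetical index file: three contig names in the desired order
printf 'OHJ07_1_contig_100\nOHJ07_1_contig_10\nOHJ07_1_contig_1000\n' > index

# Prefix every line with its line number (NR) and a comma
awk '{printf "%d,%s\n", NR, $0}' < index > index-numbered

cat index-numbered
# 1,OHJ07_1_contig_100
# 2,OHJ07_1_contig_10
# 3,OHJ07_1_contig_1000
```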

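This way, index-numbered holds the mapping between your arbitrary names and numbers. Then you can process the data file with a while loop, prefixing each line with the index line number of its name and a comma while keeping the rest of the line (including the name), for example: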

57,OHJ07_1_contig_46339    411691  143341916   60  61

You can then sort on the first field (the number), which translates your arbitrary order into a numeric one.

To create the new data file with these numbers:

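while IFS= read -r line
do
   key=${line%%[[:space:]]*}   # contig name = first whitespace-delimited field
   # anchor the match so OHJ07_1_contig_10 does not also match OHJ07_1_contig_100
   n=$(grep -m1 -E "^[0-9]+,${key}([[:space:]]|\$)" index-numbered | cut -d, -f1)
   printf '%s,%s\n' "$n" "$line"
done < data.txt > indexed-data.txt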
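The loop starts a grep for every data line, which can get slow when the index is large. A common alternative, sketched here as an assumption rather than as part of the original answer, is a single awk pass: while reading the first file (NR == FNR) it stores each name's line number in an array, and for the second file it prints that rank, a comma and the original line. It produces the same rank,line format, so the sort step stays unchanged. The input files below are hypothetical, and OHJ07_1_contig_5 is a made-up name that is deliberately missing from the index.

```shell
# Hypothetical inputs so the sketch runs on its own
printf 'OHJ07_1_contig_1000\nOHJ07_1_contig_10\n' > index
printf 'OHJ07_1_contig_10 0 500\nOHJ07_1_contig_10 500 1000\nOHJ07_1_contig_1000 0 500\nOHJ07_1_contig_5 0 500\n' > data.txt

# Pass 1 (index): remember the line number of each name.
# Pass 2 (data.txt): prepend that rank and a comma.
# Names missing from the index get the arbitrary rank 999999999 so they sort last.
awk 'NR == FNR { rank[$1] = FNR; next }
     { r = ($1 in rank) ? rank[$1] : "999999999"; print r "," $0 }' index data.txt > indexed-data.txt

cat indexed-data.txt
# 2,OHJ07_1_contig_10 0 500
# 2,OHJ07_1_contig_10 500 1000
# 1,OHJ07_1_contig_1000 0 500
# 999999999,OHJ07_1_contig_5 0 500
```

The placeholder rank for unknown names is a design choice; with the while loop the rank would simply be empty, and sort -n treats an empty key as 0, so those lines would come first instead.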

Then just sort the modified data file (indexed-data.txt), using the inserted line number as the sort key:

sort -s -t, -k1,1n indexed-data.txt > sorted-data.txt
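Why a stable sort helps here: all windows of one contig share the same rank, and when keys tie, sort by default falls back to comparing the whole lines as text, which can place the 1000 window before the 500 window. The -s option keeps tied lines in their input order, and -k1,1n restricts the numeric key to field 1. A minimal sketch with hypothetical, already-indexed lines:

```shell
# Hypothetical indexed lines: two windows of the same contig (rank 2), already in position order
printf '2,OHJ07_1_contig_10 500 1000\n2,OHJ07_1_contig_10 1000 1500\n1,OHJ07_1_contig_1000 0 500\n' > indexed-data.txt

# -t,     : comma separates fields
# -k1,1n  : numeric key on field 1 only
# -s      : stable, so lines with equal rank keep their input order
sort -s -t, -k1,1n indexed-data.txt > sorted-data.txt

cat sorted-data.txt
# 1,OHJ07_1_contig_1000 0 500
# 2,OHJ07_1_contig_10 500 1000
# 2,OHJ07_1_contig_10 1000 1500
```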

If you want to hide the line numbers in the final output, you can strip them like this:

sort -s -t, -k1,1n indexed-data.txt | cut -d, -f2- > sorted-data.txt
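The steps can also be chained into one pipeline without intermediate files. This is a sketch under the same assumptions: index lists the desired order with the name in its first column, and the data lines are the first four columns of a few rows from the question, space-separated here. The order in index is hypothetical.

```shell
# Hypothetical order: contig_100 should come before contig_10
printf 'OHJ07_1_contig_100\nOHJ07_1_contig_10\n' > index
# First four columns of a few rows from the question's data file
printf 'OHJ07_1_contig_10 0 500 130\nOHJ07_1_contig_10 500 1000 180\nOHJ07_1_contig_100 0 500 459\nOHJ07_1_contig_100 500 1000 156\n' > data.txt

# Rank each data line by the index, stable-sort on the rank, then drop the rank
# (names absent from index get an empty rank, which sort -n treats as 0)
awk 'NR == FNR { rank[$1] = FNR; next } { print rank[$1] "," $0 }' index data.txt |
  sort -s -t, -k1,1n |
  cut -d, -f2- > sorted-data.txt

cat sorted-data.txt
# OHJ07_1_contig_100 0 500 459
# OHJ07_1_contig_100 500 1000 156
# OHJ07_1_contig_10 0 500 130
# OHJ07_1_contig_10 500 1000 180
```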

Your final output will be in sorted-data.txt.

I'm sure this isn't the best solution; maybe someone else can give a better answer.

Bash - sort a file according to a list in another file
