Question

I am working on a Linux cluster. I have a list of files that I need to find.

Sample10
Sample22

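These files also have a second naming convention based on a serial number. A tab-delimited file, key.tsv, lists the two names for each file on a single line.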

Sample10 Serial102
Sample22 Serial120

I need to find each file by one name and link it into another directory under the other ("Serial") name. Here is my attempt.

for i in "Sample10" "Sample22"
do
    # [[ ... ]] is true when find prints at least one path
    if [[ $(find /directory/ -name "$i*.fastq") ]]
    then
        R1=$(find /directory/ -name "$i*.fastq")
        ln -s "$R1" /output/directory/"$i".fastq
    else
        echo "File existence failed"
    fi
done

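This finds the files of interest from the list and links them, but I am struggling to work out how to rename them according to the entries in the key.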

Answer 1

You can do this with a single call to find, using an associative array to hold the mapping read from the key.tsv file:

#!/bin/bash

# build the hash for file mapping
declare -A file_map
while read -r src dst; do
  file_map["$src.fastq"]=$dst  # assume that the map doesn't have the .fastq extension
done < key.tsv

# loop through files and rename them
while IFS= read -d '' -r path; do   # read the NUL-separated output from find
  base=${path##*/}             # get the basename
  dest=${file_map["$base"]}    # look up the hash to get dest name
  [[ $dest ]] || continue      # skip if no mapping was found
  echo "Creating a link for file $path"
  ln -s "$path" "/dest/dir/$dest.fastq"  
done < <(find /path/to/source/files -type f -name '*.fastq' -print0)

I haven't tested this. Happy to help resolve any issues you run into.


Answer 2

There are many ways to do this. awk is one way:

Given a line in key.tsv of the form

source dest

dest=$(awk '/source/ {print $2}' key.tsv)

Or use grep and cut in a similar way.
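As a rough sketch of both lookups side by side, the snippet below builds a throwaway key.tsv in a temporary directory and looks up one name with awk and then with grep and cut. The temporary file, the variable names (src, dest_awk, dest_cut), and the exact-match comparison on the first column are assumptions added for illustration; they are not part of the answer above.

```shell
#!/bin/bash
# Hypothetical sample data: a temporary key.tsv with tab-separated columns.
tmpdir=$(mktemp -d)
key="$tmpdir/key.tsv"
printf 'Sample10\tSerial102\nSample22\tSerial120\n' > "$key"

src="Sample10"
tab=$(printf '\t')

# awk: compare the first column exactly, so a name such as Sample1
# would not also match the Sample10 row.
dest_awk=$(awk -F'\t' -v s="$src" '$1 == s {print $2}' "$key")

# grep + cut: anchor the name at the start of the line and require the tab
# right after it, then keep the second column. This assumes the sample name
# contains no regex metacharacters.
dest_cut=$(grep "^${src}${tab}" "$key" | cut -f2)

echo "awk:      $dest_awk"
echo "grep/cut: $dest_cut"
```

Matching on the whole first column matters once the key contains names that are prefixes of one another, which a bare /source/ pattern would not distinguish.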

Answer 3

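I'm not here to do your homework, so I'll just give the general idea.

You need to iterate through the whole tsv. I suggest using python, for example with what this answer provides: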

How to iterate through all the rows in a tsv file?

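For each row, you have to find the corresponding data (a row is usually an array, so the corresponding value is LINE[1]) and check whether the file exists. The example code below does this in bash (look up the equivalent in python; perhaps you can use some kind of exec command).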

find . -name "LINE[0]" -exec rename 's/^LINE[1]_//' {} \;
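The general idea (walk the tsv row by row, look for the file, check that it exists) can also be sketched entirely in bash rather than python. Everything below runs in a temporary directory; the directory layout, the variable names, the extra Sample99 row, and the assumption that each file is named exactly like the first column plus .fastq are illustrative choices, not details from the question. It creates symbolic links under the serial name instead of calling rename.

```shell
#!/bin/bash
# Hypothetical layout, created in a temporary directory so the sketch is safe to run.
work=$(mktemp -d)
mkdir -p "$work/source/run1" "$work/links"
printf 'Sample10\tSerial102\nSample22\tSerial120\nSample99\tSerial999\n' > "$work/key.tsv"
touch "$work/source/run1/Sample10.fastq" "$work/source/Sample22.fastq"

tab=$(printf '\t')

# Read the key one row at a time: column 1 is the current name,
# column 2 is the serial-based name.
while IFS="$tab" read -r sample serial; do
  # Keep only the first match; the name is assumed to be unique under the source tree.
  path=$(find "$work/source" -type f -name "$sample.fastq" | head -n 1)
  if [ -n "$path" ]; then
    ln -s "$path" "$work/links/$serial.fastq"
  else
    echo "No file found for $sample" >&2
  fi
done < "$work/key.tsv"

ls "$work/links"
```

Driving the loop from the key rather than from a hard-coded list means rows without a matching file (Sample99 here) are reported instead of silently skipped.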

Find and rename files based on a mapping
