Question

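I am (very clearly) not well versed in Bash. If this is a redundant question, please point me in the right direction, and apologies if I failed to find the appropriate thread. Thank you, as always.

My file structure is as follows: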

/quants
   sample1
      a bunch of extra stuff
      aux_info
         unmapped_names.txt
   sample2
      a bunch of extra stuff
      aux_info
         unmapped_names.txt
   sample3
      a bunch of extra stuff
      aux_info
         unmapped_names.txt

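Each sample subdirectory contains more directories and files than just aux_info and unmapped_names.txt, but those are the ones I am interested in copying.

The function below creates a new directory called unmapped in /quants. The result looks like this: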

/quants
   sample1
      a bunch of extra stuff
      aux_info
         unmapped_names.txt
   sample2
      a bunch of extra stuff
      aux_info
         unmapped_names.txt
   sample3
      a bunch of extra stuff
      aux_info
         unmapped_names.txt
   unmapped
      sample1
         unmapped_names.txt
      sample2
         unmapped_names.txt
      sample3
         unmapped_names.txt

The code below works, but it is very slow. I would greatly appreciate any suggestions on how to do this more efficiently.

getUnmapped(){
# =====================================================================
# description: create new dir called unmapped
# input: quant filepath (output from mapSalmon)
# output: <quants>/unmapped/<sample>/unmapped_names.txt for each sample
# =====================================================================

# enable extended globbing (extglob)
shopt -s extglob

# store original workingDir
local workingDir=$(pwd)

# cd to inputted quants dir
cd "$1" || return 1
# store absolute path of quants dir so a relative argument still works after cd
local quantsDir=$(pwd)

# create directory in quants dir called unmapped
mkdir -p unmapped
cd unmapped || return 1

# create sample_rep directories in unmapped
local sample_rep
for sample_rep in "$quantsDir"/*;
  do
    if [ "$(basename "${sample_rep%_quant}")" != "unmapped" ]
      then
        local sample_file=$(basename "${sample_rep%_quant}")
        mkdir -p "$sample_file"
        cp "$sample_rep/aux_info/unmapped_names.txt" "$quantsDir/unmapped/$sample_file"
      fi
  done

cd "$workingDir"

} # end getUnmapped

Answer 1

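How many files are you processing?

One thing you could do is compute and store this expression once:

$(basename ${sample_rep%_quant})

like this:

sample_file=$(basename ${sample_rep%_quant})

Then replace the expression in the code with $sample_file. This saves you from evaluating the expression twice. However, I don't think this is why it runs slowly; performance is probably limited by your Mac's file system I/O.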
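As a minimal sketch of the suggestion above, the sample name can be computed once per loop iteration and reused. This version also swaps basename for Bash parameter expansion, which avoids starting a subprocess for each sample. The function name list_sample_names is hypothetical and only prints the names it would use.

```shell
#!/usr/bin/env bash
# Sketch only: list_sample_names is a hypothetical helper, not part of the
# original getUnmapped function.
list_sample_names() {
  local quants=$1 sample_rep sample_file
  for sample_rep in "$quants"/*/; do
    [ -d "$sample_rep" ] || continue      # glob matched nothing
    sample_rep=${sample_rep%/}            # drop the trailing slash
    sample_file=${sample_rep##*/}         # last path component, like basename
    sample_file=${sample_file%_quant}     # strip an optional _quant suffix
    [ "$sample_file" != "unmapped" ] || continue
    printf '%s\n' "$sample_file"          # the stored value is reused from here on
  done
}
```

Calling list_sample_names /quants prints one sample name per line and skips the unmapped directory.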

With large files (800 MB, say), copying will be slow. In that case, symlinking with 'ln -s ...' would be faster.
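A rough sketch of the symlink approach, assuming the directory layout from the question. The function name link_unmapped is hypothetical. The quants path is resolved to an absolute path first so that the links still resolve no matter where they are read from.

```shell
#!/usr/bin/env bash
# Sketch only: link_unmapped is a hypothetical name.
# Creates <quants>/unmapped/<sample>/unmapped_names.txt as symlinks
# to the originals instead of copying them.
link_unmapped() {
  local quants src name
  quants=$(cd "$1" && pwd) || return 1          # absolute path to the quants dir
  for src in "$quants"/*/aux_info/unmapped_names.txt; do
    [ -f "$src" ] || continue                   # glob matched nothing
    name=${src%/aux_info/unmapped_names.txt}    # path of the sample directory
    name=${name##*/}                            # sample directory name
    name=${name%_quant}                         # strip an optional _quant suffix
    mkdir -p "$quants/unmapped/$name"
    ln -sf "$src" "$quants/unmapped/$name/unmapped_names.txt"
  done
}
```

Keep in mind that a symlink only points at the original file, so the links break if the sample directories are later moved or deleted.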

Answer 2

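You may want to use a programming language with built-in parallelism for this. Otherwise, you could use the parallel command: https://unix.stackexchange.com/questions/211976/how-to-run-x-instances-of-a-script-parallel

I'm not sure about the complexity involved myself, but it should at least start to make the most of your resources. You can install parallel on a Mac with Brew: https://brew.sh/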
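For illustration, here is a sketch of running the copies concurrently. Note that it uses xargs -P rather than the parallel command, since xargs already ships with macOS and most Linux systems. The function name copy_unmapped_parallel and the default of 4 jobs are assumptions. Whether this actually speeds things up depends on whether the disk is the bottleneck.

```shell
#!/usr/bin/env bash
# Sketch only: copy_unmapped_parallel is a hypothetical name and the
# default of 4 concurrent jobs is an arbitrary assumption.
copy_unmapped_parallel() {
  local quants jobs=${2:-4}
  quants=$(cd "$1" && pwd) || return 1          # absolute path to the quants dir
  # Emit NUL-separated source paths and hand one path to each worker shell.
  printf '%s\0' "$quants"/*/aux_info/unmapped_names.txt |
    xargs -0 -n 1 -P "$jobs" sh -c '
      quants=$1
      src=$2
      [ -f "$src" ] || exit 0                   # glob matched nothing
      name=${src%/aux_info/unmapped_names.txt}  # path of the sample directory
      name=${name##*/}                          # sample directory name
      name=${name%_quant}                       # strip an optional _quant suffix
      mkdir -p "$quants/unmapped/$name" &&
        cp "$src" "$quants/unmapped/$name/"
    ' sh "$quants"
}
```

A call such as copy_unmapped_parallel /quants 8 would run up to eight copies at a time.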

Copying directories with specific files
