My file names look like this:
fastqs/hgmm_100_S1_L001_R1_001.fastq.gz
fastqs/hgmm_100_S1_L002_R1_001.fastq.gz
fastqs/hgmm_100_S1_L003_R1_001.fastq.gz
fastqs/hgmm_100_S1_L001_R2_001.fastq.gz
fastqs/hgmm_100_S1_L002_R2_001.fastq.gz
fastqs/hgmm_100_S1_L003_R2_001.fastq.gz
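I want to merge them into groups like the ones shown above, so that the files that differ only in the LXXX part are combined.
I can do it like this: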
cat fastqs/hgmm_100_S1_L00?_R1_001.fastq.gz > data/hgmm_100_S1_R1_001.fastq.gz
cat fastqs/hgmm_100_S1_L00?_R2_001.fastq.gz > data/hgmm_100_S1_R2_001.fastq.gz
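However, this requires me to hard-code every file group. How can I set this up so that all L values are merged into one group, and the output file has the same name as the input files, minus the L part?
Thanks, Jack
Edit:
Sorry for not including this in the original post, but what if I have something like this?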
fastqs/hgmm_100_S1_L001_R1_001.fastq.gz
fastqs/hgmm_100_S1_L002_R1_001.fastq.gz
fastqs/hgmm_100_S1_L003_R1_001.fastq.gz
fastqs/hgmm_200_S1_L001_R2_001.fastq.gz
fastqs/hgmm_200_S1_L002_R2_001.fastq.gz
fastqs/hgmm_200_S1_L003_R2_001.fastq.gz
(The only change is at the start: 100 -> 200.)
How would this work? Essentially, I want to merge files whenever every part of the name except the L part is the same.
Answer 0 (score: 2)
If the pattern _L###_ only occurs in that one part of the file name, you could try something like this:
#!/usr/bin/env bash
# Define an associative array. Requires bash 4+.
declare -A a
# Use extended glob notation, needed for +([0-9]). See "Pattern Matching" in the bash man page.
shopt -s extglob
# Collect the file patterns by writing indexes in the array.
for f in fastqs/*_L+([0-9])_*.fastq.gz; do
    a["${f/_L+([0-9])_/_*_}"]=1
done
# And finally, gather your files.
for f in "${!a[@]}"; do
    # Strip any existing directory part of the filename to build our target
    target="data/${f##*/}"
    # Concatenate files matching the glob into our intended target.
    # $f is deliberately unquoted so that the * in it expands.
    cat $f > "${target/[*]_/}"
done
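To see what the first loop stores, here is a minimal sketch that applies the same substitution to a few names from the question without reading or writing any files. The array name groups and the hard-coded sample list are illustrative assumptions, not part of the answer's script.

```shell
#!/usr/bin/env bash
# Requires bash 4+ (associative arrays) and extglob for +([0-9]).
shopt -s extglob

# Sample names copied from the question; nothing is read from disk.
names=(
  fastqs/hgmm_100_S1_L001_R1_001.fastq.gz
  fastqs/hgmm_100_S1_L002_R1_001.fastq.gz
  fastqs/hgmm_200_S1_L001_R2_001.fastq.gz
)

declare -A groups
for f in "${names[@]}"; do
  # Replace the lane field with a literal "*" so the key doubles as a glob.
  groups["${f/_L+([0-9])_/_*_}"]=1
done

# One key per group, however many lanes each group has.
printf '%s\n' "${!groups[@]}" | sort
```

The two hgmm_100 names collapse into a single key, fastqs/hgmm_100_S1_*_R1_001.fastq.gz, and the hgmm_200 name gets its own, so the second loop runs cat once per group.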
${!a[@]} lets us step through the indexes of the array rather than its values.
Answer 1 (score: 0)
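You can do the grouping on the fly. Loop over all the files and append each one to its group file. * and ? expand in sorted order, so the lanes should end up in the right order.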
cd fastqs
for f in *_L???_*fastq.gz; do
    cat "$f" >> "../data/${f/_L???_/_}"
done
cd ..
Because the files are always appended to, you should empty the data/ directory before running this.
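If emptying data/ by hand is inconvenient, one possible variation is to truncate each target the first time it is seen in a run. This is only a sketch: the function name merge_lanes and its two arguments are hypothetical, and it assumes bash 4+ for the associative array.

```shell
#!/usr/bin/env bash
# merge_lanes SRC DEST  (hypothetical helper, not part of the answer above)
# Appends every SRC/*_L???_*fastq.gz to DEST/<name without the _L???_ field>.
merge_lanes() {
  local src=$1 dest=$2 f name target
  local -A seen=()
  mkdir -p "$dest"
  for f in "$src"/*_L???_*fastq.gz; do
    [[ -e $f ]] || continue          # the glob matched nothing
    name=${f##*/}                    # drop the directory part
    target="$dest/${name/_L???_/_}"  # drop the lane field
    if [[ -z ${seen[$target]:-} ]]; then
      : > "$target"                  # first hit in this run: start empty
      seen[$target]=1
    fi
    cat "$f" >> "$target"
  done
}
```

With the layout from the question this would be called as merge_lanes fastqs data, and running it twice should give the same result as running it once.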