Bash: delete a file in a directory if a search string is found in another file

Date: 2016-05-06 18:48:34

Tags: bash

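The bash below is almost complete. The only part I am struggling with is this: if the string "The bam file is corrupted and has been removed, please check log for reason." is found in process.log, the corresponding .bam ($f in the bash) should be removed. I added: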

echo "The bam file is corrupted and has been removed, please check log for reason."
             [[ -f "$f" ]] && rm -f "$f"

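in an attempt to do this, but it looks like it removes the wrong .bam. NA19240.bam, the file that process.log flags with the search string, is not removed. Instead the last .bam in process.log (NS12911) is removed, even though the search string does not appear for it. I cannot figure this out and need some expert help. I apologize for the lengthy post; I just wanted to include all the details. Thank you :).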

bash

logfile=/home/cmccabe/Desktop/NGS/API/5-4-2016/process.log
for f in /home/cmccabe/Desktop/NGS/API/5-4-2016/*.bam ; do
 echo "Start bam validation creation: $(date) - File: $f"
 bname=`basename $f`
 pref=${bname%%.bam}
 bam validate --in $f --verbose 2> /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/${pref}_validation.txt
 echo "End bam validation creation: $(date) - File: $f"
done >> "$logfile"
for file in /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/*.txt ; do
 echo "Start verifying $(date) - File: $file"
 bname=`basename $file`
 if $(grep -iq "(SUCCESS)" "${file}"); then
    echo "The verification of the bam file has completed successfully."
else
    echo "The bam file is corrupted and has been removed, please check log for reason."
             [[ -f "$f" ]] && rm -f "$f"
    echo "End of bam file verification: $(date) - File: ${file}"
fi
done >> "$logfile"

process.log

 Start bam validation creation: Fri May  6 13:20:48 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
 End bam validation creation: Fri May  6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
 Start bam validation creation: Fri May  6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA19240.bam
 End bam validation creation: Fri May  6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA19240.bam
 Start bam validation creation: Fri May  6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NS12911.bam
 End bam validation creation: Fri May  6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NS12911.bam
 Start verifying Fri May  6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
 The verification of the bam file has completed successfully.
 End of bam file verification: Fri May  6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
 Start verifying Fri May  6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA19240_validation.txt
 The bam file is corrupted and has been removed, please check log for reason.
 End of bam file verification: Fri May  6 13:28:05 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA19240_validation.txt
 Start verifying Fri May  6 13:28:05 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NS12911_validation.txt
 The verification of the bam file has completed successfully.
 End of bam file verification: Fri May  6 13:28:05 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NS12911_validation.txt

1 answer:

Answer 0 (score: 1)

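It is a bit difficult for me to reproduce your environment exactly, so I had to make some assumptions about your setup and your constraints. I see several ways to simplify the process or make it more efficient, but rather than introduce many unnecessary changes, I focused mainly on getting the script to work.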

That said, I did rearrange the processing so that each ${pref}_validation.txt is verified immediately after it is created.
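The likely reason the original script removed the wrong file is that $f is only assigned in the first loop, so by the time the second loop runs it still names the last .bam. A minimal sketch of that effect, using stand-in file names and no real files (the variable `stale` is introduced here only for illustration):

```shell
#!/bin/bash
# Stand-in names only; nothing is created or deleted.
for f in NA12878.bam NA19240.bam NS12911.bam; do
    :   # first loop: the validation step would run here
done

# A for loop does not reset its variable when it ends,
# so $f still holds the last item.
stale="$f"

for file in NA12878_validation.txt NA19240_validation.txt NS12911_validation.txt; do
    # Any use of "$f" in this loop refers to NS12911.bam on every iteration.
    printf 'checking %s, but $f is still %s\n' "$file" "$f"
done
```

This matches the behaviour in the question: the failure was detected for NA19240, but rm -f "$f" acted on NS12911.bam.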

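Could you try the following and let me know what the result is? (Note: the script has been updated. The first time I went too fast and copied the wrong version.)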

#!/bin/bash

logfile="/home/cmccabe/Desktop/NGS/API/5-4-2016/process.log"

for f in /home/cmccabe/Desktop/NGS/API/5-4-2016/*.bam ; do
    echo "Start bam validation creation: $(date) - File: $f"
    bname="$(basename "$f")"
    pref="${bname%%.bam}"
    bam validate --in "$f" --verbose 2> "/home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/${pref}_validation.txt"
    echo "End bam validation creation: $(date) - File: $f"

    file="/home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/${pref}_validation.txt"

    echo "Start verifying $(date) - File: $file"

    if grep -iq "(SUCCESS)" "${file}"; then
        echo "The verification of the bam file has completed successfully."
    else
        if [[ -f "$f" ]]; then
            rm -f "$f"
            echo "The bam file is corrupted and has been removed, please check log for reason."
        fi
    fi

    echo "End of bam file verification: $(date) - File: ${file}"

done >> "$logfile"

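Hopefully combining the two steps into one for loop does not deviate from any of your process requirements. What I find useful is that it allows a more streamlined code flow, and the log file should now look like this: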

Start bam validation creation: Fri May  6 13:20:48 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
End bam validation creation: Fri May  6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
Start verifying Fri May  6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
The verification of the bam file has completed successfully.
End of bam file verification: Fri May  6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
Start bam validation creation: Fri May  6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA19240.bam
...
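If the two loops have to stay separate, one option is to derive the .bam path from each validation file name instead of reusing $f. The sketch below is an assumption-laden illustration, not the original workflow: it works in a throwaway directory, the report contents are placeholders rather than real bam validate output, and `workdir` and `bam_file` are hypothetical names.

```shell
#!/bin/bash
# Throwaway directory so no real data is touched; removed when the script exits.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/bam_validation"
touch "$workdir/NA12878.bam" "$workdir/NA19240.bam"

# Placeholder report contents (not real bam validate output).
echo "(SUCCESS)"        > "$workdir/bam_validation/NA12878_validation.txt"
echo "placeholder error" > "$workdir/bam_validation/NA19240_validation.txt"

for file in "$workdir"/bam_validation/*_validation.txt; do
    bname="$(basename "$file")"
    pref="${bname%_validation.txt}"    # NA19240_validation.txt -> NA19240
    bam_file="$workdir/${pref}.bam"    # the .bam this report belongs to

    if grep -iq "(SUCCESS)" "$file"; then
        echo "kept: $bam_file"
    elif [[ -f "$bam_file" ]]; then
        rm -f "$bam_file"
        echo "removed: $bam_file"
    fi
done
```

The suffix removal `${bname%_validation.txt}` mirrors the `${bname%%.bam}` step used when the reports are created, so each report maps back to exactly one .bam.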

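Highly modified version
I attempted a highly simplified and more resilient version of the script. I would be interested if you could check this one: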

#!/bin/bash

# basepath allows you to quickly move the script by updating this path
basepath="/home/cmccabe/Desktop/NGS/API/5-4-2016"

# give the logfile a name
logfile="${basepath}/process.log"

# for each .bam file in basepath do
for f in "${basepath}"/*.bam ; do

    # validate the file with the bam command
    # capture the stdout, stderr and return code via some crazy bash fu
    eval "$({ cmd_err=$({ cmd_out=$( \
        bam validate --in "$f" --verbose \
      ); cmd_rtn=$?; } 2>&1; declare -p cmd_out cmd_rtn >&2); declare -p cmd_err; } 2>&1)"

    # check the return code for positive completion
    if [ "${cmd_rtn}" -eq "0" ]; then
        printf -- "%s - bam validation completed for: %s\n" "$(date)" "${f}"

        # check for string "(SUCCESS)" in the bam command output; the earlier
        # script read it from stderr, so both captured streams are searched
        if grep -iq "(SUCCESS)" <<< "${cmd_out}"$'\n'"${cmd_err}"; then
            printf -- "%s - Verification of the bam file has completed successfully.\n" "$(date)"
        else
            # verify the bam file exists and can be deleted
            if [[ -f "$f" ]] && rm -f "$f" ; then
                printf -- "%s - The bam file is corrupted and has been removed, please check log for reason.\n" "$(date)"
            else
                printf -- "%s - WARNING: The bam file is corrupted but the file could not be deleted.\n" "$(date)"
            fi
        fi
    else
        # The bam validate command above did not complete with a
        # satisfactory result. This should not really ever happen unless
        # the bam command does not exist or some serious error occurred
        # when executing the bam command.
        # Consider additional actions in addition to logging the outcome
        printf -- "%s - WARNING: bam validation failed for file: %s - [%s]\n" "$(date)" "${f}" "${cmd_err}"
    fi

done >> "$logfile"
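The eval line in that script is dense, so here is a minimal sketch of the same capture pattern on its own. The function `noisy` is a hypothetical stand-in for bam validate; it writes to both streams and returns a non-zero status so that all three captured values can be seen.

```shell
#!/bin/bash
# Hypothetical stand-in for "bam validate": writes to stdout and stderr, returns 3.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
    return 3
}

# Inner $(...)  : captures stdout of the command into cmd_out.
# Middle $(...) : 2>&1 sends the command's stderr into cmd_err.
# declare -p    : prints the variables as re-evaluable assignments,
#                 which the outer $(...) collects for eval.
eval "$({ cmd_err=$({ cmd_out=$(noisy); cmd_rtn=$?; } 2>&1; declare -p cmd_out cmd_rtn >&2); declare -p cmd_err; } 2>&1)"

printf 'stdout=%s stderr=%s rc=%s\n' "$cmd_out" "$cmd_err" "$cmd_rtn"
# prints: stdout=to stdout stderr=to stderr rc=3
```

One caveat with this pattern: when the eval runs inside a function, declare creates local variables, so the captured values would not be visible outside that function.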