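The bash below is almost complete. The only part I am struggling with is this: if the string "The bam file is corrupted and has been removed, please check log for reason." is found in process.log, then the corresponding .bam in $f should be removed. I added:
echo "The bam file is corrupted and has been removed, please check log for reason."
[[ -f "$f" ]] && rm -f "$f"
to try to do this, but it seems to remove the last .bam in process.log (NS12911), even though the search string is not present for it, instead of the .bam that does have the search string (NA19240.bam). I can't figure this out and need some expert help. Sorry for the long post, I just wanted to include all the details. Thanks :).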
bash
logfile=/home/cmccabe/Desktop/NGS/API/5-4-2016/process.log
for f in /home/cmccabe/Desktop/NGS/API/5-4-2016/*.bam ; do
echo "Start bam validation creation: $(date) - File: $f"
bname=`basename $f`
pref=${bname%%.bam}
bam validate --in $f --verbose 2> /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/${pref}_validation.txt
echo "End bam validation creation: $(date) - File: $f"
done >> "$logfile"
for file in /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/*.txt ; do
echo "Start verifying $(date) - File: $file"
bname=`basename $file`
if $(grep -iq "(SUCCESS)" "${file}"); then
echo "The verification of the bam file has completed successfully."
else
echo "The bam file is corrupted and has been removed, please check log for reason."
[[ -f "$f" ]] && rm -f "$f"
echo "End of bam file verification: $(date) - File: ${file}"
fi
done >> "$logfile"
process.log
Start bam validation creation: Fri May 6 13:20:48 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
End bam validation creation: Fri May 6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
Start bam validation creation: Fri May 6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA19240.bam
End bam validation creation: Fri May 6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA19240.bam
Start bam validation creation: Fri May 6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NS12911.bam
End bam validation creation: Fri May 6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NS12911.bam
Start verifying Fri May 6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
The verification of the bam file has completed successfully.
End of bam file verification: Fri May 6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
Start verifying Fri May 6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA19240_validation.txt
The bam file is corrupted and has been removed, please check log for reason.
End of bam file verification: Fri May 6 13:28:05 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA19240_validation.txt
Start verifying Fri May 6 13:28:05 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NS12911_validation.txt
The verification of the bam file has completed successfully.
End of bam file verification: Fri May 6 13:28:05 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NS12911_validation.txt
Answer
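It's a bit hard for me to replicate your environment exactly, so I had to make some assumptions about your setup and your constraints. I see many ways to simplify the process or make it more efficient, but rather than introduce lots of unnecessary changes, I mainly focused on making the script work.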
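That said, I did rearrange the processing so that each ${pref}_validation.txt is verified immediately after it is created. This also removes the root cause of the wrong deletion: in your second loop $f is never reassigned, so it still holds the last value from the first loop (NS12911.bam), and that is the file rm deletes whenever any validation file lacks "(SUCCESS)". In a single loop, $f always refers to the .bam the current validation file was created from.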
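A minimal sketch of that root cause, using placeholder file names instead of the real paths and skipping the actual validation: after a bash for loop finishes, its loop variable keeps the last value it was assigned, so a later loop that reads $f only ever sees that last file.

```bash
#!/bin/bash
# Placeholder names only; nothing is validated or deleted here.
for f in NA12878.bam NA19240.bam NS12911.bam; do
    :   # first loop: validation step omitted
done

# Second loop: $f is never reassigned, so every iteration sees the same value.
for file in NA12878_validation.txt NA19240_validation.txt NS12911_validation.txt; do
    echo "checking ${file} -> \$f is ${f}"
done
```

Each of the three lines printed by the second loop ends with NS12911.bam, which is why the original script removed that file.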
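Could you try the following (note: I updated the script; the first time I went too fast and copied the wrong version) and let me know what the result is: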
#!/bin/bash
logfile="/home/cmccabe/Desktop/NGS/API/5-4-2016/process.log"
for f in /home/cmccabe/Desktop/NGS/API/5-4-2016/*.bam ; do
echo "Start bam validation creation: $(date) - File: $f"
bname="$(basename "$f")"
pref="${bname%%.bam}"
bam validate --in "$f" --verbose 2> "/home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/${pref}_validation.txt"
echo "End bam validation creation: $(date) - File: $f"
file="/home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/${pref}_validation.txt"
echo "Start verifying $(date) - File: $file"
if grep -iq "(SUCCESS)" "${file}"; then
echo "The verification of the bam file has completed successfully."
else
if [[ -f "$f" ]]; then
rm -f "$f"
echo "The bam file is corrupted and has been removed, please check log for reason."
fi
fi
echo "End of bam file verification: $(date) - File: ${file}"
done >> "$logfile"
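One other change in the script above: "if $(grep -iq ...)" became "if grep -iq ...". An if statement tests a command's exit status, and grep -q prints nothing and exits 0 on a match and 1 otherwise, so no command substitution is needed. The $(...) form only works by accident: when the substitution expands to nothing, bash uses the substitution's exit status, and without -q grep's output would be executed as a command. A small sketch with a made-up report file (its contents are not real bam output); -F is an optional addition that matches the pattern as a fixed string, and since parentheses are already literal in grep's default basic regex the result is the same.

```bash
#!/bin/bash
# Made-up report text for illustration only.
report="$(mktemp)"
printf 'Validation result: (SUCCESS)\n' > "$report"

# grep's exit status drives the if directly; -q suppresses output,
# -i ignores case, -F treats "(SUCCESS)" as a fixed string.
if grep -iqF "(SUCCESS)" "$report"; then
    status="ok"
else
    status="corrupt"
fi
echo "$status"

rm -f "$report"
```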
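Hopefully combining the two steps into a single for loop doesn't conflict with any of your process requirements. What I find useful is that it streamlines the code flow, and the log file should now look like this: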
Start bam validation creation: Fri May 6 13:20:48 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
End bam validation creation: Fri May 6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA12878.bam
Start verifying Fri May 6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
The verification of the bam file has completed successfully.
End of bam file verification: Fri May 6 13:28:03 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/bam_validation/NA12878_validation.txt
Start bam validation creation: Fri May 6 13:24:15 CDT 2016 - File: /home/cmccabe/Desktop/NGS/API/5-4-2016/NA19240.bam
...
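If the two separate loops have to stay, another option is to derive the matching .bam from each validation file's name instead of relying on $f. This is a sketch under assumptions: it relies on the naming scheme used above (<sample>.bam produces bam_validation/<sample>_validation.txt), and the function name verify_all as well as the scratch-directory demo with made-up file contents are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: verify every validation report under $1 and remove
# the .bam whose report lacks "(SUCCESS)". The .bam path is rebuilt from the
# report's own file name, so no variable leaks in from an earlier loop.
verify_all() {
    local basepath="$1" file bname bam
    for file in "${basepath}"/bam_validation/*_validation.txt; do
        [[ -e "$file" ]] || continue          # glob matched nothing
        bname="$(basename "$file")"
        bam="${basepath}/${bname%_validation.txt}.bam"
        echo "Start verifying $(date) - File: $file"
        if grep -iqF "(SUCCESS)" "$file"; then
            echo "The verification of the bam file has completed successfully."
        elif [[ -f "$bam" ]] && rm -f "$bam"; then
            echo "The bam file ${bam} is corrupted and has been removed, please check log for reason."
        fi
        echo "End of bam file verification: $(date) - File: ${file}"
    done
}

# Demo in a scratch directory; file contents are made up, not real bam output.
demo="$(mktemp -d)"
mkdir -p "${demo}/bam_validation"
touch "${demo}/NA12878.bam" "${demo}/NA19240.bam" "${demo}/NS12911.bam"
echo "(SUCCESS)" > "${demo}/bam_validation/NA12878_validation.txt"
echo "(FAIL)"    > "${demo}/bam_validation/NA19240_validation.txt"
echo "(SUCCESS)" > "${demo}/bam_validation/NS12911_validation.txt"
verify_all "$demo" >> "${demo}/process.log"
```

After the demo only NA19240.bam is gone. On the real data it would presumably be called as verify_all /home/cmccabe/Desktop/NGS/API/5-4-2016 >> "$logfile", after the validation loop has finished.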
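Heavily modified version
I also tried a much simpler and more resilient version of the script. I'd be interested to know if you could check this one: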
#!/bin/bash
# basepath allows you to quickly move the script by updating this path
basepath="/home/cmccabe/Desktop/NGS/API/5-4-2016"
# give the logfile a name
logfile="${basepath}/process.log"
# for each .bam file in basepath do
for f in "${basepath}"/*.bam ; do
# validate the file with the bam command
# capture the stdout, stderr and return code via some crazy bash fu
eval "$({ cmd_err=$({ cmd_out=$( \
bam validate --in "$f" --verbose \
); cmd_rtn=$?; } 2>&1; declare -p cmd_out cmd_rtn >&2); declare -p cmd_err; } 2>&1)"
# check the return code for positive completion
if [ "${cmd_rtn}" -eq "0" ]; then
printf -- "%s - bam validation completed for: %s\n" "$(date)" "${f}"
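# check for string "(SUCCESS)" in the bam command output; the first script
# captured the report from stderr (2>), so search both stdout and stderr
if grep -iq "(SUCCESS)" <<< "${cmd_out}"$'\n'"${cmd_err}"; then
printf -- "%s - Verification of the bam file has completed successfully.\n" "$(date)"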
else
# verify the bam file exists and can be deleted
if [[ -f "$f" ]] && rm -f "$f" ; then
printf -- "%s - The bam file is corrupted and has been removed, please check log for reason.\n" "$(date)"
else
printf -- "%s - WARNING: The bam file is corrupted but the file could not be deleted.\n" "$(date)"
fi
fi
else
# The bam validate command above did not complete with a
# satisfactory result. This should not really ever happen unless
# the bam command does not exist or some serious error occurred
# when executing the bam command.
# Consider taking further action here besides logging the outcome
printf -- "%s - WARNING: bam validation failed for file: %s - [%s]\n" "$(date)" "${f}" "${cmd_err}"
fi
done >> "$logfile"
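To see what the eval line in the script does, here is a sketch with a hypothetical stand-in, fake_validate, in place of bam validate: it writes one line to stdout, one line to stderr and returns 3. After the eval, cmd_out, cmd_err and cmd_rtn hold the stdout text, the stderr text and the exit status. It also shows that anything a command writes to stderr ends up in cmd_err rather than cmd_out; since the first script captured the validation report with 2>, the report may well be in cmd_err.

```bash
#!/bin/bash
# Hypothetical stand-in for "bam validate": one line per stream, exit status 3.
fake_validate() {
    echo "report on stdout"
    echo "(SUCCESS) on stderr" >&2
    return 3
}

# Same idiom as in the script: the innermost $(...) captures stdout, the
# middle one captures stderr, and "declare -p" prints assignments that eval
# replays in the current shell.
eval "$({ cmd_err=$({ cmd_out=$(fake_validate); cmd_rtn=$?; } 2>&1; declare -p cmd_out cmd_rtn >&2); declare -p cmd_err; } 2>&1)"

printf 'out=[%s] err=[%s] rtn=[%s]\n' "$cmd_out" "$cmd_err" "$cmd_rtn"
# prints: out=[report on stdout] err=[(SUCCESS) on stderr] rtn=[3]
```

Because the replayed declare statements run at the top level, the three variables become ordinary globals; inside a function, declare would make them local to that function instead.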