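I have to find out whether the jobs in production have backup jobs. The job area is indicated by a suffix: PS for production and PP for backup. In addition, I need to make sure that not only are the names the same (apart from the last two characters), but also that the scripts they reference are the same.
I used a double loop. I echo the contents and all the data lines, capture the grep results, and echo them into the while loop. The script data is fine until I reach the if statements, where I derive the script names and then compare them with each other. When I run this I can see which jobs don't line up, but I need those if statements to work for me. There are more than 24,000 jobs in Autosys, and the mismatch between production and backup is small, but even a small percentage is a sizeable number. It is too much to check by hand in a spreadsheet.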
#!/bin/bash
IFS=,
file="/tmp/casper_test.txt"
while read -r area job machine script
do
prod_line=$(grep ${job%??} $file)
echo "$prod_line" | while IFS=, read -r area job machine script
do
if [ "$area" == "PROD" ] ; then
prod_script="$script"
elif [ "$area" == "BACKUP" ] ; then
backup_script="$script"
elif [ "$prod_script" == "$backup_script" ] ; then
echo "MATCH,$area,$job,$machine,$script "
else
echo "NO MATCH,$area,$job, $machine, $script "
fi
done
done < $file
Input file /tmp/casper_test.txt:
BACKUP, CAPSER_JOB_01_PP, usa-penguin.com, /bin/bash -lc '/usr/bin/run.sh'
PROD, CAPSER_JOB_01_PS, usa-penguin.com, /bin/bash -lc '/usr/bin/run.sh'
BACKUP, CAPSER_JOB_02_PP, usa-penguin.com, /bin/bash -lc '$HOME/run/script02'
PROD, CAPSER_JOB_02_PS, usa-penguin.com, /bin/bash -lc '$HOME/run/comeAndPlay'
BACKUP, CAPSER_03_PP, usa-penguin.com, /bin/bash -lc '$HOME/run/script03'
PROD, CAPSER_JOB_03_PS, usa-penguin.com, /bin/bash -lc '$HOME/run/script03'
BACKUP, CAPSER_JOB_04_PP, usa-penguin.com, /bin/bash -lc '$HOME/run/script04'
PROD, CAPSER_JOB_04_PS, usa-penguin.com, /bin/bash -lc '$HOME/run/withUsDanny'
PROD, CAPSER_JOB_05_PS, usa-penguin.com, /bin/bash -lc '$HOME/run/script05'
PROD, CAPSER_JOB_06_PS, usa-penguin.com, /bin/bash -lc '$HOME/run/script06'
BACKUP, CAPSER_JOB_07_PP, usa-penguin.com, /bin/bash -lc '$HOME/run/script07'
PROD, CAPSER_JOB_07_PS, usa-penguin.com, /bin/bash -lc '$HOME/run/script07'
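Answer 0: (score: 2)
Since what you really need is a list of the production job names that have no matching backup job, here is an awk script that lists them: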
awk -F ', *' '{gsub("_..$", "", $2)} /BACKUP/{b[$2] = $NF} /PROD/{p[$2] = $NF} END {for (i in p) if (p[i] != b[i]) print i}'
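-F ', *'
- split fields on a comma followed by any number of spaces
{gsub("_..$", "", $2)}
- strip the suffix from the job name, which is the second field
/BACKUP/{b[$2] = $NF} /PROD/{p[$2] = $NF}
- store the backup scripts in one array and the prod scripts in another
END {for (i in p) if (p[i] != b[i]) print i}
- after all lines have been read, loop over the prod scripts and print those that have no matching script among the backups
Sample output: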
CAPSER_JOB_02
CAPSER_JOB_03
CAPSER_JOB_04
CAPSER_JOB_05
CAPSER_JOB_06
None of the jobs with these IDs have a match; the rest do.
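The one-liner above reports a missing backup job and a backup job with a different script in the same way. If the distinction matters, a minimal sketch along the following lines could separate the two cases. It assumes the same four-field, comma-separated layout as the sample input; the function name report_unmatched and the labels MISSING and DIFFERENT are made up for illustration.

```shell
#!/bin/bash
# Sketch, assuming the comma-separated layout of the sample input.
# report_unmatched, MISSING and DIFFERENT are illustrative names only.
report_unmatched() {
    awk -F ', *' '
        { key = $2; sub(/_..$/, "", key) }      # job name without its two-letter suffix
        $1 == "BACKUP" { b[key] = $NF }         # backup script, indexed by job
        $1 == "PROD"   { p[key] = $NF }         # prod script, indexed by job
        END {
            for (k in p) {
                if (!(k in b))         print "MISSING," k
                else if (p[k] != b[k]) print "DIFFERENT," k
            }
        }
    ' "$1" | sort                               # awk array order is unspecified, so sort
}
```

It could be called as report_unmatched /tmp/casper_test.txt. Like the original one-liner, it takes the last comma-separated field as the script, so a script containing a comma would need different handling.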
As for the shell script, look at what happens in the inner while loop:
echo "$prod_line" | while IFS=, read -r area job machine script
do
if [ "$area" == "PROD" ] ; then
prod_script="$script"
elif [ "$area" == "BACKUP" ] ; then
backup_script="$script"
elif [ "$prod_script" == "$backup_script" ] ; then
echo "MATCH,$area,$job,$machine,$script "
else
echo "NO MATCH,$area,$job, $machine, $script "
fi
done
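The grep output will contain no more than two lines, each of which has either BACKUP or PROD as its area. Every line is therefore handled by one of the first two branches, so your third elif and the else are never reached. Those should be moved outside the inner loop, so that the test happens once you have read both lines. Since some backup jobs are missing, you will probably want to clear the values before reading, so that stale values from a previous job are not reused.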
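One possible restructuring along those lines is sketched below. It clears the backup value before each lookup and runs the comparison only after the inner loop has finished. The inner loop reads from a process substitution rather than a pipe: in Bash, a loop at the end of a pipeline runs in a subshell, so variables assigned inside it would be gone by the time the comparison runs. The function name compare_jobs and the output format are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: compare_jobs is a hypothetical name; the input layout is the one from the question.
compare_jobs() {
    local file=$1 area job machine script a j m s base prod_script backup_script
    while IFS=, read -r area job machine script; do
        [[ ${area// /} == PROD ]] || continue     # drive the outer loop from PROD lines only
        job=${job// /}                            # drop the space that follows the comma
        base=${job%??}                            # job name without the PS/PP suffix
        prod_script=$script
        backup_script=                            # clear any value left over from the previous job
        while IFS=, read -r a j m s; do
            if [[ ${a// /} == BACKUP && ${j// /} == "${base}PP" ]]; then
                backup_script=$s
            fi
        done < <(grep -F -- "$base" "$file")      # process substitution keeps the loop in this shell
        # The comparison runs once, after every candidate line has been read.
        if [[ -n $backup_script && $prod_script == "$backup_script" ]]; then
            echo "MATCH,$job"
        else
            echo "NO MATCH,$job"
        fi
    done < "$file"
}
```

This keeps the structure of the original script, including one grep per production job, so it will still be slow on a file with 24,000 lines.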
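Answer 1: (score: 2)
You can do this in pure Bash using hashes (associative arrays) and a single read of the input file. With 24K lines in the input file, this approach is much more efficient than a solution that reads the file n + 1 times, which for a 24K-line file means 24,001 times! I have also added some basic error handling.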
#!/bin/bash
line=0
declare -A prod_jobs_job prod_jobs_scripts prod_jobs_machines backup_jobs_scripts
while IFS=, read -r area job machine script; do
((line++))
j="${job%??}"
if [[ $area == "PROD" ]]; then
prod_jobs_job[$j]="$job" # this hash holds the original job name
prod_jobs_scripts[$j]="$script" # holds the prod script
prod_jobs_machines[$j]="$machine" # holds the prod machine, used for printing only
elif [[ $area == "BACKUP" ]]; then
backup_jobs_scripts[$j]="$script" # holds the backup script, used for comparison
else
printf '%s\n' "Unknown area '$area' at line number $line" >&2
fi
done < <(sed 's/, */,/g' /tmp/casper_test.txt) # make sure to strip out the spaces after commas
# traverse the prod jobs hash and compare with backup
# if there is no match in backup hash, treat it as an error
for j in "${!prod_jobs_scripts[@]}"; do
prod_script="${prod_jobs_scripts[$j]}"
job="${prod_jobs_job[$j]}"
backup_script="${backup_jobs_scripts[$j]}"
[[ ! $backup_script ]] && { printf '%s\n' "No backup job for '$job'" >&1; continue; }
prod_machine="${prod_jobs_machines[$j]}"
if [[ $prod_script == "$backup_script" ]]; then # quote the right-hand side so it is compared literally, not as a glob pattern
printf '%s\n' "MATCH:PROD,$job,$prod_machine,$prod_script"
else
printf '%s\n' "NO MATCH:PROD,$job,$prod_machine,$prod_script"
fi
done
For your input file, we get this output:
MATCH:PROD,CAPSER_JOB_07_PS,usa-penguin.com,/bin/bash -lc '$HOME/run/script07'
No backup job for 'CAPSER_JOB_06_PS'
MATCH:PROD,CAPSER_JOB_01_PS,usa-penguin.com,/bin/bash -lc '/usr/bin/run.sh'
NO MATCH:PROD,CAPSER_JOB_02_PS,usa-penguin.com,/bin/bash -lc '$HOME/run/comeAndPlay'
No backup job for 'CAPSER_JOB_03_PS'
NO MATCH:PROD,CAPSER_JOB_04_PS,usa-penguin.com,/bin/bash -lc '$HOME/run/withUsDanny'
No backup job for 'CAPSER_JOB_05_PS'
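Bash does not guarantee any particular order when iterating over the keys of an associative array, which is why the lines above do not follow the order of the input file. If stable output is wanted, one option is to sort the keys first. A minimal sketch, assuming Bash 4 or later and keys without embedded newlines; the array name demo_scripts and its contents are placeholders:

```shell
#!/bin/bash
# Sketch: visit the keys of an associative array in sorted order (Bash 4+).
# The array name and its contents are placeholders for illustration.
declare -A demo_scripts=(
    [CAPSER_JOB_07_]="script07"
    [CAPSER_JOB_01_]="run.sh"
    [CAPSER_JOB_02_]="comeAndPlay"
)

# Print the keys one per line, sort them, and read them back into an indexed array.
mapfile -t sorted_keys < <(printf '%s\n' "${!demo_scripts[@]}" | sort)

for key in "${sorted_keys[@]}"; do
    printf '%s %s\n' "$key" "${demo_scripts[$key]}"
done
```

In the script above, the same idea would apply to the keys of prod_jobs_scripts in the final for loop.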
Answer 2: (score: 0)
Try this alternative:
grep PROD /tmp/casper.txt > PROD.txt
grep BACKUP /tmp/casper.txt > BACKUP.txt
awk 'FNR==NR{a[$6];b[substr($2,1,13)];next}($6 in a && substr($2,1,13) in b){print}' BACKUP.txt PROD.txt
This produces the following, and it remains workable for a large number of lines in the input file:
PROD, CAPSER_JOB_01_PS, usa-penguin.com, /bin/bash -lc '/usr/bin/run.sh'
PROD, CAPSER_JOB_07_PS, usa-penguin.com, /bin/bash -lc '$HOME/run/script07'
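Note that the command above records scripts and job-name prefixes in two separate arrays, so it checks that each appears somewhere among the backups, not that both belong to the same backup job. It also relies on a fixed 13-character prefix and on whitespace-split field positions. A hedged sketch that keys the lookup on the job name and compares the script for that same job is shown below; the function name matched_prod_lines is hypothetical, and the field layout is assumed from the sample input.

```shell
#!/bin/bash
# Sketch: print PROD lines whose own backup job (same name, different suffix) runs the same script.
# matched_prod_lines is a hypothetical name; the layout is assumed from the sample input.
matched_prod_lines() {
    awk -F ', *' '
        { key = $2; sub(/_..$/, "", key) }                    # job name without its suffix
        $1 == "BACKUP" { backup[key] = $NF; next }            # backup script for this job
        $1 == "PROD"   { prod[key] = $0; script[key] = $NF }  # full prod line and its script
        END {
            for (k in prod)
                if ((k in backup) && backup[k] == script[k]) print prod[k]
        }
    ' "$1" | sort                                             # awk array order is unspecified
}
```

This reads the original file once and needs no intermediate PROD.txt or BACKUP.txt, for example matched_prod_lines /tmp/casper.txt.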
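For larger input files, the following code does not scale.
You made the while loop overly complicated, and it was also somewhat wrong because you used the same variable names in both loops. See whether the following works for you.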
#!/bin/bash
IFS=,
file="casper.txt"
while read -r area job machine script
do
if [ "$area" == "PROD" ] ; then
prod_script="$script"
jobname=${job%??}
IFS=,
while read -r area1 job1 machine1 script1
do
if [ "$area1" == "BACKUP" ]; then
jobname1=${job1%??}
if [ "$jobname" == "$jobname1" ]; then
if [ "$prod_script" == "$script1" ] ; then
echo "MATCH: $area,$job,$machine,$script"
break;
fi
fi
fi
done < "$file"
fi
done < "$file"
For your input file, this gives:
]# ./casper
MATCH: PROD, CAPSER_JOB_01_PS, usa-penguin.com, /bin/bash -lc '/usr/bin/run.sh'
MATCH: PROD, CAPSER_JOB_07_PS, usa-penguin.com, /bin/bash -lc '$HOME/run/script07'