使用下面的tab-delimited
file
我尝试验证标题行1,然后将该数字存储在变量$header
中,以便在几个if
语句中使用。如果$header
等于10,则为file has expected number of fields
,但如果$header
小于10 file is missing header for:
,则会在下方打印缺少的标题字段。 bash
似乎很接近,如果我单独使用awk
它似乎完美无缺,但我似乎无法在if
中使用它。谢谢你:)。
file.txt的
Index Chr Start End Ref Alt Freq Qual Score Input
1 1 1 100 C - 1 GOOD 10 .
2 2 20 200 A C .002 STRAND BIAS 2 .
3 2 270 400 - GG .036 GOOD 6 .
FILE2.TXT
Index Chr Start End Ref Alt Freq Qual Score
1 1 1 100 C - 1 GOOD 10
2 2 20 200 A C .002 STRAND BIAS 2
3 2 270 400 - GG .036 GOOD 6
的bash
for f in /home/cmccabe/Desktop/validate/*.txt; do
bname=`basename $f`
pref=${bname%%.txt}
header=$(awk -F'\t' '{print NF, "fields detected in file and they are:" ORS $0; exit}') $f >> ${pref}_output # detect header row in file and store in header and write to output
if [[ $header == "10" ]]; then # display results
echo "file has expected number of fields" # file is validated for headers
else
echo "file is missing header for:" # missing header field ...in file not-validated
echo "$header"
fi # close if.... else
done >> ${pref}_output
file.txt
file has expected number of fields
file1.txt
file is missing header for:
Input
答案 0 :(得分:2)
如果您愿意,可以使用awk
,但bash
能够自行处理第一行字段比较。如果维护一个预期字段名称数组,则可以轻松地将第一行拆分为字段,与预期的字段数进行比较,如果读取的字段数少于预期的任何给定字段数,则输出丢失字段的标识文件。
以下是将文件名作为参数的简短示例(您需要从stdin
获取大量文件的文件名,或根据需要使用xargs
。该脚本只读取每个文件中的第一行,将行分隔为字段,检查字段计数,并在短消息中输出任何缺少的字段:
#!/bin/bash
declare -i header=10 ## header has 10 fields
## aray of field names (can be read from 1st file)
fields=( "Index"
"Chr"
"Start"
"End"
"Ref"
"Alt"
"Freq"
"Qual"
"Score"
"Input" )
for i in "$@"; do ## for each file given as argument
read -r line < "$i" ## read first line from file into 'line'
oldIFS="$IFS" ## save current Internal Field Separator (IFS)
IFS=$'\t' ## set IFS to word-split on '\t'
fldarray=( $line ); ## fill 'fldarray' with fields in line
IFS="$oldIFS" ## restore original IFS
nfields=${#fldarray[@]} ## get number of fields in 'line'
if (( nfields < header )) ## test against header
then
printf "error: only '%d' fields in file '%s'\nmissing:" "$nfields" "$i"
for j in "${fields[@]}" ## for each expected field
do ## check against those in line, if not present print
[[ $line =~ $j ]] || printf " %s" "$j"
done
printf "\n\n" ## tidy up with newlines
fi
done
示例输入
$ cat dat/hdr.txt
Index Chr Start End Ref Alt Freq Qual Score Input
1 1 1 100 C - 1 GOOD 10 .
2 2 20 200 A C .002 STRAND BIAS 2 .
3 2 270 400 - GG .036 GOOD 6 .
$ cat dat/hdr2.txt
Index Chr Start End Ref Alt Freq Qual Score
1 1 1 100 C - 1 GOOD 10
2 2 20 200 A C .002 STRAND BIAS 2
3 2 270 400 - GG .036 GOOD 6
$ cat dat/hdr3.txt
Index Chr Start End Alt Freq Qual Score Input
1 1 1 100 - 1 GOOD 10 .
2 2 20 200 C .002 STRAND BIAS 2 .
3 2 270 400 GG .036 GOOD 6 .
示例使用/输出
$ bash hdrfields.sh dat/hdr.txt dat/hdr2.txt dat/hdr3.txt
error: only '9' fields in file 'dat/hdr2.txt'
missing: Input
error: only '9' fields in file 'dat/hdr3.txt'
missing: Ref
仔细研究一下,虽然awk
可以做很多事情bash
不能独立完成,但bash能够解析文本。
答案 1 :(得分:1)
这段代码将完全按照您的要求行事。请让我知道这对你有没有用。
for f in ./*.txt; do
[[ $( head -1 $f | awk '{ print NF}' ) -eq 10 ]] && echo "File $f has all the fields on its header" || echo "File $f is missing " $( echo "Index Chr Start End Ref Alt Freq Qual Score Input $( head -1 $f )" | tr ' ' '\n' | sort | uniq -c | awk '/1 / {print $2}' );
done
输出:
File ./file2.txt is missing Input
File ./file.txt has all the fields on its header
答案 2 :(得分:1)
这是GNU awk(nextfile
)中的一个:
$ awk '
FNR==NR {
for(n=1;n<=NF;n++)
a[$n]
nextfile
}
NF==(n-1) {
print FILENAME " file has expected number of fields"
nextfile
}
{
for(i=1;i<=NF;i++)
b[$i]
print FILENAME " is missing header for: "
for(i in a)
if(i in b==0)
print i
nextfile
}' file1 file1 file2
file1 file has expected number of fields
file2 is missing header for:
Input
脚本处理的第一个文件定义了以下文件应具有的标头(在a
中),并将它们(在b
中)与它进行比较。