Right, I have this code:
for line in npp_test_file.csv
awk -F, '
BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" }
NF!=17 { print "incorrect amount of fields"; exit }
!($1~/^("[A-Z0-9]{1,25}")$/) {print "1st field invalid";}
!($2~/("[[:digit:]]{1,3}")$/) {print "2nd field invalid";}
!($3~/^("[A-Z0-9]{1,8}")$/) {print "3rd field invalid";}
!($4~/^("[A-Z0-9]{0,1}")$/) {print "4th field invalid";}
!($5~/^("[A-Z0-9]{0,11}")$/) {print "5th field invalid";}
!($6~/^("")$/) {print "6th field invalid";}
!($7~/^("[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}")$/B) {print "7th field invalid";}
!($8~/^("[1-5]{1}")$/) {print "8th field invalid";}
!($9~/^("[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}")$/) {print "9th field invalid";}
!($10~/^("[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}")$/) {print "10th field invalid";}
!($11~/^("([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]")|""$/) {print "11th field invalid";}
!($12~/^("([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]")|""$/) {print "12th field invalid";}
!($13~/^("[A-Za-z0-9]{0,70}")|""$/) {print "13th field invalid";}
!($14~/^("[A-Za-z0-9]{1}")|""$/) {print "14th field invalid";}
!($15~/^("[0-9]{0,3}")$/) {print "15th field invalid";}
!($16~/^(".+")$/) {print "16th field invalid";}
!($17~/^(".+")|""$/) {print "17th field invalid";}
{print "you have 17 fields";
exit}' $line
done
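This code is meant to take the data held in npp_test_file.csv, split it into 17 fields, and assign each field to a variable so that each one can then be tested against a given set of conditions.
However, the file holds more than one set of 17 fields; a single file may contain 100 or so. My for line loop runs, but it is of no use here. I need a way to make the program loop over every line.
Example of the data in the csv file: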
"AAA0002","112","BA001000","","HG55USW","","2018-06-21","1","2018-06-21","2018-06-21","11:26:30","11:26:30","colchester","2","003","some form of string",""
"ABC0004","a009","BAV01000","A","HG43FHG","","2018-06-21","1","2018-06-21","2018-06-21","11:26:30","11:26:30","bridgend","1","112","a second form of string ",""
"aADF0005","s012","BA0Q1000","1","CV63LTG","","2018-06-21","1","2018-06-21","2018-06-21","11:26:30","11:26:30","london","1","112","another form of string","none"
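This should print "field 1 invalid and field 2 invalid" to the screen.
Answer 0 (score: 2)
A more generic version: you can put all the pattern checks into an array, which makes it easier to adapt to other field counts. I use the Fld variable as an auto-increment, but you can use direct indices if you prefer: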
awk -F ',' '
BEGIN{
FPAT = "([^,]+)|(\"[^\"]+\")"
Fld = 0
Pat[++Fld]="^(\"[A-Z0-9]{1,25}\")$"
Pat[++Fld]="(\"[[:digit:]]{1,3}\")$"
Pat[++Fld]="^(\"[A-Z0-9]{1,8}\")$"
Pat[++Fld]="^(\"[A-Z0-9]{0,1}\")$"
Pat[++Fld]="^(\"[A-Z0-9]{0,11}\")$"
Pat[++Fld]="^(\"\")$"
Pat[++Fld]="^(\"[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}\")$"
Pat[++Fld]="^(\"[1-5]{1}\")$"
Pat[++Fld]="^(\"[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}\")$"
Pat[++Fld]="^(\"[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}\")$"
Pat[++Fld]="^(\"([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]\")|\"\"$"
Pat[++Fld]="^(\"([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]\")|\"\"$"
Pat[++Fld]="^(\"[A-Za-z0-9]{0,70}\")|\"\"$"
Pat[++Fld]="^(\"[A-Za-z0-9]{1}\")|\"\"$"
Pat[++Fld]="^(\"[0-9]{0,3}\")$"
Pat[++Fld]="^(\".+\")$"
Pat[++Fld]="^(\".+\")|\"\"$"
}
NF != 17 {
printf( "Line %3d : incorrect amount of fields\n", NR )
next
}
{
for (Idx=1; Idx<=Fld; Idx++ ) {
if ( $Idx !~ Pat[Idx] ) {
printf( "Line %3d : %2dth field is invalid\n", NR, Idx )
}
}
}
{ printf( "Line %3d : you have 17 fields\n", NR ) }
' npp_test_file.csv
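For the "direct indices" variant mentioned above, here is a minimal sketch. It is not the full 17-field check: check_two_fields is a hypothetical helper, the two patterns are simplified (no {m,n} intervals), and plain -F, splitting is used on the assumption that no quoted field contains a comma.

```shell
# Hypothetical helper: validates only the first two fields, with simplified patterns.
check_two_fields() {
    awk -F, '
    BEGIN {
        Pat[1] = "^\"[A-Z0-9]+\"$"   # field 1: upper-case letters and digits
        Pat[2] = "^\"[0-9]+\"$"      # field 2: digits only
        Fld = 2                      # number of patterns defined above
    }
    {
        for (Idx = 1; Idx <= Fld; Idx++)
            if ($Idx !~ Pat[Idx])
                printf("Line %3d : field %d is invalid\n", NR, Idx)
    }' "$@"
}

printf '%s\n' '"AAA0002","112"' '"aADF0005","s012"' | check_two_fields
```

With direct indices, Fld has to be kept in step with the highest index by hand, which is the trade-off against the ++Fld form.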
Answer 1 (score: 1)
If you remove the exit from the last block (and, while you are at it, drop -F, which is not needed alongside FPAT):
...
{print "you have 17 fields";
exit}
becomes
...
{
print "you have 17 fields"
}
the output will be
you have 17 fields
2nd field invalid
you have 17 fields
1st field invalid
2nd field invalid
you have 17 fields
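Is this what you are looking for with "field 1 invalid and field 2 invalid"?
Answer 2 (score: 0)
Could you please try the following (not tested, since no sample was provided).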
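To see the effect of that exit in isolation, here is a cut-down sketch. It makes several assumptions: only two fields, simplified patterns, plain -F, splitting, and hypothetical helper names (sample, with_exit, without_exit).

```shell
# Three short records modelled on the first two columns of the sample data.
sample() {
    printf '%s\n' '"AAA0002","112"' '"ABC0004","a009"' '"aADF0005","s012"'
}

with_exit() {
    awk -F, '
    $1 !~ /^"[A-Z0-9]+"$/ { print "1st field invalid" }
    $2 !~ /^"[0-9]+"$/    { print "2nd field invalid" }
    # exit ends the whole run, so only the first record is ever checked
    { print "you have " NF " fields"; exit }'
}

without_exit() {
    awk -F, '
    $1 !~ /^"[A-Z0-9]+"$/ { print "1st field invalid" }
    $2 !~ /^"[0-9]+"$/    { print "2nd field invalid" }
    # no exit: awk moves on to the next record by itself
    { print "you have " NF " fields" }'
}

sample | with_exit      # prints a single "you have 2 fields" line
sample | without_exit   # reports on all three records
```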
awk '
BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" }
NF!=17 { print "incorrect amount of fields"; next}
!($1~/^("[A-Z0-9]{1,25}")$/) {print "1st field invalid";}
!($2~/("[[:digit:]]{1,3}")$/) {print "2nd field invalid";}
!($3~/^("[A-Z0-9]{1,8}")$/) {print "3rd field invalid";}
!($4~/^("[A-Z0-9]{0,1}")$/) {print "4th field invalid";}
!($5~/^("[A-Z0-9]{0,11}")$/) {print "5th field invalid";}
!($6~/^("")$/) {print "6th field invalid";}
!($7~/^("[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}")$/) {print "7th field invalid";}
!($8~/^("[1-5]{1}")$/) {print "8th field invalid";}
!($9~/^("[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}")$/) {print "9th field invalid";}
!($10~/^("[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}")$/) {print "10th field invalid";}
!($11~/^("([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]")|""$/) {print "11th field invalid";}
!($12~/^("([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]")|""$/) {print "12th field invalid";}
!($13~/^("[A-Za-z0-9]{0,70}")|""$/) {print "13th field invalid";}
!($14~/^("[A-Za-z0-9]{1}")|""$/) {print "14th field invalid";}
!($15~/^("[0-9]{0,3}")$/) {print "15th field invalid";}
!($16~/^(".+")$/) {print "16th field invalid";}
!($17~/^(".+")|""$/) {print "17th field invalid";}
{print "you have 17 fields"}' npp_test_file.csv
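The following corrections were made to the OP's attempt:
1- Removed the for loop, since awk can read the Input_file by itself, and it looks like the OP is reading only 1 Input_file.
2- Removed exit from the NF check condition (next is used instead); otherwise not all the conditions will be checked.
3- Removed -F from the code, since it is not needed when you provide FPAT, IMHO.
4- Removed exit from the second print command as well, the one that runs when 17 fields are found.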
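On point 1, a small sketch of why the shell loop did not help: for line in FILE iterates over the words given after in, here the file name itself, and not over the lines of the file. The helper names and the three-line temporary file are made up for illustration.

```shell
# Hypothetical helper: counts how often the body of "for line in FILE" runs.
loop_iterations() {
    n=0
    for line in "$1"; do
        n=$((n + 1))
    done
    echo "$n"
}

# Hypothetical helper: counts the records awk reads from the same file.
awk_records() {
    awk 'END { print NR }' "$1"
}

tmp=$(mktemp)
printf '%s\n' '"a","b"' '"c","d"' '"e","f"' > "$tmp"
loop_iterations "$tmp"   # 1: the loop body runs once, with line set to the file name
awk_records "$tmp"       # 3: awk visits every line by itself
rm -f "$tmp"
```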
EDIT: If you want to move on to the next line as soon as any field (even a single one) is invalid, you can add next to the conditions, for example:
awk '
BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" }
NF!=17 { print "incorrect amount of fields"; next}
!($1~/^("[A-Z0-9]{1,25}")$/) {print "1st field invalid"; next}
!($2~/("[[:digit:]]{1,3}")$/) {print "2nd field invalid"; next}
!($3~/^("[A-Z0-9]{1,8}")$/) {print "3rd field invalid"; next}
!($4~/^("[A-Z0-9]{0,1}")$/) {print "4th field invalid"; next}
!($5~/^("[A-Z0-9]{0,11}")$/) {print "5th field invalid"; next}
!($6~/^("")$/) {print "6th field invalid"; next}
!($7~/^("[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}")$/) {print "7th field invalid"; next}
!($8~/^("[1-5]{1}")$/) {print "8th field invalid"; next}
!($9~/^("[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}")$/) {print "9th field invalid"; next}
!($10~/^("[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}")$/) {print "10th field invalid"; next}
!($11~/^("([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]")|""$/) {print "11th field invalid"; next}
!($12~/^("([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]")|""$/) {print "12th field invalid"; next}
!($13~/^("[A-Za-z0-9]{0,70}")|""$/) {print "13th field invalid"; next}
!($14~/^("[A-Za-z0-9]{1}")|""$/) {print "14th field invalid"; next}
!($15~/^("[0-9]{0,3}")$/) {print "15th field invalid"; next}
!($16~/^(".+")$/) {print "16th field invalid"; next}
!($17~/^(".+")|""$/) {print "17th field invalid"; next}
{print "you have 17 fields"}' npp_test_file.csv
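A reduced sketch of what next changes, under the same kind of assumptions as before (two fields only, simplified patterns, plain -F, splitting, hypothetical helper name first_error_only): each line reports at most its first failing field, and the final block is reached only by lines that passed every check.

```shell
first_error_only() {
    awk -F, '
    $1 !~ /^"[A-Z0-9]+"$/ { print "Line " NR ": 1st field invalid"; next }
    $2 !~ /^"[0-9]+"$/    { print "Line " NR ": 2nd field invalid"; next }
    # only lines that passed every rule above get here
    { print "Line " NR ": ok" }'
}

printf '%s\n' '"AAA0002","112"' '"ABC0004","a009"' '"aADF0005","s012"' | first_error_only
```

The third record has two bad fields, but only the first one is reported because next skips the remaining rules for that line.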
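EDIT2: Taking the solution from @NeronLeVelu here and tweaking it so that the array of REGEX (used to check the different fields) is built automatically. As before, there were no proper samples, so I have not tested it.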
awk '
BEGIN{
FPAT = "([^,]+)|(\"[^\"]+\")"
# Patterns are separated by ";" because "," also occurs inside intervals such as {1,25}.
num=split("^(\"[A-Z0-9]{1,25}\")$;(\"[[:digit:]]{1,3}\")$;^(\"[A-Z0-9]{1,8}\")$;^(\"[A-Z0-9]{0,1}\")$;^(\"[A-Z0-9]{0,11}\")$;^(\"\")$;^(\"[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}\")$;^(\"[1-5]{1}\")$;^(\"[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}\")$;^(\"[0-9]{4}[-/][0-9]{2}[-/][0-9]{2}\")$;^(\"([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]\")|\"\"$;^(\"([01]?[0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]\")|\"\"$;^(\"[A-Za-z0-9]{0,70}\")|\"\"$;^(\"[A-Za-z0-9]{1}\")|\"\"$;^(\"[0-9]{0,3}\")$;^(\".+\")$;^(\".+\")|\"\"$", regex, ";")
}
NF != 17 {
printf( "Line %3d : incorrect amount of fields\n", NR )
next
}
{
for (Idx=1; Idx<=num; Idx++ ) {
if ( $Idx !~ regex[Idx] ) {
printf( "Line %3d : %2dth field is invalid\n", NR, Idx )
}
}
}' Input_file
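One thing to check with the split() approach: whichever separator is used must not occur inside any of the patterns, and a comma does occur inside intervals such as {1,25}. A small sketch with a made-up two-pattern list (the patterns are only split here, never matched; split_demo is a hypothetical name):

```shell
split_demo() {
    awk 'BEGIN {
        # Hypothetical two-pattern lists; the patterns are only split here, not matched.
        with_comma = "^[A-Z]{1,3}$,^[0-9]{1,2}$"
        with_semi  = "^[A-Z]{1,3}$;^[0-9]{1,2}$"
        print split(with_comma, a, ",")   # 4: the commas inside {1,3} and {1,2} split too
        print split(with_semi, b, ";")    # 2: one element per pattern
        print b[1]                        # first pattern, intact
    }'
}

split_demo
```

Any character that never appears in the patterns would do as the separator; ";" is just one candidate.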