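I'm very new to awk and have been trying to get this to work. I'm trying to take the list of files in "image.list" and create an "info" file from it. I need to grab the string in the middle of each file name that matches a regular expression (a number 8-11 digits long) and print that match at a specific position in my "info file". That last part is where I'm stuck. I'd appreciate any help with this.
Here is my test file list: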
SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg
Here is my code so far:
awk 'BEGIN {print "-----TEST TAG FILE\tENCOUNTERS-----";}
> {print "FILE: /tmp/imagetest/"$1,"\t","ENCOUNTER: ",($1~/^[0-9]{8,11}$/);}
> END{print "END REPORT";
> }' image.list > upload.tag
Here is my current output:
-----TEST TAG FILE ENCOUNTERS-----
FILE: /tmp/imagetest/SURGERY0001275678image1.jpg ENCOUNTER: 0
FILE: /tmp/imagetest/SURGERY11134900211image2.jpg ENCOUNTER: 0
FILE: /tmp/imagetest/SURGERY19257012image3.jpg ENCOUNTER: 0
FILE: /tmp/imagetest/SURGERY273142590image4.jpg ENCOUNTER: 0
END REPORT
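What I need it to show after "ENCOUNTER:" is the 8-11 digit number from the middle of the file name. Everything I've tried so far outputs either the whole file name or "0".
I may be way off track, so I'd welcome some help from the experts!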
Answer 0 (score: 5)
Reusing your existing code:
$ awk '
BEGIN {
print "-----TEST TAG FILE\tENCOUNTERS-----";
}
match($0,/[^0-9]+([0-9]+)[^0-9]+/,ary) {
print "FILE: /tmp/imagetest/"$1,"\t","ENCOUNTER:"ary[1]
}
END {
print "END REPORT";
}' testfile
$ cat testfile
SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg
$ awk '
> BEGIN {
> print "-----TEST TAG FILE\tENCOUNTERS-----";
> }
> match($0,/[^0-9]+([0-9]+)[^0-9]+/,ary) {
> print "FILE: /tmp/imagetest/"$1,"\t","ENCOUNTER:"ary[1]
> }
> END {
> print "END REPORT";
> }' testfile
-----TEST TAG FILE ENCOUNTERS-----
FILE: /tmp/imagetest/SURGERY0001275678image1.jpg ENCOUNTER:0001275678
FILE: /tmp/imagetest/SURGERY11134900211image2.jpg ENCOUNTER:11134900211
FILE: /tmp/imagetest/SURGERY19257012image3.jpg ENCOUNTER:19257012
FILE: /tmp/imagetest/SURGERY273142590image4.jpg ENCOUNTER:273142590
END REPORT
As Ed Morton pointed out in the comments, this solution uses the array argument to match(), which makes it GNU awk only.
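If GNU awk isn't available, a portable sketch of the same idea can use the two-argument match(), which sets RSTART and RLENGTH in any POSIX awk. The wrapper name tag_report is just something I made up for illustration, and this assumes the first run of digits in the name is the one you want (true for the sample names above):

```shell
# Hypothetical wrapper (name chosen for illustration only).
tag_report() {
    awk '
    BEGIN { print "-----TEST TAG FILE\tENCOUNTERS-----" }
    # Two-argument match() sets RSTART and RLENGTH in any POSIX awk.
    match($0, /[0-9]+/) {
        print "FILE: /tmp/imagetest/" $1 "\tENCOUNTER:" substr($0, RSTART, RLENGTH)
    }
    END { print "END REPORT" }' "$@"
}

# Reads file names from stdin here; pass a file name instead to read a list.
printf '%s\n' SURGERY0001275678image1.jpg SURGERY19257012image3.jpg | tag_report
```

Incidentally, the original attempt printed 0 because `$1 ~ /regex/` evaluates to 1 or 0 rather than to the matched text, and the ^ and $ anchors required the whole name to consist of digits.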
Answer 1 (score: 3)
sed -r -e 's#(.*)#FILE:\t/tmp/imagetest/\1#;s/([0-9]*)(i[^i]*)$/\1\2\tENCOUNTER:\1/;1i -----TEST TAG FILE ENCOUNTERS-----' -e '$aEND REPORT' file
-----TEST TAG FILE ENCOUNTERS-----
FILE: /tmp/imagetest/SURGERY0001275678image1.jpg ENCOUNTER:0001275678
FILE: /tmp/imagetest/SURGERY11134900211image2.jpg ENCOUNTER:11134900211
FILE: /tmp/imagetest/SURGERY19257012image3.jpg ENCOUNTER:19257012
FILE: /tmp/imagetest/SURGERY273142590image4.jpg ENCOUNTER:273142590
END REPORT
Answer 2 (score: 2)
Here is a commonly used awk function, "extract()", for extracting the string that matches an RE:
awk -v re='<whatever>' '
function extract(str,regexp)
{ RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
return RSTART
}
extract($0,re) { print RMATCH }
'
Just set "re" to whatever you want to match, e.g.:
$ cat file
SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg
$ awk -v re='[[:digit:]]{8,11}' '
function extract(str,regexp)
{ RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
return RSTART
}
extract($0,re) { print RMATCH }
' file
0001275678
11134900211
19257012
273142590
Or, if you prefer a more specific solution using the same match() + substr() approach:
$ awk '
BEGIN{ print "-----TEST TAG FILE\tENCOUNTERS-----" }
{ printf "FILE: %s\tENCOUNTER: %s\n", $0, (match($0,/[[:digit:]]{8,11}/) ? substr($0,RSTART,RLENGTH) : 0) }
END{ print "END REPORT" }
' file
-----TEST TAG FILE ENCOUNTERS-----
FILE: SURGERY0001275678image1.jpg ENCOUNTER: 0001275678
FILE: SURGERY11134900211image2.jpg ENCOUNTER: 11134900211
FILE: SURGERY19257012image3.jpg ENCOUNTER: 19257012
FILE: SURGERY273142590image4.jpg ENCOUNTER: 273142590
END REPORT
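Note that if all of your file names follow the same pattern, with no other digits before the 8-11 digit run you care about, you could use [[:digit:]]+ as the RE to match instead of explicitly specifying the range [[:digit:]]{8,11}, if you prefer.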
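If the names might contain a shorter run of digits earlier on, and your awk doesn't support interval expressions like {8,11} (some older builds don't), a rough sketch is to walk through the digit runs with [0-9]+ and check RLENGTH yourself. The helper name first_long_run and the WARD12... file name are made up for illustration. Unlike {8,11}, this skips runs longer than 11 digits instead of taking their first 11:

```shell
# Hypothetical helper: print the first digit run that is 8 to 11 digits long.
first_long_run() {
    awk '{
        s = $0
        found = ""
        # Walk the digit runs left to right; match() sets RSTART and RLENGTH.
        while (match(s, /[0-9]+/)) {
            if (RLENGTH >= 8 && RLENGTH <= 11) {
                found = substr(s, RSTART, RLENGTH)
                break
            }
            # Drop everything up to the end of this run and keep scanning.
            s = substr(s, RSTART + RLENGTH)
        }
        print found
    }'
}

# The short run "12" is skipped, so this prints 19257012.
printf '%s\n' WARD12SURGERY19257012image3.jpg | first_long_run
```

A plain [[:digit:]]+ on that same name would return 12, which is why the explicit length check matters once the names stop being uniform.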
Answer 3 (score: 0)
Try this:
$ cat input
SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg
$ awk '{split($1,a,/[[:alpha:]]*/);print a[2]}' input
0001275678
11134900211
19257012
273142590
Answer 4 (score: 0)
Try the following:
awk 'BEGIN {print "-----TEST TAG FILE\tENCOUNTERS-----";}
{print "FILE: /tmp/imagetest/"$1,"\t","ENCOUNTER: ",gensub(/[^0-9]*([0-9]*).*/, "\\1", 1, $1);}
END{print "END REPORT";
}' image.list > upload.tag
Answer 5 (score: 0)
awk '{
# sub() does not expand backreferences such as "\\1"; gensub() (GNU awk only) does.
encounter = gensub(/^[^0-9]*([0-9]{8,11}).*/, "\\1", 1, $1);
print "FILE: /tmp/imagetest/"$1,"\t","ENCOUNTER: ",encounter;}'
Answer 6 (score: 0)
This
awk 'BEGIN {print "-----TEST TAG FILE\tENCOUNTERS-----";}
{printf "FILE: /tmp/imagetest/"$1"\tENCOUNTER: ";if($1~/[0-9]{8,11}/){sub(/[0-9]+\.jpg$/,"",$1); gsub(/[a-zA-Z]/,"",$1);print $1}}
END{print "END REPORT";
}' image.list
will print
-----TEST TAG FILE ENCOUNTERS-----
FILE: /tmp/imagetest/SURGERY0001275678image1.jpg ENCOUNTER: 0001275678
FILE: /tmp/imagetest/SURGERY11134900211image2.jpg ENCOUNTER: 11134900211
FILE: /tmp/imagetest/SURGERY19257012image3.jpg ENCOUNTER: 19257012
FILE: /tmp/imagetest/SURGERY273142590image4.jpg ENCOUNTER: 273142590
END REPORT