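I'm hacking on some AWK. I'm a beginner. I've done my homework on the following problem, but I can't get it to work properly. Here is a sample input file: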
Start Date 12/3/17
End Date 12/30/17
Report Type Report1
Currency ZAR
Country Identifier MType Quantity Net Net Net Code Title Contrib I_Type M_Type Vendor Identifier Offline Indicator LSN
ZA 44057330 FMP 1 0.050666 0.050666 USYYYYYYYYYY ABC Tom 1 1 USYYYYYYYYYY 0 SUT
ZA 1267456726 SIMT 1 0.03 0.03 USXXXXXXXXXX DEF Frances 1 1 USXXXXXXXXXX 0 XYZ
Row Count 657
Storefront Name MType Quantity Net Net
ZA FMP 601 30.45
ZA IAP 13 0.68
ZA IMP 1035 69.36
ZA SIMP 54 1.4
ZA FMT 70 0.53
ZA IMT 92 1.68
ZA SIMT 6 0.18
(I have not escaped the special characters here.) This is the output I want:
"Filename" "Start Date" "End Date" "Currency" "Country" "Identifier" "MType" "Quantity" "Net" "NetNet" "Code" "Title" "Contrib" "I_Type" "M_Type" "Vendor Identifier" "Offline Indicator" "LSN"
"rawfile.txt" "12/3/17" "12/30/17" "ZAR" "ZA" "44057330" "FMP" "1" "0.050666" "0.050666" "USYYYYYYYYYY" "ABC" "Tom" "1" "1" "USYYYYYYYYYY" "0" "SUT"
"rawfile.txt" "12/3/17" "12/30/17" "ZAR" "ZA" "1267456726" "SIMT" "1" "0.03" "0.03" "USXXXXXXXXXX" "DEF" "Frances" "1" "1" "USXXXXXXXXXX" "0" "XYZ"
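Basically, I need to take most of the header from line 5, but three of the fields I need are in lines 1-4. Also, I don't need the line that begins with "Row Count" or any of the data after it. This is what I have so far: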
gawk '
function basename(file) {
    sub(".*/", "", file)
    return file
}
/^Row Count/ {nextfile}
FNR == 1 { StartDate=$2; }
FNR == 2 { EndDate=$2; }
FNR == 4 { curr=$2; }
NR == 5 {$0 = "StartDate" OFS "EndDate" OFS "Filename" OFS "curr" OFS $0; print}
FNR > 5 {$0 = StartDate OFS EndDate OFS basename(FILENAME) OFS curr OFS $0; print}
' OFS='\t' path/to/sourcefiles/*.txt > path/to/outfile.txt
Thanks!
These are the lines that come before the field headers in each file. The content starts on line 4:
Provider ,,,,,,,,,,,,
01/01/2018 - 01/31/2018,,,,,,,,,,,,
This almost works, but lines 1-3 are included for every file:
awk '
function basename(file) {
    sub(".*/", "", file)
    return file
}
BEGIN { FS = OFS = "," }
NR < 3 {
    if (NR == 2) {
        hdr = "Report_Period" OFS
        val = val $1 OFS
    }
    next
}
FNR > 3 {
    print "Filename", hdr $0
    next
}
{ print basename(FILENAME), val $0 }
' OFS="," /path/to/input/files > ~/path/to/output/file/file.csv
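End of edit
Answer 0 (score: 4)
The format of your sample input is unclear, but this may be what you're looking for, or it may do more than necessary, or it may be something else entirely: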
$ cat tst.awk
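BEGIN { FS=OFS="\t" }
/^Row Count/ { nextfile }        # ignore the summary block at the end of each file
FNR==1 {
    hdr = val = ""               # reset per-file state so several input files do not accumulate
    fname = FILENAME
    sub(/.*[/]/,"",fname)        # strip the directory part
}
{
    gsub(/[\\]t/,FS)             # turn a literal backslash-t into a real tab
    gsub(/[\\][/]/,"/")          # turn an escaped slash into a plain slash
    gsub(/[^\t]+/,"\"&\"")       # wrap every field in double quotes
}
FNR < 5 {
    if ( FNR != 3 ) {            # line 3 (Report Type) is not wanted
        hdr = hdr $1 OFS
        val = val $2 OFS
    }
    next
}
FNR==5 {
    print "\"Filename\"", hdr $0
    next
}
{ print "\""fname"\"", val $0 }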
$ awk -f tst.awk file
"Filename" "Start Date" "End Date" "Currency" "Country" "Identifier" "MType" "Quantity" "Net" "Net Net" "Code" "Title" "Contrib" "I_Type" "M_Type" "Vendor Identifier" "Offline Indicator" "LSN"
"file" "12/3/17" "12/30/17" "ZAR" "ZA" "44057330" "FMP" "1" "0.050666" "0.050666" "USYYYYYYYYYY" "ABC" "Tom" "1" "1" "USYYYYYYYYYY" "0" "SUT"
"file" "12/3/17" "12/30/17" "ZAR" "ZA" "1267456726" "SIMT" "1" "0.03" "0.03" "USXXXXXXXXXX" "DEF" "Frances" "1" "1" "USXXXXXXXXXX" "0" "XYZ"
The above uses GNU awk for nextfile, which you are already using.
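If you ever need to run this with an awk that lacks nextfile, a skip flag that is reset on the first line of each file can stand in for it. The following is only a rough sketch, not a drop-in replacement. It assumes the input really is tab-separated with the four metadata lines and the header on line 5, as in your sample. The function name merge_reports and the variables skip and printed are made up for illustration. It prints the header only once when several files are given, and it leaves out the quoting and unescaping steps.

```shell
#!/bin/sh
# Sketch: merge report files without relying on nextfile.
# Assumption: tab-separated input laid out like the sample in the question:
# lines 1-4 are "label<TAB>value" metadata, line 5 is the column header,
# and everything from "Row Count" onward is a summary to discard.

merge_reports() {
    awk '
    BEGIN { FS = OFS = "\t" }
    FNR == 1 {                      # first line of a file: reset per-file state
        skip = 0
        hdr = val = ""
        fname = FILENAME
        sub(/.*\//, "", fname)      # strip the directory part
    }
    /^Row Count/ { skip = 1 }       # the flag stands in for nextfile
    skip { next }                   # ignore the rest of this file
    FNR < 5 {
        if (FNR != 3) {             # line 3 (Report Type) is not wanted
            hdr = hdr $1 OFS
            val = val $2 OFS
        }
        next
    }
    FNR == 5 {
        if (!printed++)             # print the header for the first file only
            print "Filename", hdr $0
        next
    }
    { print fname, val $0 }
    ' "$@"
}
```

With your paths it would be called as merge_reports path/to/sourcefiles/*.txt > path/to/outfile.txt. Unlike nextfile, the flag does not stop awk from reading the remainder of each file, it only discards those lines, so it is slower on large inputs.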