I need to convert tab-delimited data from the input format to the output format shown below. Please help me write a script.
Input file:
BRANCH_CODE DEPT_CODE ITEM_CODE UNIT_CODE 01/04/2017 02/04/2017 03/04/2017 04/04/2017 05/04/2017 06/04/2017 07/04/2017 08/04/2017 09/04/2017 10/04/2017
KI-01 DP-0001 10001 KG 31.5 45 72 84 67.5 39 57 22.5 22 56
KI-01 DP-0001 10002 KG 22 0 62 18 49 13 75 17 0 72
Output format:
DOC_DATE BRANCH_CODE DEPT_CODE ITEM_CODE UNIT_CODE QTY
01/04/2017 KI-01 DP-0001 10001 KG 31.5
01/04/2017 KI-01 DP-0001 10002 KG 22
02/04/2017 KI-01 DP-0001 10001 KG 45
02/04/2017 KI-01 DP-0001 10002 KG 0
03/04/2017 KI-01 DP-0001 10001 KG 72
03/04/2017 KI-01 DP-0001 10002 KG 62
and so on.
I wrote the following code in a .sh file:
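The sample above is displayed with spaces, but the scripts below assume real tab separators (`FS = "\t"`, a file named `input.tsv`). As a minimal sketch for trying them out, the sample rows can be written to a genuine TSV file. This relies on the assumption, true for this sample, that no field contains a space, so spaces can simply be translated to tabs.

```bash
#!/bin/bash
# Recreate the sample input from the question with real tab separators.
# Safe only because no field in this sample contains a space.
printf '%s\n' \
  'BRANCH_CODE DEPT_CODE ITEM_CODE UNIT_CODE 01/04/2017 02/04/2017 03/04/2017 04/04/2017 05/04/2017 06/04/2017 07/04/2017 08/04/2017 09/04/2017 10/04/2017' \
  'KI-01 DP-0001 10001 KG 31.5 45 72 84 67.5 39 57 22.5 22 56' \
  'KI-01 DP-0001 10002 KG 22 0 62 18 49 13 75 17 0 72' |
  tr ' ' '\t' > input.tsv

# Sanity check: every line should report 14 tab-separated fields
# (4 key columns + 10 date columns).
awk -F'\t' '{ print NR ": " NF " fields" }' input.tsv
```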
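#!/bin/bash
# Drop the header row so only data rows remain
awk 'NR!=1' input.tsv > temp_data_wo_header.tsv
lc=$(wc -l < temp_data_wo_header.tsv)   # number of data rows

# In the sample input the dates are fields 5..14
for ((i=5; i<=14; i++))
do
    echo "Constructing date file $i ..."
    day=$(awk -F'\t' -v col="$i" 'NR==1{print $col}' input.tsv)   # date taken from the header
    for ((c=1; c<=lc; c++))
    do
        echo "$day"                      # one copy of the date per data row
    done > temp_date.tsv
    echo "Adding date to data file..."
    # Prepend the date to BRANCH_CODE, DEPT_CODE, ITEM_CODE, UNIT_CODE and that day's QTY (all tab-separated)
    paste temp_date.tsv <(awk -F'\t' -v OFS='\t' -v col="$i" '{print $1,$2,$3,$4,$col}' temp_data_wo_header.tsv) > "temp_day_${i}_data.tsv"
    echo "Finished adding..."
done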
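The loop above writes one `temp_day_<N>_data.tsv` file per date column but never merges them into the single output file the question asks for. A minimal sketch of that missing last step is below; `combine_day_files` is a hypothetical helper name, and it assumes the per-date files are tab-separated and that their names contain no whitespace. The files are ordered by the numeric `<N>` because lexical glob order would put `temp_day_10_data.tsv` before `temp_day_5_data.tsv`.

```bash
#!/bin/bash
# combine_day_files OUT (hypothetical helper): concatenate temp_day_<N>_data.tsv
# in numeric order of <N>, under a single tab-separated header line.
combine_day_files() {
  local out=$1
  {
    printf 'DOC_DATE\tBRANCH_CODE\tDEPT_CODE\tITEM_CODE\tUNIT_CODE\tQTY\n'
    # Split each name on "_" and sort numerically on the third piece (the column number).
    printf '%s\n' temp_day_*_data.tsv | sort -t_ -k3,3n | while IFS= read -r f; do
      cat -- "$f"
    done
  } > "$out"
}
```

Run it in the directory holding the temp files, e.g. `combine_day_files output.tsv`.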
Is there a better way to write this?
Answer 0 (score: 1)
Here is one for GNU awk, since it uses 2D arrays (arrays of arrays):
$ awk '
BEGIN {
    FS=OFS="\t"                            # set the delimiters
}
{
    sub(/\r/,"",$NF)                       # in case of \r\n line endings
    a[NR][1]                               # define array element
    n=split($0,a[NR],FS)                   # split record to a[NR]
    a[NR][4]=$1 OFS $2 OFS $3 OFS $4       # gather constants to one element
    if(NR==1)
        a[NR][4]="DOC_DATE" OFS a[NR][4] OFS "QTY"
}
END {                                      # everything is in memory
    print a[1][4];                         # header print
    for(j=5;j<=n;j++)                      # loop all data fields
        for(i=2;i<=NR;i++)                 # loop all records
            print a[1][j],a[i][4],a[i][j]  # output
}' file
DOC_DATE BRANCH_CODE DEPT_CODE ITEM_CODE UNIT_CODE QTY
01/04/2017 KI-01 DP-0001 10001 KG 31.5
01/04/2017 KI-01 DP-0001 10002 KG 22
02/04/2017 KI-01 DP-0001 10001 KG 45
02/04/2017 KI-01 DP-0001 10002 KG 0
03/04/2017 KI-01 DP-0001 10001 KG 72
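This answer keeps every row in memory (see the `# everything is in memory` comment), which is fine for files of this size. For very large inputs, one hedged alternative, not taken from either answer, is to let `sort(1)` do the reordering: awk tags each output line with its (column, row) position, `sort` orders by column then row (and can spill to temporary files), and `cut` drops the two tags. The helper name `reshape_sorted` is hypothetical, and the sketch assumes the same tab-separated layout as the question (four key columns, then one column per date).

```bash
#!/bin/bash
# reshape_sorted FILE (hypothetical helper): wide-to-long reshape via awk + sort.
# Stage 1 (awk): emit "column<TAB>row<TAB>output fields" for every cell; only the header dates are kept in memory.
# Stage 2 (sort): order numerically by column index, then by input row.
# Stage 3 (cut): drop the two sort keys.
reshape_sorted() {
  awk -F'\t' -v OFS='\t' '
  NR == 1 {
      for (i = 5; i <= NF; i++) date[i] = $i          # remember the header dates
      print 0, 0, "DOC_DATE", $1, $2, $3, $4, "QTY"   # keys 0,0 keep the header first
      next
  }
  {
      for (i = 5; i <= NF; i++)
          print i, NR, date[i], $1, $2, $3, $4, $i    # one line per (row, date) cell
  }' "$1" |
    sort -t "$(printf '\t')" -k1,1n -k2,2n |
    cut -f3-
}
```

Usage: `reshape_sorted input.tsv > output.tsv`. The trade-off is an extra sort pass in exchange for memory use that does not grow with the number of rows inside awk.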
Answer 1 (score: 1)
It is better to do the whole processing in awk. For example:
BEGIN {
    FS = "\t"
    OFS = "\t"
}
NR == 1 {
    header = $0
    for (i = 5; i <= NF; i++) days[i - 5] = $i
}
NR > 1 {
    for (i = 5; i <= NF; i++) {
        keys[NR, i] = $1 OFS $2 OFS $3 OFS $4
        data[NR, i] = $i
    }
}
END {
    $0 = header
    print "DOC_DATE", $1, $2, $3, $4, "QTY"
    for (i = 5; i <= NF; i++) {
        for (j = 2; j <= NR; j++) {
            print $i, keys[j, i], data[j, i]
        }
    }
}
You can save this in a script, e.g. script.awk, and then run it with:
awk -f script.awk input.tsv