awk文本文件到csv

时间:2018-04-09 06:16:43

标签: bash csv awk sed

我有一个庞大的文本文件,我需要将其转换为CSV文件,以便将其导入MySQL数据库。

文本文件如下所示:

原始文本文件

VL;1;1001;Productname 1;Description 1;2;MTR;METER;217883;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VA;2;4044773815245;V;
VA;3;0036453;V;
VL;1;1002;Productname 2;This is product decrtiption for 2 product;2;MTR;METER;140365;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VX;WEIGHT;7500
VX;VOLUME;3249
VX;DIMENSJON;57x57x1000
VA;2;4044773452884;V;
VA;3;0036479;V;
VL;1;1003;Productname 3;Description......;2;MTR;METER;1575;10000;20171006;1;010606;&10;PRODUCER1;;N;10000;;
VX;PDF;1003.pdf
VX;IMAGE;1003.png
VX;BASEINFO;http://127.0.0.1/1003/
VX;WEIGHT;20
VX;DIMENSJON;0x7x0
VX;UNSPSC;26121616
VA;2;7070613017149;V;
VA;3;1000116;V;

通缉结果

我需要将其转换为CSV文件,如下所示:

type;   Productnumber;  Productname;    Description;        measurement_unit;   price_unit; price_unit_txt; price;  crowd;  price_date; status; block_number;   discount_group; manufac;    type;   stocked;    sales_package;  discount;   price_type; PDF;        IMAGE;      baseinfo;               WEIGHT; VOLUME; dimensjon;  UNSPSC;     va_2;           va_3;
1;      1001;           Productname 1;  Description 1;      2;                  MTR;        METER;          217883; 10000;  20180402    1;      010206;         &10;            PRODUCER1;  ;       N;          10000;          ;           ;           ;           ;           ;                       ;       ;       ;           ;           4044773815245;  0036453;
1;      1002;           Productname 2;  Description 2;      2;                  MTR;        METER;          140365; 10000;  20180402;   1;      010206;         &10;            PRODUCER2   ;       N;          10000;          ;           ;           ;           ;           ;                       7500;   3249;   57x57x1000; ;           4044773452884;  0036479;
1;      1003;           Productname 3;  Description ABC 3;  2;                  MTR;        METER;          1575;   10000;  20171006;   1;      010606;         &10;            PRODUCER3;  ;       N;          10000;          ;           ;           1003.pdf;   1003.png;   http://127.0.0.1/1003/; 20;     ;       0x7x0;      26121616;   7070613017149;  1000116;        

原始文件的说明

第一个产品系列始终以VL开头,然后按此顺序继续:

type;Productnumber;Productname;Description;measurement_unit;price_unit;price_unit_txt;price;crowd;price_date;status;block_number;discount_group;manufac;type;stocked;sales_package;discount;price_type;

PDF         is always on a new line starting with VX;PDF;
IMAGE       is always on a new line starting with VX;IMAGE;
baseinfo    is always on a new line starting with VX;BASEINFO;
WEIGHT      is always on a new line starting with VX;WEIGHT;
VOLUME      is always on a new line starting with VX;VOLUME;
dimensjon   is always on a new line starting with VX;DIMENSJON;
UNSPSC      is always on a new line starting with VX;UNSPSC;
va_2        is always on a new line starting with VA;2;
va_3        is always on a new line starting with VA;3;

希望有人可以帮助我解决这个问题:)

1 个答案:

答案 0 :(得分:1)

一种可能的方式(不是解决方案)

#!/bin/bash

awk -F';' '
    function init() {
            # formation line to print_line
        line = vl pdf image baseinfo  weight volume dimensjon unspsc va_2 va_3 
            # erase ^M (\r)
        gsub( /\r/;"";line )
            # print a block
        print line
            # initialisation variables
        vl = pdf = image = baseinfo  = weight = volume = dimensjon = unspsc = va_2 = va_3 = ";"
    }
        # head/title, note that "%12s" format with 12 characters width
    BEGIN { printf ( "%12s; %s; %s; %s; %s; %s; %s; %s; %s; %s;","vl","pdf","image","baseinfo ","weight","volume","dimensjon","unspsc","va_2","va_3" ) }
    /^VL/ { init(); ; vl = sprintf( "%12s; %s; %s; %s; ", $3, $4, $5, $6 ) }
    /^VX;WEIGHT;/ { weight = sprintf( "%s; ", $3 )}
    # .. another conditions
    END { init() }
' file.dat  # > outputfile.csv

进行测试:

cat << end > file.dat
VL;1;1001;Productname 1;Description 1;2;MTR;METER;217883;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VA;2;4044773815245;V;
VA;3;0036453;V;
VL;1;1002;Productname 2;This is product decrtiption for 2 product;2;MTR;METER;140365;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VX;WEIGHT;7500
VX;VOLUME;3249
VX;DIMENSJON;57x57x1000
VA;2;4044773452884;V;
VA;3;0036479;V;
VL;1;1003;Productname 3;Description......;2;MTR;METER;1575;10000;20171006;1;010606;&10;PRODUCER1;;N;10000;;
VX;PDF;1003.pdf
VX;IMAGE;1003.png
VX;BASEINFO;http://127.0.0.1/1003/
VX;WEIGHT;20
VX;DIMENSJON;0x7x0
VX;UNSPSC;26121616
VA;2;7070613017149;V;
VA;3;1000116;V;
end

输出中

      vl; pdf; image; baseinfo ; weight; volume; dimensjon; unspsc; va_2; va_3;
    1001; Productname 1; Description 1; 2; ;;;;;;;;;
    1002; Productname 2; This is product decrtiption for 2 product; 2; ;;;7500; ;;;;;
    1003; Productname 3; Description......; 2; ;;;20; ;;;;;