I have a huge text file that I need to convert to a CSV file so that I can import it into a MySQL database.
The text file looks like this:
Original text file
VL;1;1001;Productname 1;Description 1;2;MTR;METER;217883;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VA;2;4044773815245;V;
VA;3;0036453;V;
VL;1;1002;Productname 2;This is product decrtiption for 2 product;2;MTR;METER;140365;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VX;WEIGHT;7500
VX;VOLUME;3249
VX;DIMENSJON;57x57x1000
VA;2;4044773452884;V;
VA;3;0036479;V;
VL;1;1003;Productname 3;Description......;2;MTR;METER;1575;10000;20171006;1;010606;&10;PRODUCER1;;N;10000;;
VX;PDF;1003.pdf
VX;IMAGE;1003.png
VX;BASEINFO;http://127.0.0.1/1003/
VX;WEIGHT;20
VX;DIMENSJON;0x7x0
VX;UNSPSC;26121616
VA;2;7070613017149;V;
VA;3;1000116;V;
Wanted result
I need to convert it to a CSV file like this:
type; Productnumber; Productname; Description; measurement_unit; price_unit; price_unit_txt; price; crowd; price_date; status; block_number; discount_group; manufac; type; stocked; sales_package; discount; price_type; PDF; IMAGE; baseinfo; WEIGHT; VOLUME; dimensjon; UNSPSC; va_2; va_3;
1; 1001; Productname 1; Description 1; 2; MTR; METER; 217883; 10000; 20180402; 1; 010206; &10; PRODUCER1; ; N; 10000; ; ; ; ; ; ; ; ; ; 4044773815245; 0036453;
1; 1002; Productname 2; Description 2; 2; MTR; METER; 140365; 10000; 20180402; 1; 010206; &10; PRODUCER2; ; N; 10000; ; ; ; ; ; 7500; 3249; 57x57x1000; ; 4044773452884; 0036479;
1; 1003; Productname 3; Description ABC 3; 2; MTR; METER; 1575; 10000; 20171006; 1; 010606; &10; PRODUCER3; ; N; 10000; ; ; 1003.pdf; 1003.png; http://127.0.0.1/1003/; 20; ; 0x7x0; 26121616; 7070613017149; 1000116;
Explanation of the original file
The first line of each product always starts with VL, and its fields follow in this order:
type;Productnumber;Productname;Description;measurement_unit;price_unit;price_unit_txt;price;crowd;price_date;status;block_number;discount_group;manufac;type;stocked;sales_package;discount;price_type;
PDF is always on a new line starting with VX;PDF;
IMAGE is always on a new line starting with VX;IMAGE;
baseinfo is always on a new line starting with VX;BASEINFO;
WEIGHT is always on a new line starting with VX;WEIGHT;
VOLUME is always on a new line starting with VX;VOLUME;
dimensjon is always on a new line starting with VX;DIMENSJON;
UNSPSC is always on a new line starting with VX;UNSPSC;
va_2 is always on a new line starting with VA;2;
va_3 is always on a new line starting with VA;3;
I hope someone can help me with this problem :)
Answer 0 (score: 1)
One possible approach (not a complete solution):
#!/bin/bash
awk -F';' '
function init() {
    # build the output line from the collected columns
    line = vl pdf image baseinfo weight volume dimensjon unspsc va_2 va_3
    # strip carriage returns (^M, \r) left over from Windows line endings
    gsub( /\r/, "", line )
    # print the finished block
    print line
    # reset every column to an empty field
    vl = pdf = image = baseinfo = weight = volume = dimensjon = unspsc = va_2 = va_3 = ";"
}
# header line; "%12s" pads the first column to a width of 12 characters.
# There is no trailing newline here: the first init() call prints an empty line, which ends the header.
BEGIN { printf ( "%12s; %s; %s; %s; %s; %s; %s; %s; %s; %s;","vl","pdf","image","baseinfo ","weight","volume","dimensjon","unspsc","va_2","va_3" ) }
# a VL line starts a new product: flush the previous one, then store the first four columns
/^VL/ { init(); vl = sprintf( "%12s; %s; %s; %s; ", $3, $4, $5, $6 ) }
/^VX;WEIGHT;/ { weight = sprintf( "%s; ", $3 ) }
# ... add similar rules for the remaining VX;* and VA;* lines
END { init() }
' file.dat # > outputfile.csv
To test:
cat << end > file.dat
VL;1;1001;Productname 1;Description 1;2;MTR;METER;217883;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VA;2;4044773815245;V;
VA;3;0036453;V;
VL;1;1002;Productname 2;This is product decrtiption for 2 product;2;MTR;METER;140365;10000;20180402;1;010206;&10;PRODUCER1;;N;10000;;
VX;WEIGHT;7500
VX;VOLUME;3249
VX;DIMENSJON;57x57x1000
VA;2;4044773452884;V;
VA;3;0036479;V;
VL;1;1003;Productname 3;Description......;2;MTR;METER;1575;10000;20171006;1;010606;&10;PRODUCER1;;N;10000;;
VX;PDF;1003.pdf
VX;IMAGE;1003.png
VX;BASEINFO;http://127.0.0.1/1003/
VX;WEIGHT;20
VX;DIMENSJON;0x7x0
VX;UNSPSC;26121616
VA;2;7070613017149;V;
VA;3;1000116;V;
end
Output:
vl; pdf; image; baseinfo ; weight; volume; dimensjon; unspsc; va_2; va_3;
1001; Productname 1; Description 1; 2; ;;;;;;;;;
1002; Productname 2; This is product decrtiption for 2 product; 2; ;;;7500; ;;;;;
1003; Productname 3; Description......; 2; ;;;20; ;;;;;
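The script above only handles the first few VL columns and VX;WEIGHT. As a rough sketch of how the remaining rules might be filled in, the version below keeps all 19 VL columns and every VX and VA value described in the question. It rests on a few assumptions that are not confirmed by the thread: every VL line has exactly 20 semicolon-separated fields (so fields 2 to 20 map onto the 19 column names), VX values never contain a semicolon, and a plain ";" separator (without the space shown in the wanted result) is acceptable for the import. The names vl_to_csv, flush_row, have_row and xkey are made up for this illustration.

```shell
#!/bin/bash
# Sketch: flatten each VL/VX/VA group into one CSV row per product.
# vl_to_csv is a hypothetical helper name, not part of the original answer.
# Usage: vl_to_csv file.dat > outputfile.csv
vl_to_csv() {
  awk -F';' '
    # print the product collected so far (if any) and reset the buffers
    function flush_row(   i, row) {
      if (!have_row) return
      row = ""
      for (i = 2; i <= 20; i++) row = row vl[i] ";"        # the 19 VL columns
      for (i = 1; i <= nx; i++) row = row vx[xkey[i]] ";"  # VX columns in header order
      row = row va[2] ";" va[3] ";"                        # VA;2 and VA;3
      print row
      split("", vl); split("", vx); split("", va)          # portable way to clear arrays
      have_row = 0
    }
    BEGIN {
      # VX keys in the order of the wanted header
      nx = split("PDF IMAGE BASEINFO WEIGHT VOLUME DIMENSJON UNSPSC", xkey, " ")
      print "type;Productnumber;Productname;Description;measurement_unit;price_unit;price_unit_txt;price;crowd;price_date;status;block_number;discount_group;manufac;type;stocked;sales_package;discount;price_type;PDF;IMAGE;baseinfo;WEIGHT;VOLUME;dimensjon;UNSPSC;va_2;va_3;"
    }
    # drop a trailing carriage return; changing $0 makes awk re-split the fields
    { sub(/\r$/, "") }
    # a VL line starts a new product
    $1 == "VL" { flush_row(); for (i = 2; i <= 20; i++) vl[i] = $i; have_row = 1; next }
    # VX;KEY;value and VA;n;value lines attach to the current product
    $1 == "VX" { vx[$2] = $3; next }
    $1 == "VA" { va[$2] = $3; next }
    END { flush_row() }
  ' "$@"
}
```

Every row, including the header, ends with a trailing ";" as in the wanted result, so each line has the same number of fields. The header repeats the column name type exactly as the question does; one of the two would probably need renaming before it can serve as a MySQL column list.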