I have a file whose records look like this:
"PC DELL OptiPlex 3010MT i3 3220/2GB/500GB/DVD-RW/FREE DOS / 5Y NBD
Intel i3 3220 (Dual Core, 3.30GHz, 3MB, w/ HD2500 Graphics), 2GB (1x2GB) DDR3 PC3-1600MHz, 500GB HDD SATA III 7200rpm, DVD+/-RW (16x), FREE DOS, Warranty: 5Yr Basic Warranty NBD on site"
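I need to populate an HTML table from these records and then put it into a .csv file for uploading. So far I have managed to "clean" the file with the following script: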
for f in *.csv
do
#fix newline from file
sed -i ':a;{N;s/NBD \n/NBD,/};ba;s/"//g;' "$f"
#fix csv & and remove strings
sed -i 's/"PC/PC/g;s/Core\,/Core/g;s/3\,/3./g;s/3MB\,//g;s/6MB\,//g;s/6MB//g;s/w \///g;s/7,200/7200/g;s/site\"/site/g;s/3MB//g;s/3\,/3\./g;s/w\///g;s/3\,/3\./g;s/Cache\,)/Cache/g;s/ Internal Dell Business Audio Speaker\,//g;' "$f"
# couldn't figure out how to remove the symbols with sed, so use awk with a file of replacements ("template")
awk 'NR==FNR {a[$1]=$2;next} {for (i in a) gsub(i,a[i])}1' template "$f" > temp.txt
mv temp.txt "$f"
done
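The awk line above passes each key from the "template" file to gsub, which treats it as a regular expression. Symbols such as (, ) or + are therefore regex metacharacters, not literal text (gawk may even reject an unbalanced "(" as a dynamic regex), which may be part of why removing symbols was awkward. Below is a minimal sketch of literal, non-regex replacement, assuming the same two-column, whitespace-separated replacement file (a key, then an optional replacement; a missing replacement deletes the key). literal_replace is a hypothetical name, not an existing tool.

```shell
#!/bin/bash
# literal_replace (hypothetical helper): apply literal string replacements
# listed in a two-column file ($1) to every line of another file ($2).
# Keys cannot contain spaces; a line with only a key means "delete it".
literal_replace() {
  local pairs=$1 file=$2
  awk '
    # Replace every literal occurrence of s in str by r (no regex involved).
    function repl(str, s, r,    out, p) {
      out = ""
      while ((p = index(str, s)) > 0) {
        out = out substr(str, 1, p - 1) r
        str = substr(str, p + length(s))
      }
      return out str
    }
    # First file: load key -> replacement, skipping blank lines.
    NR == FNR { if ($1 != "") a[$1] = $2; next }
    # Second file: apply all replacements, then print the line unchanged otherwise.
    { for (k in a) $0 = repl($0, k, a[k]); print }
  ' "$pairs" "$file"
}
```

In the loop it would be used as literal_replace template "$f" > temp.txt && mv temp.txt "$f". As in the original, the order in which for (k in a) visits the keys is unspecified, so overlapping keys can give order-dependent results.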
Then I populate the HTML table with this script:
#!/bin/bash
for f in *.csv
do
# split the csv into one-line .csv files
split --additional-suffix=.csv -d -l 1 "$f" output/data_
# populate the HTML template and create one .html file per record
for file in output/*.csv
do
while IFS="," read -r f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
do
echo "<table cellspacing=\"0\" cellpadding=\"0\" border=\"0\" width=\"100%\"> "
echo "<tbody>"
echo "<tr> "
echo "<td class=\"specsTitle\">Box</td> "
echo "<td class=\"specsDescript stripeBottom\">$f2</td> "
echo "</tr> "
echo "<tr> "
<snip>
done < "$file" > output/temp.txt
mv output/temp.txt "$file.html"
done
done
# remove the intermediate one-line .csv files
rm output/*.csv
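Instead of writing one .html file per record and then splitting and renaming files, the table could be produced directly as a single line, which is the form a CSV cell needs. A minimal sketch: spec_table is a hypothetical helper, the style="width:300px" attribute is taken from the desired output further down, and values are inserted verbatim (assumed not to contain <, > or &).

```shell
#!/bin/bash
# spec_table (hypothetical helper): take label/value pairs as arguments and
# print them as a one-line HTML table with no newlines inside.
# No HTML escaping is done; values are assumed to be plain text.
spec_table() {
  local out='<table style="width:300px">'
  # Consume the arguments two at a time: label, then value.
  while (( $# >= 2 )); do
    out+="<tr><td>$1</td><td>$2</td></tr>"
    shift 2
  done
  printf '%s</table>\n' "$out"
}
```

For example, spec_table Processor "$cpu" Memory "$mem" (placeholder variable names) prints one line that can be written straight into the SPECS column, so no intermediate .html files are needed.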
So at this point I have several .html files in the output folder.
My questions are: 1. How bad is the code above? :-) 2. How can I put the code from each .html file into a .csv file that looks like this:
col1,col2,col3,HERE SHOULD BE THE HTML CODE FROM FILE1,col5,
col1,col2,col3,HERE SHOULD BE THE HTML CODE FROM FILE2,col5,
I was thinking of using a template file and somehow inserting the pieces of .html code into it. Any help? Kind regards
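The template idea can work with the .html files already in output/. A minimal sketch, assuming the template's first line is a header, every later line is one product row, SPECS is the 3rd comma-separated field, and the .html files are passed in the same order as the data rows (at least one file). merge_html_into_template is a hypothetical name.

```shell
#!/bin/bash
# merge_html_into_template (hypothetical helper).
# $1 = template CSV; remaining arguments = .html files, one per data row, in order.
merge_html_into_template() {
  local template=$1
  shift                                   # the rest of "$@" are .html files
  local h
  # Flatten each HTML file to a single line and emit one snippet per line.
  for h in "$@"; do
    tr -d '\r\n' < "$h"
    printf '\n'
  done |
  awk -F ',' -v OFS=',' '
    NR == FNR { html[FNR] = $0; next }    # stdin: the flattened snippets
    FNR == 1  { print; next }             # template header is kept as-is
    { $3 = " " html[FNR - 1]; print }     # data row k gets snippet k
  ' - "$template"
}
```

Usage would be along the lines of merge_html_into_template template.csv output/*.html > upload.csv. The zero-padded names produced by split -d (data_00, data_01, ...) keep the glob order aligned with the rows. The HTML is inserted unquoted, exactly as in the desired output.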
--EDIT-- Here is the original input:
"PC DELL OptiPlex 3010MT i3 3220/2GB/500GB/DVD-RW/FREE DOS / 5Y NBD
Intel i3 3220 (Dual Core, 3.30GHz, 3MB, w/ HD2500 Graphics), 2GB (1x2GB) DDR3 PC3-1600MHz, 500GB HDD SATA III 7200rpm, DVD+/-RW (16x), FREE DOS, Warranty: 5Yr Basic Warranty NBD on site"
"PC DELL OptiPlex 3010MT i5 3470/2GB/500GB/DVD-RW/FREE DOS / 5Y NBD
Intel i5 3470 (Quad Core, 3.20GHz Turbo,6MB, w/ HD2500 Graphics), 4GB (1x4GB) DDR3, PC3-1600MHz, 750GB HDD SATA III 7200rpm, DVD+/-RW (16x), FREE DOS, Warranty: 5Yr Basic Warranty NBD on site"
CSV template:
price,product code, SPECS,other things,
300.00,CODE 2112334, ,OTHER STRINGS,
500.00,CODE 2222222, ,OTHER STRINGS,
Desired .csv output:
price,product code, SPECS,other things,
300.00,CODE 2112334, <table style="width:300px"><tr><td>Processor</td><td>Intel i3 3220 (Dual Core 3.30GHz)</td></tr><tr><td>Memory</td><td> 2GB (1x2GB) DDR3 PC3-1600MHz</td></tr><tr><td>Hard Disk</td><td>500GB HDD SATA III 7200rpm</td></tr><tr><td>VGA</td><td>HD2500 Graphics</td></tr><tr><td>Warranty</td><td>5Yr Basic Warranty NBD on site</td></tr><tr><td>Other features</td><td>THIS IS NOT FROM THE SPECFILE</td></tr><tr><td>Other features 2</td><td>THIS IS ALSO NOT FROM THE SPECFILE</td></tr></table>,OTHER STRINGS,
500.00,CODE 2222222, <table style="width:300px"><tr><td>Processor</td><td>Intel i5 3470 (Quad Core 3.20GHz)</td></tr><tr><td>Memory</td><td> 4GB (1x4GB) DDR3 PC3-1600MHz</td></tr><tr><td>Hard Disk</td><td>750GB HDD SATA III 7200rpm</td></tr><tr><td>VGA</td><td>HD2500 Graphics</td></tr><tr><td>Warranty</td><td>5Yr Basic Warranty NBD on site</td></tr><tr><td>Other features</td><td>THIS IS NOT FROM THE SPECFILE</td></tr><tr><td>Other features 2</td><td>THIS IS ALSO NOT FROM THE SPECFILE</td></tr></table>,OTHER STRINGS,
--EDIT--
Answer (score: 0)
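Here is a way that uses the paste utility to combine the lines of the input file and the template file side by side, and processes the result in awk without creating any temporary files. I use sed to do only the minimal data cleaning needed to make it work, but you can of course substitute your full cleaning commands.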
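The two building blocks can be tried separately on small inputs. A small sketch: join_record_pairs and side_by_side are hypothetical wrapper names around the same sed and paste commands the script below uses.

```shell
#!/bin/bash
# join_record_pairs (hypothetical wrapper): join every pair of lines into one
# line. Spaces before the newline are dropped, the newline becomes a comma,
# then all double quotes are removed.
join_record_pairs() {
  sed -e 'N;s/ *\n/,/g' -e 's/"//g'
}

# side_by_side (hypothetical wrapper): print line N of stdin, a comma,
# then line N of the file named in $1.
side_by_side() {
  paste -d ',' - "$1"
}
```

Piping the joined records into side_by_side template.csv gives one line per product with the spec fields first and the template fields after them, which is why the awk program refers to the template columns as $11 onward.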
#!/bin/bash
# Dummy header for the input file to match the header of the template file.
# Used only to make sure files have same number of lines.
heading="model,processor,speed,cache,graphics,memory,hd,optical,dos,warranty"
# Create input for awk that has the lines of input and template side by side.
paste -d ',' - template.csv <<< "$(echo $heading; sed -e 'N;s/ *\n/,/g' -e 's/"//g' input.csv)" | awk -F ',' '
## awk portion
# First line: print just template.csv header (not dummy header).
NR == 1 { for (i=11; i<NF; ++i) printf("%s,", $i); print(""); next }
# Print each line, starting with the fields from template.csv,
# then the HTML populated with values from input.csv,
# and ending with the last fields from template.csv.
{ print($11","$12","" <table style=\"width:300px\"><tr><td>Processor</td><td>" $2 $3 ")</td></tr><tr><td>Memory</td><td>" $6 "</td></tr><tr><td>Hard Disk</td><td>" $7 "</td></tr><tr><td>VGA</td><td>" $5 "</td></tr><tr><td>Warranty</td><td>" $10 "</td></tr><tr><td>Other features</td><td>THIS IS NOT FROM THE SPECFILE</td></tr><tr><td>Other features 2</td><td>THIS IS ALSO NOT FROM THE SPECFILE</td></tr></table>," $14 ","); }'
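The script prints the HTML cell unquoted, matching the desired output. If the upload tool parses the file as strict RFC 4180 CSV, a field containing double quotes (such as style="width:300px") or commas should itself be enclosed in double quotes, with inner quotes doubled. A minimal sketch, assuming the consumer follows RFC 4180; csv_quote is a hypothetical helper name.

```shell
#!/bin/bash
# csv_quote (hypothetical helper): print $1 as one RFC 4180 field, with
# inner double quotes doubled and the whole value wrapped in double quotes.
# The value goes through the environment so awk -v escape processing
# cannot alter any backslashes in it.
csv_quote() {
  CSVQ_VALUE=$1 awk 'BEGIN {
    s = ENVIRON["CSVQ_VALUE"]
    gsub(/"/, "\"\"", s)
    printf "\"%s\"\n", s
  }'
}
```

Inside the answer's awk program the same idea fits in one function, function csvq(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }, applied to the HTML string before printing. The output then differs from the literal desired output only in that quoting.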
This is actually already a one-liner:
paste -d ',' - template.csv <<< "$(echo "model,processor,speed,cache,graphics,memory,hd,optical,dos,warranty"; sed -e 'N;s/ *\n/,/g' -e 's/"//g' input.csv)" | awk -F ',' 'NR == 1 { for (i=11; i<NF; ++i) printf("%s,", $i); print(""); next } { print($11","$12","" <table style=\"width:300px\"><tr><td>Processor</td><td>" $2 $3 ")</td></tr><tr><td>Memory</td><td>" $6 "</td></tr><tr><td>Hard Disk</td><td>" $7 "</td></tr><tr><td>VGA</td><td>" $5 "</td></tr><tr><td>Warranty</td><td>" $10 "</td></tr><tr><td>Other features</td><td>THIS IS NOT FROM THE SPECFILE</td></tr><tr><td>Other features 2</td><td>THIS IS ALSO NOT FROM THE SPECFILE</td></tr></table>," $14 ","); }'
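This solution assumes that you have already combined the input into a single file and cleaned it to match the format given above (two lines per record). It also assumes that the combined input file has the same number of data lines as the template file. You must also make sure the input file has no extra or missing commas, because this awk script uses the comma as the field separator.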
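These assumptions can be checked before running the pipeline. A minimal sketch: check_inputs is a hypothetical name, and it assumes every record spans exactly two lines, the template has one header line, both files end with a newline (wc -l counts newline characters), and each joined record should have the 10 fields that the dummy header names.

```shell
#!/bin/bash
# check_inputs (hypothetical pre-flight check). $1 = input, $2 = template.
check_inputs() {
  local input=$1 template=$2
  local in_lines tpl_lines
  in_lines=$(wc -l < "$input")
  tpl_lines=$(wc -l < "$template")
  # Every record must be exactly two lines.
  if (( in_lines % 2 != 0 )); then
    echo "$input: odd line count, some record is not two lines" >&2
    return 1
  fi
  # One record per template data row (template line 1 is the header).
  if (( in_lines / 2 != tpl_lines - 1 )); then
    echo "record count $((in_lines / 2)) != template rows $((tpl_lines - 1))" >&2
    return 1
  fi
  # Join the record pairs the same way as the answer and count the fields;
  # the function's status is awk's exit status.
  sed -e 'N;s/ *\n/,/g' -e 's/"//g' "$input" |
    awk -F ',' 'NF != 10 { print "record " NR ": " NF " fields" > "/dev/stderr"; bad = 1 }
                END { exit (bad ? 1 : 0) }'
}
```

On the original input from the question, the second record would be flagged: the extra comma in "DDR3, PC3-1600MHz" yields 11 fields, which is exactly the kind of stray comma the answer warns about.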