Merging HTML code into a .csv file

Time: 2014-02-13 08:25:01

Tags: bash csv sed awk

I have a spec file that looks like this:

"PC DELL OptiPlex 3010MT i3 3220/2GB/500GB/DVD-RW/FREE DOS / 5Y NBD
Intel i3 3220 (Dual Core, 3.30GHz, 3MB, w/ HD2500 Graphics), 2GB (1x2GB) DDR3 PC3-1600MHz, 500GB HDD SATA III 7200rpm, DVD+/-RW (16x), FREE DOS, Warranty: 5Yr Basic Warranty NBD on site"

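I need to populate an HTML table from it and then put that table into a .csv file for upload. So far I have managed to "clean" the file with the following script: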

 for f in $(ls *.csv)
 do
 #fix newline from file
 sed -i ':a;{N;s/NBD   \n/NBD,/};ba;s/"//g;' "$f" 

 #fix csv & and remove strings
 sed -i 's/"PC/PC/g;s/Core\,/Core/g;s/3\,/3./g;s/3MB\,//g;s/6MB\,//g;s/6MB//g;s/w   \///g;s/7,200/7200/g;s/site\"/site/g;s/3MB//g;s/3\,/3\./g;s/w\///g;s/3\,/3\./g;s/Cache\,)/Cache/g;s/ Internal Dell Business Audio Speaker\,//g;' "$f"

#don't know how to remove symbols with sed using awk
awk 'NR==FNR {a[$1]=$2;next} {for ( i in a) gsub(i,a[i])}1' template $f >temp.txt
mv temp.txt $f
done

Then I populate the HTML table with this script:

#!/bin/bash

for f in $(ls *.csv)
do
#split csv into 1line .csv files
split --additional-suffix=.csv -d -l 1 "$f" output/data_

#populate html file and create .html files
for file in $(ls output/*.csv)
do

IFS=","
while read f1 f2 f3 f4 f5 f6 f7 f8 f9 f10
do

echo "<table cellspacing=\"0\" cellpadding=\"0\" border=\"0\" width=\"100%\"> " 
echo "<tbody>"  
echo "<tr>  "   
echo "<td class=\"specsTitle\">Box</td> "
echo "<td class=\"specsDescript stripeBottom\">$f2</td> "
echo "</tr>     "   
echo "<tr>  "   
<snip>
done <$file  > output/temp.txt
mv output/temp.txt $file.html
done
done
#remove not important .csv
rm output/*.csv

So at this point I have several .html files in the output folder.

My questions are:
1. How bad is the code above? :-)
2. How can I put the code from the .html files into a .csv file that looks like this:

 col1,col2,col3,HERE SHOULD BE THE HTML CODE FROM FILE1,col5,
 col1,col2,col3,HERE SHOULD BE THE HTML CODE FROM FILE2,col5,  

I was thinking of using a template file and somehow inserting the individual pieces of HTML code into it. Any help? Kind regards

-- EDIT -- Here is the original input:

 "PC DELL OptiPlex 3010MT i3 3220/2GB/500GB/DVD-RW/FREE DOS / 5Y NBD   
  Intel i3 3220 (Dual Core, 3.30GHz, 3MB, w/ HD2500 Graphics), 2GB (1x2GB) DDR3 PC3-1600MHz, 500GB HDD SATA III 7200rpm, DVD+/-RW (16x), FREE DOS, Warranty: 5Yr Basic Warranty NBD on site"
  "PC DELL OptiPlex 3010MT i5 3470/2GB/500GB/DVD-RW/FREE DOS / 5Y NBD   
   Intel i5 3470 (Quad Core, 3.20GHz Turbo,6MB, w/ HD2500 Graphics), 4GB (1x4GB)      DDR3, PC3-1600MHz, 750GB HDD SATA III 7200rpm, DVD+/-RW (16x), FREE DOS, Warranty: 5Yr   Basic Warranty NBD on site"

CSV template:

  price,product code, SPECS,other things,
  300.00,CODE 2112334,    ,OTHER STRINGS,
  500.00,CODE 2222222,    ,OTHER STRINGS,

Desired .csv output:

  price,product code, SPECS,other things,
  300.00,CODE 2112334, <table style="width:300px"><tr><td>Processor</td><td>Intel i3 3220 (Dual Core, 3.30GHz</td></tr><tr><td>Memory</td><td> 2GB (1x2GB) DDR3 PC3-1600MHz</td></tr><tr><td>Hard Disk</td><td>500GB HDD SATA III 7200rpm</td></tr><tr><td>VGA</td><td>HD2500 Graphics</td></tr><tr><td>Warranty</td><td>5Yr Basic Warranty NBD on site</td></tr><tr><td>Other features</td><td>THIS IS NOT FROM THE SPECFILE</td></tr><tr><td>Other features 2</td><td>THIS IS ALSO NOT FROM THE SPECFILE</td></tr></table>,OTHER STRINGS,
  500.00,CODE 2222222, <table style="width:300px"><tr><td>Processor</td><td>Intel i5 3470 (Quad Core 3.20GHz)</td></tr><tr><td>Memory</td><td> 4GB (1x4GB) DDR3 PC3-1600MHz</td></tr><tr><td>Hard Disk</td><td>750GB HDD SATA III 7200rpm</td></tr><tr><td>VGA</td><td>HD2500 Graphics</td></tr><tr><td>Warranty</td><td>5Yr Basic Warranty NBD on site</td></tr><tr><td>Other features</td><td>THIS IS NOT FROM THE SPECFILE</td></tr><tr><td>Other features 2</td><td>THIS IS ALSO NOT FROM THE SPECFILE</td></tr></table>,OTHER STRINGS,

-- EDIT --

1 answer:

Answer 0 (score: 0)

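Here is an approach that uses the paste utility to combine the lines of the input file with the lines of the template file, then processes the result in awk, without creating any temporary files. I use sed for minimal data cleaning, just enough to make it work, but you can of course substitute your full set of cleaning commands.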

#!/bin/bash

# Dummy header for the input file to match the header of the template file.
# Used only to make sure files have same number of lines.
heading="model,processor,speed,cache,graphics,memory,hd,optical,dos,warranty"

# Create input for awk that has the lines of input and template side by side.
paste -d ',' - template.csv <<< "$(echo "$heading"; sed -e 'N;s/ *\n/,/g' -e 's/"//g' input.csv)" | awk -F ',' '

  ## awk portion
  # First line: print just template.csv header (not dummy header).
  NR == 1 { for (i=11; i<NF; ++i) printf("%s,", $i); print(""); next }

  # Print each line, starting with the fields from template.csv,
  # then the HTML populated with values from input.csv,
  # and ending with the last fields from template.csv.
  { print($11","$12","" <table style=\"width:300px\"><tr><td>Processor</td><td>" $2 $3 ")</td></tr><tr><td>Memory</td><td>" $6 "</td></tr><tr><td>Hard Disk</td><td>" $7 "</td></tr><tr><td>VGA</td><td>" $5 "</td></tr><tr><td>Warranty</td><td>" $10 "</td></tr><tr><td>Other features</td><td>THIS IS NOT FROM THE SPECFILE</td></tr><tr><td>Other features 2</td><td>THIS IS ALSO NOT FROM THE SPECFILE</td></tr></table>," $14 ","); }'

This is effectively already a one-liner:

paste -d ',' - template.csv <<< "$(echo "model,processor,speed,cache,graphics,memory,hd,optical,dos,warranty"; sed -e 'N;s/ *\n/,/g' -e 's/"//g' input.csv)" | awk -F ',' 'NR == 1 { for (i=11; i<NF; ++i) printf("%s,", $i); print(""); next } { print($11","$12","" <table style=\"width:300px\"><tr><td>Processor</td><td>" $2 $3 ")</td></tr><tr><td>Memory</td><td>" $6 "</td></tr><tr><td>Hard Disk</td><td>" $7 "</td></tr><tr><td>VGA</td><td>" $5 "</td></tr><tr><td>Warranty</td><td>" $10 "</td></tr><tr><td>Other features</td><td>THIS IS NOT FROM THE SPECFILE</td></tr><tr><td>Other features 2</td><td>THIS IS ALSO NOT FROM THE SPECFILE</td></tr></table>," $14 ","); }'
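To make the two joining steps easier to follow, here is a minimal sketch that runs them on toy data. The file contents, the three-column dummy heading and the variable names (`workdir`, `joined`, `combined`) are made up for illustration and are not part of the original post. The heading and records are piped into paste instead of using a here-string, which should behave the same for this purpose.

```shell
#!/bin/bash
# Toy data (hypothetical, not the real spec file) to show the two joining steps.
workdir=$(mktemp -d)
printf '%s\n' '"model A   ' ' cpu A, ram A"' '"model B' ' cpu B, ram B"' > "$workdir/input.csv"
printf '%s\n' 'price,code' '300.00,CODE 1' '500.00,CODE 2' > "$workdir/template.csv"

# Step 1: sed's N command appends the next line to the pattern space, so each
# pair of physical lines becomes one record. Trailing spaces plus the embedded
# newline are replaced by a comma, then the double quotes are dropped.
joined=$(sed -e 'N;s/ *\n/,/g' -e 's/"//g' "$workdir/input.csv")
printf '%s\n' "$joined"

# Step 2: paste takes its first column from stdin ("-") and its second from
# template.csv, joining corresponding lines with a comma. The dummy heading
# keeps the line counts of both sides equal.
combined=$( { echo "name,details,more"; printf '%s\n' "$joined"; } |
            paste -d ',' - "$workdir/template.csv" )
printf '%s\n' "$combined"

rm -r "$workdir"
```

With this toy input, the second line of `combined` is `model A, cpu A, ram A,300.00,CODE 1`: the three spec fields come first and the template fields follow, which is why the awk script above addresses the template columns as `$11` and beyond once the real ten-field heading is used.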

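This solution assumes you have already combined your input into a single file and cleaned it to match the given format. It also assumes that the aggregated input file has the same number of data rows as the template file. You must also make sure the input file has no extra or missing commas, because this awk script uses the comma as its field separator.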
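These assumptions can be checked before the main script runs. The following is only a sketch: `check_input` is a hypothetical helper name, and the default of 10 expected fields is taken from the dummy heading above. It joins line pairs the same way the main script does, reports any record whose field count differs, and compares the record count with the number of data rows in the template.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of the original answer).
# Usage: check_input INPUT TEMPLATE [EXPECTED_FIELDS]
check_input() {
  local input=$1 template=$2 expected=${3:-10}
  local records template_rows

  # Join line pairs exactly as the main script does, then flag every record
  # whose comma-separated field count differs from the expected one.
  sed -e 'N;s/ *\n/,/g' -e 's/"//g' "$input" |
    awk -F ',' -v want="$expected" '
      NF != want { printf("record %d has %d fields, expected %d\n", NR, NF, want); bad = 1 }
      END { exit bad + 0 }' || return 1

  # The template has one header line, so its data rows are its lines minus 1.
  records=$(( $(sed -e 'N;s/ *\n/,/g' "$input" | wc -l) ))
  template_rows=$(( $(wc -l < "$template") - 1 ))
  if [ "$records" -ne "$template_rows" ]; then
    echo "input has $records records but template has $template_rows data rows"
    return 1
  fi
}
```

Calling `check_input input.csv template.csv` before the paste/awk pipeline would, for the raw sample input in the question, flag the i5 record: its memory description contains an extra comma ("DDR3, PC3-1600MHz"), which gives it 11 fields instead of 10.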