我需要从html表中提取数据并将其保存在csv文件中。有没有简单的方法来获取bash或php中表的属性中的所有信息?
这是代码
<html><head> <link rel=STYLESHEET href="/XPIcons/style.css" type="text/css">
<title>Control 20 November 2014</title>
</head>
<body
>
<table width="100%" cellapdding="0" cellspacing="0">
<tr><td WIDTH="100%" class="username">xxxx<br><font color=#A4A6A0>IDIOMABASE</font> </td>
<td> </td><td bgcolor=#FFFFFF rowspan="3" align="right"><img src="/XPIcons/logo.jpg"></td></tr>
<tr><td width="100%" align="top" class="guio"><img src="/XPIcons/guion_verde.jpg"></td></tr>
<tr><td width="100%" align="top" class="title">CONTROL 20 NOVEMBER 2014
<br><font color=#B7D30C size="1px"><SCRIPT LANGUAGE="JavaScript" SRC="/XPIcons /calendar.js"></SCRIPT></font>
</td></tr>
</table>
<P>
<script language="JavaScript">
function doNothing(){
}
function ShowData(){
var obj = "QUALCTRL.ShowDataTD?p_date="+p_date.value;
location.href=obj;
}
</script>
<script language="JavaScript">
function test(formu)
{
error=formu.p_date.value==""?"ErrorDate\n":"";
if (error != "")
alert(error);
else
formu.submit();
}
</script>
<table>
<td>Fecha</td><td>
<input type="text" id="p_date" name="p_date" value="20/11/2014" onblur="Compruebap_fecha(formu.p_date);">
<A HREF="javascript:doNothing()" onClick="var obj=document.getElementById('p_date'); setDateField(obj);top.newWin=window.open('/XPIcons/calendar.html', 'cal', 'dependent=yes, resizable=yes, width=210, height=230, screenX=200, screenY=300, titlebar=no')">
<IMG SRC="/XPIcons/calendar.gif" BORDER=0></A><font size=1>Ver calendario</font>
(dd/mm/YYYY)
</td></table>
<input type="button" class="btn" onClick="javascript:RecarregaPlana();" value="Cambia de Dia >>">
<P>
<table border="1%">
<tr><td class="fila_blanca">Población</td>
<td class="fila_mesgris">MAX</td>
<td class="fila_mesgris">MIN</td>
<td class="fila_menysgris">MASS MAX</td>
<td class="fila_menysgris">MASS MIN</td>
<td class="fila_mesgris">MERGE MAX</td>
<td class="fila_mesgris">MERGE MIN</td>
<td class="fila_menysgris">MOS MAX</td>
<td class="fila_menysgris">MOS MIN</td>
<td class="fila_blanca">DIF MAX</td>
<td class="fila_blanca">DIF MIN</td>
<td class="fila_blanca">DIF MAX MERGE</td>
<td class="fila_blanca">DIF MIN MERGE</td>
<td class="fila_blanca">DIF MAX MOS</td>
<td class="fila_blanca">DIF MIN MOS</td>
</tr>
<tr>
<td class="fila_blanca">Palermo</td>
<td class="fila_mesgris">20</td>
<td class="fila_mesgris">11</td>
<td class="fila_menysgris">21</td>
<td class="fila_menysgris">10</td>
<td class="fila_mesgris">20</td>
<td class="fila_mesgris">17</td>
<td class="fila_menysgris">20</td>
<td class="fila_menysgris">9</td>
<td class="fila_blanca">-1</td>
<td class="fila_blanca">1</td>
<td class="fila_mesgris">0</td>
<td class="fila_mesgris">-6</td>
<td class="fila_menysgris">0</td>
<td class="fila_menysgris">2</td>
</tr>
<tr>
<td class="fila_blanca">Bergamo</td>
<td class="fila_mesgris"></td>
<td class="fila_mesgris"></td>
<td class="fila_menysgris">16</td>
<td class="fila_menysgris">7</td>
<td class="fila_mesgris">17</td>
<td class="fila_mesgris">7</td>
<td class="fila_menysgris">17</td>
<td class="fila_menysgris">7</td>
<td class="fila_blanca"></td>
<td class="fila_blanca"></td>
<td class="fila_mesgris"></td>
<td class="fila_mesgris"></td>
<td class="fila_menysgris"></td>
<td class="fila_menysgris"></td>
</tr>
<tr>
<td class="fila_blanca">Rome</td>
<td class="fila_mesgris"></td>
<td class="fila_mesgris"></td>
<td class="fila_menysgris">19</td>
<td class="fila_menysgris">16</td>
<td class="fila_mesgris">19</td>
<td class="fila_mesgris">14</td>
<td class="fila_menysgris">19</td>
<td class="fila_menysgris">14</td>
<td class="fila_blanca"></td>
<td class="fila_blanca"></td>
<td class="fila_mesgris"></td>
<td class="fila_mesgris"></td>
<td class="fila_menysgris"></td>
<td class="fila_menysgris"></td>
</tr>
</table>
<SCRIPT>
function openSearch() {
window.open('XPSearch.Search', 'XPSearch', 'scrollbars=yes,resizable=yes,toolbar=yes,location=yes,status=yes,width=550,height=500,screenX=550,screenY=500'); }
function doNothing() {
}
</SCRIPT>
<P>
<table width="100%" cellspacing="0">
<tr><td class="pie" width="100%"><a href="XPMenuPrincipal.menu"><b>Menú principal</b></a> </td><td bgcolor=#FCFCFA><a href="javascript:doNothing()" onClick="javascript:openSearch()"><img border="0" src="/XPIcons/search.jpg" ></a></td> <td><marquee hspace=147></marquee></td></table>
</body></html>
我想得到一个像这样的csv:
Población,MAX,MIN,MASS MAX,MASS MIN,MERGE MAX,MERGE MIN,MOS MAX,MOS MIN,DIF MAX,DIF MIN ,DIF MAX MERGE,DIF MIN MERGE,DIF MAX MOS,DIF MIN MOS
Palermo,20,11,21,10,20,17,20,9,-1,1,0,-6,0,2
Bergamo,,,16,7,17,7,17,7,,,,
Rome,,,19,16,19,14,19,14,,,,
答案 0 :(得分:1)
这可以是一种方式:
awk -F'">|<' -v OFS=","
'NF>3{if (r) {r=r OFS $3} else r=$3}
/tr/ {print r; r=""}' file
您的样本输入:
$ awk -F'">|<' -v OFS="," 'NF>3{if (r) {r=r OFS $3} else r=$3} /tr/ {print r; r=""}' a
td class="fila_blanca
MAX,MIN,MASS MAX,MASS MIN,MERGE MAX,MERGE MIN,MOS MAX,MOS MIN,DIF MAX,DIF MIN,DIF MAX MERGE,DIF MIN MERGE,DIF MAX MOS,DIF MIN MOS
Palermo,20,11,21,10,20,17,20,9,-1,1,0,-6,0,2
Bergamo,,,16,7,17,7,17,7,,,,,,
Rome,,,19,16,19,14,19,14,,,,,,
-F'">|<'
将输入字段分隔符设置为">
或<
。这样,我们可以轻松捕获标签内的值,而无需进一步处理。-v OFS=","
将输出字段分隔符设置为逗号。NF>3{if (r) {r=r OFS $3} else r=$3}
如果记录包含3个以上的字段,请将第3个字段存储在变量r
中。这将继续添加内容,直到找到<tr
... /tr/ {print r; r=""}
当我们打印内容并清空变量以开始处理下一个块时。