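I have a weird problem and I'm at my wit's end. Maybe a fresh pair of eyes can solve it!
I'm using jSoup to parse an HTML file. The problem is that, even when writing to a brand-new file, the set of tables is output to the file 3-4 times. The first time it is output as one straight line in the .csv file, but every other time it is formatted exactly the way I want. Obviously I want that formatting on the first pass, and I want the data written only once!
My code: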
// Parse the file directly; with a null charset jsoup uses the meta tag or falls back to UTF-8
Document doc = Jsoup.parse(file, null);
Elements tables = doc.select("table");
for (Element table : tables) {
    Elements rows = table.select("tr");
    for (Element row : rows) {
        Elements cells = row.getElementsByTag("td");
        StringBuilder values = new StringBuilder();
        for (Element cell : cells) {
            String cellText = cell.text();
            // Drop thousands separators so they do not split a CSV column
            cellText = cellText.replaceAll(",", "");
            cellText = cellText.replaceAll("£", ",£");
            cellText = cellText.replaceAll(",£", "£");
            System.out.println(cellText);
            values.append(cellText + ",");
        }
        System.out.println(values.toString());
        addToFile(values + ",");
    }
}
// add new data to mySNMPResults file
private static void addToFile(String myString) { // add newest entry to .csv file
    try {
        BufferedWriter out = new BufferedWriter(new FileWriter(
                "MyParsedDOMTree.csv", true));
        out.write(myString + "\n");
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
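The row-building and writing steps above could be tightened along the lines of the following sketch. It joins cells with String.join so that no trailing commas are emitted, and it opens the output file once instead of once per row. CsvRowSketch, toCsvRow and writeRows are hypothetical names, not part of the original code, and overwriting the file on each run (rather than appending) is an assumption about the desired behavior.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvRowSketch {
    // Hypothetical helper: removes commas inside each cell (e.g. thousands
    // separators) and joins the cells with a single comma between them.
    static String toCsvRow(List<String> cells) {
        List<String> cleaned = new ArrayList<>();
        for (String cell : cells) {
            cleaned.add(cell.replace(",", "").trim());
        }
        return String.join(",", cleaned);
    }

    // Hypothetical helper: writes all rows through one writer. With no open
    // options given, the file is created or truncated (an assumption here;
    // the original code appends instead).
    static void writeRows(Path target, List<List<String>> rows) {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (List<String> row : rows) {
                out.write(toCsvRow(row));
                out.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<List<String>> rows = new ArrayList<>();
        rows.add(Arrays.asList("Dept", "3,007.00", "688.95"));
        writeRows(Paths.get("MyParsedDOMTree.csv"), rows);
    }
}
```

Deleting commas inside cells mirrors what the original loop does; quoting each cell would be the alternative if the separators need to be preserved.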
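It may also just be that the HTML file is complex, with various tables nested inside one another, but I don't see how that would cause a table of numeric data that appears only once to be output three times...
Edit
HTML snippet: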
<tr bgcolor = "#EEEEEE" height = 20 >
<td width = 15% >
<font face="tahoma" size="1">
Dept '<b>Food Incl Vat</b>'
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£688.95
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£642.60
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£767.95
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£3,007.00
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£1,525.60
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£1,970.40
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£353.00
</td>
<td width = 1%></td><td width
= 14% align = right bgcolor = "#DFDFDF"><font face="tahoma" size="1" color = '#444444'>
<b>£8,955.50</b></td>
</tr>
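Answer 0 (score: 1)
Edit: sorry, there was a mistake in the code. It's fixed now.
I don't really have enough code to make a reliable guess, but I'm not sure why you would get the size of the tables and then go through that table as many times as .size() returns (3-4, I'm guessing). You want to find the root of the tables; under the root will be the individual tables (their class name should be the same), and then you search each table for whatever you want to find. Maybe some code will help :)
HTML: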
<ul class="ListOfTables">
<li class="TABLE">
<span class="item">
<li class="TABLE">
<span class="item">
<li class="TABLE">
<span class="item">
<li class="TABLE">
<span class="item">
Java code:
public void searchForItems(Document doc)
{
    // Select every table entry under the root list
    Elements tables = doc.select("li[class=TABLE]");
    for (Element table : tables)
    {
        String item;
        Elements itemsInTable = table.select("span[class=item]");
        item = itemsInTable.text();
        //Write the item to file. Depending on what is in your table, you might
        //have to write a more complex scan. Looking for things like attributes
    }
}
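If the tables really are nested, as the question suspects, that alone would explain the repeats. In jsoup, doc.select("table") matches outer and inner tables alike, and table.select("tr") matches rows at any depth, so an inner row is visited once per enclosing table, while the outer row's text is all of its nested cells run together on one line. The sketch below reproduces that mechanism using the JDK's built-in XML DOM as a stand-in for jsoup. The swap is an assumption made only to keep the example dependency-free; getElementsByTagName is likewise a descendant search. NestedTableDemo, rowTexts and the innermostOnly flag are hypothetical names, and the sample markup is invented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NestedTableDemo {
    // Collects the text of every row reached by looping over all tables.
    // With innermostOnly set, tables that contain another table are skipped.
    static List<String> rowTexts(String xml, boolean innermostOnly) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            NodeList tables = doc.getElementsByTagName("table");
            for (int i = 0; i < tables.getLength(); i++) {
                Element table = (Element) tables.item(i);
                // Descendant search: any nested table marks this one as a wrapper
                if (innermostOnly && table.getElementsByTagName("table").getLength() > 0) {
                    continue;
                }
                // Descendant search again: rows of nested tables are included
                NodeList rows = table.getElementsByTagName("tr");
                for (int j = 0; j < rows.getLength(); j++) {
                    out.add(rows.item(j).getTextContent().trim());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        // Invented sample: one data table wrapped in one layout table
        String xml = "<table><tr><td>"
                + "<table><tr><td>688.95</td><td>642.60</td></tr></table>"
                + "</td></tr></table>";
        System.out.println(rowTexts(xml, false).size());
        System.out.println(rowTexts(xml, true).size());
    }
}
```

For this sample the unfiltered loop collects three entries for a single data row (the wrapper row, then the data row twice), and the filtered loop collects one. With jsoup itself, a selector such as table:not(:has(table)) should restrict the loop to innermost tables in the same way, though that is untested against the original file.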