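I'm new to jsoup and to parsing in general, so please let me know if you need more information to answer my question!
I have this table that I want to parse with jsoup in Java. I only want to get the following text:
"B S Computer Science, CS (2012-2014)"
from this part of the table: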
<h3>Fahran S Kamili (fsk226)</h3>
<div>
10 Degree Audit Requests Returned.
</div>
<table>
<thead>
<tr>
<!-- *nrfkh - 9/2012: [degaudt-634]* -->
<th colspan="8">Degree Audits Requested</th>
<!-- *end nrfkh - 9/2012: [degaudt-634]* -->
</tr>
<tr>
<th>Rerun</th>
<!-- *nrfkh - 9/2012: [degaudt-634]* -->
<!-- *end nrfkh - 9/2012: [degaudt-634]* -->
<th>Request Created</th>
<!-- *nrfkh - 9/2012: [degaudt-634]* -->
<!-- *end nrfkh - 9/2012: [degaudt-634]* -->
<th>Audit Type</th>
<th>Program</th>
<th>Courses Requested</th>
<th>Request Status</th>
<th>Audit ID</th>
<th>Delete Option</th>
</tr>
</thead>
<tbody><tr>
<td>
<a href="https://utdirect.utexas.edu/apps/degree/audits/requests/student_individual/?form-0-eid=fsk226&form-0-name=Fahran%20S%20Kamili&form-0-begin_ccyy=2012&form-0-degree_plan=ESC%20SS%20CS&form-0-minor=&current=X&future=&planned=&form-TOTAL_FORMS=20&form-INITIAL_FORMS=0&form-MAX_NUM_FORMS=&rerun=" target="_blank">Rerun</a>
</td>
<!-- *nrfkh - 9/2012: [degaudt-634]* -->
<!-- *end nrfkh - 9/2012: [degaudt-634]* -->
<td>
12/20/2013
05:06 PM
</td>
<!-- *nrfkh - 9/2012: [degaudt-634]* -->
<!-- *end nrfkh - 9/2012: [degaudt-634]* -->
<td>
Normal
</td>
<td>
B S Computer Science, CS
(2012-2014)
</td>
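The table actually goes on for quite a while, but its rows are all siblings of one another, so I assume that if I can get this text, I can easily get the rest as well.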
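That sibling assumption holds for this markup: every body row has the same column layout, so a single per-row selector reaches the same cell each time. A minimal sketch, assuming the Program text is always the fourth `<td>` of a row (the class name `ProgramCells`, the method `programs`, and the trimmed sample HTML are illustrative, not from the original page):

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ProgramCells {
    // Collects the text of the fourth cell (index 3, assumed to be "Program")
    // from every <tr> inside the table body.
    static List<String> programs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // Every body row is a sibling with the same layout, so one selector per row suffices.
        // HTML comments are not elements, so they do not shift the td:eq() index.
        for (Element row : doc.select("table > tbody > tr")) {
            Element cell = row.selectFirst("td:eq(3)");
            if (cell != null) {
                result.add(cell.text());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Trimmed copy of the row from the question (href and whitespace shortened).
        String html = "<table><tbody><tr>"
                + "<td><a href=\"#\">Rerun</a></td>"
                + "<!-- comment --><td>12/20/2013\n05:06 PM</td>"
                + "<td>Normal</td>"
                + "<td>B S Computer Science, CS\n(2012-2014)</td>"
                + "</tr></tbody></table>";
        System.out.println(programs(html));
    }
}
```

Running `main` prints `[B S Computer Science, CS (2012-2014)]`; `text()` collapses the line break inside the cell into a single space. `Element.selectFirst` needs jsoup 1.11 or later.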
Answer (score: 0)
If I save that part of the HTML to a file and parse it with jsoup, I would simply print every td element it finds, since that's what you're after:
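import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ParseTable {
    public static void main(String... args) throws IOException {
        File input = new File("C:/users/XYZ/desktop/input.html");
        Document doc = Jsoup.parse(input, "UTF-8", "");
        // td.text() joins each cell's text and collapses line breaks into single spaces
        for (Element td : doc.getElementsByTag("td")) {
            System.out.println(td.text());
        }
    }
}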
Output:
Rerun
12/20/2013 05:06 PM
Normal
B S Computer Science, CS (2012-2014)
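If you only want the Program column rather than every cell, one option is to look up the column position from the header row instead of hard-coding it, so the code keeps working if columns are added or reordered. A minimal sketch, assuming the last `<thead>` row holds the column names and every body row has one `<td>` per header `<th>` (both true for the snippet in the question); `ColumnByHeader` and `column` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ColumnByHeader {
    // Returns the text of every body cell that sits under the header named `header`.
    static List<String> column(Document doc, String header) {
        List<String> values = new ArrayList<>();
        for (Element table : doc.select("table")) {
            // Assumption: the last <thead> row holds the column names
            // (the first row here is the colspan="8" title).
            Element headRow = table.select("thead > tr").last();
            if (headRow == null) {
                continue;
            }
            Elements ths = headRow.select("th");
            int index = -1;
            for (int i = 0; i < ths.size(); i++) {
                if (ths.get(i).text().equals(header)) {
                    index = i;
                    break;
                }
            }
            if (index < 0) {
                continue; // this table has no such column
            }
            for (Element row : table.select("tbody > tr")) {
                Elements tds = row.select("td");
                if (index < tds.size()) {
                    values.add(tds.get(index).text());
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Trimmed copy of the question's markup; the comment shows that comments are ignored.
        String html = "<table><thead>"
                + "<tr><th colspan=\"8\">Degree Audits Requested</th></tr>"
                + "<tr><th>Rerun</th><!-- c --><th>Request Created</th>"
                + "<th>Audit Type</th><th>Program</th></tr>"
                + "</thead><tbody><tr>"
                + "<td><a href=\"#\">Rerun</a></td><td>12/20/2013 05:06 PM</td>"
                + "<td>Normal</td><td>B S Computer Science, CS\n(2012-2014)</td>"
                + "</tr></tbody></table>";
        System.out.println(column(Jsoup.parse(html), "Program"));
    }
}
```

For the sample in `main` this prints `[B S Computer Science, CS (2012-2014)]`; asking for `"Audit Type"` instead would give `[Normal]`. With the saved file from the answer, you would pass the `Document` returned by `Jsoup.parse(input, "UTF-8", "")`.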