I am new to XML parsing in Groovy. I am trying to parse the XML file below:
<font face=Tahoma size=2>
Team,<br/><br/> Please find below the test summary details for the 'Test' execution.<br/><br/><b><U>Transaction Summary Table:</U></b><br/><br/>
<table border=1 CELLPADDING =3 style='font-family:Tahoma;font-size:12'>
<tr>
<b>
<th bgcolor=#C0C0C0> TransactionName </th>
<th bgcolor=#C0C0C0> AverageLatency </th>
<th bgcolor=#C0C0C0> MinimumLatency </th>
<th bgcolor=#C0C0C0> MaximumLatency </th>
<th bgcolor=#C0C0C0> AverageElapsedTime </th>
<th bgcolor=#C0C0C0> MinimumElapsedTime </th>
<th bgcolor=#C0C0C0> MaximumElapsedTime </th>
<th bgcolor=#C0C0C0> TotalCount </th>
<th bgcolor=#C0C0C0> PassPercentage </th>
</b>
</tr>
<tr>
<td>1 /aumentum/</td>
<td>
<center>1648.0</center>
</td>
<td>
<center>1240</center>
</td>
<td>
<center>2900</center>
</td>
<td>
<center>1907.0</center>
</td>
<td>
<center>1495</center>
</td>
<td>
<center>3140</center>
</td>
<td>
<center>45</center>
</td>
<td>
<center>100.0</center>
</td>
</tr>
<tr>
<td>T01_Aumentum_Home</td>
<td>
<center>6.0</center>
</td>
<td>
<center>1</center>
</td>
<td>
<center>10</center>
</td>
<td>
<center>1956.0</center>
</td>
<td>
<center>1490</center>
</td>
<td>
<center>3806</center>
</td>
<td>
<center>213</center>
</td>
<td>
<center>0.0</center>
</td>
</tr>
</tbody>
</table>
<br/><br/>Thanks,<br/>Performance Team.
</font>
<br/><br/>
Expected result:
[{
    "transaction name":"1 /aumentum/",
    "AverageLatency ":"1648.0",
    "Minimum latency":"1240",
    "MaximumLatency ":"2900",
    "AverageElapsedTime":"1907.0",
    "MinimumElapsedTime":"1495",
    "MaximumElapsedTime":"3140",
    "TotalCount":"45",
    "PassPercentage":"100.0"
},
{
    "transaction name": "1 /aumentum/",
    "AverageLatency ":"1648.0",
    "Minimum latency":"1240",
    "MaximumLatency ":"2900",
    "AverageElapsedTime":"1907.0",
    "MinimumElapsedTime":"1495",
    "MaximumElapsedTime":"3140",
    "TotalCount":"45",
    "PassPercentage":"100.0"
}]
I get the first child using docParser.getElementsByTag("tr").first()
This is the error I get:
Exception thrown
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at org.jsoup.select.Elements.get(Elements.java:519)
at org.jsoup.nodes.Element.child(Element.java:174)
at org.jsoup.nodes.Element$child$0.call(Unknown Source)
at CommonUtils.parseLRHTMLReport(jmeteragent.groovy:304)
at CommonUtils$parseLRHTMLReport.call(Unknown Source)
This is what I have done so far:
def transactiondetails12 = null
def iterator12 = 0
int count1 = 0
def violcounts = 0
def violations = null;
tmpElement = docParser.getElementsByTag("tr").first()
println tmpElement.children()
// tmpElement= tmpElement.child(0)
// println "#########tmpElement#########:" +tmpElement
for (element in tmpElement.children()) {
    if (iterator12 == 0) {
        // transactiondetails1 = "<table border=1 CELLPADDING =3 style='font-family:Tahoma;font-size:12'><tr><b><th bgcolor=#C0C0C0>" +
        element.child(0).text().trim() + "</th><th bgcolor=#C0C0C0>" + element.child(2).text().trim() + "</th><th bgcolor=#C0C0C0>" +
        element.child(3).text().trim() + "</th><th bgcolor=#C0C0C0>" + element.child(4).text().trim() + "</th></b></tr>"
        iterator12 = 1;
        count1++;
        // println "nqwlieufrh 2938ry `9p23dhWCDNJ p3fu89 Q2390RUD"+transactiondetails1
    } else {
        count1++;
        if (count1 <= 5) {
            // println "iterator1iterator1iterator1iterator1"+iterator1++
            transactiondetails12 = transactiondetails12 + "<tr><td>" + element.child(0).text().trim() + "</td><td><center>" +
                element.child(2).text().trim() + "</center></td><td><center>" +
                element.child(3).text().trim() + "</center></td><td><center>" +
                element.child(4).text().trim()
            println "transactiondetails12" + transactiondetails12
            // println "3215463654156436212315465123011482145634217225445622341"+element.child(4).text().trim()
            String violation1 = element.child(1).text()
            // violation=Integer.valueOf(violation1)
            // violation=Integer.parseInt(violation1)
            // if(violation1>=0)
            if (violation1.length() > 0) {
                violcounts++
            }
        }
    }
}
I don't know how to map the values of tmpElement.children(). Any suggestions would be helpful. Thanks in advance.
Answer 0 (score: 1)
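The sample you provided uses the jsoup library, which is well suited to HTML DOM manipulation. The solution to your problem is to use the right selectors to extract the data.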
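A note on the exception first. The stack trace ends in Element.child, and "Index: 0, Size: 0" is what jsoup reports when child(0) is called on an element that has no child elements. In the question's loop, tmpElement is the first tr (the header row), so each element is most likely a th that contains only text. The sketch below reproduces that situation under stated assumptions: the jsoup version in @Grab and the shortened HTML are mine, chosen only for illustration, and the variable names are hypothetical.

```groovy
// Assumption: jsoup 1.15.3 is picked only for illustration.
@Grab('org.jsoup:jsoup:1.15.3')
import org.jsoup.Jsoup

// Hypothetical, shortened stand-in for the report table.
def html = '''
<table>
  <tr><th> TransactionName </th><th> AverageLatency </th></tr>
  <tr><td>T01_Aumentum_Home</td><td><center>6.0</center></td></tr>
</table>
'''
def doc = Jsoup.parse(html)

// first() returns the header row, so its children are the <th> cells.
def headerRow = doc.getElementsByTag('tr').first()
def firstCell = headerRow.child(0)

// A <th> holding only text has no child elements, so child(0) on it throws.
def threw = false
try {
    firstCell.child(0)
} catch (IndexOutOfBoundsException ignored) {
    threw = true
}
println "child(0) on a text-only cell threw: ${threw}"

// text() reads the cell content whether or not the cell has child elements.
def headerTexts = headerRow.children().collect { it.text() }
println headerTexts
```

Reading cells with text() rather than child(n) avoids depending on whether a cell happens to wrap its value in another element such as center.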
Consider the following example:
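import groovy.json.JsonOutput
import org.jsoup.nodes.Element

// Header texts, in column order
def headers = docParser.select("tr > th").collect { it.text() }
def result = []
// Only rows that contain <td> cells, which skips the header row
docParser.select("tr:has(td)").each { tr ->
    def obj = [:]
    tr.select("td").eachWithIndex { Element td, int i ->
        // Key each cell by the header in the same column
        obj[headers[i]] = td.text()
    }
    result << obj
}
println JsonOutput.prettyPrint(JsonOutput.toJson(result))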
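docParser.select("tr > th").collect { it.text() }
collects the table headers and stores them as an ordered List<String>.
docParser.select("tr:has(td)")
selects all rows that contain data (excluding the table header row).
tr.select("td").eachWithIndex
iterates over the cells in each row and stores each value under the header at index i.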
Output:
[
    {
        "TransactionName": "1 /aumentum/",
        "AverageLatency": "1648.0",
        "MinimumLatency": "1240",
        "MaximumLatency": "2900",
        "AverageElapsedTime": "1907.0",
        "MinimumElapsedTime": "1495",
        "MaximumElapsedTime": "3140",
        "TotalCount": "45",
        "PassPercentage": "100.0"
    },
    {
        "TransactionName": "T01_Aumentum_Home",
        "AverageLatency": "6.0",
        "MinimumLatency": "1",
        "MaximumLatency": "10",
        "AverageElapsedTime": "1956.0",
        "MinimumElapsedTime": "1490",
        "MaximumElapsedTime": "3806",
        "TotalCount": "213",
        "PassPercentage": "0.0"
    }
]
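To try the same idea in isolation, here is a minimal self-contained sketch. It is not the script from the gist: the jsoup version in @Grab, the three-column sample table, and the names html, doc and rows are assumptions made for illustration. It also pairs headers with cells through transpose() and collectEntries() instead of eachWithIndex, which is a stylistic variation rather than a different technique.

```groovy
// Assumption: jsoup 1.15.3 is picked only for illustration.
@Grab('org.jsoup:jsoup:1.15.3')
import org.jsoup.Jsoup
import groovy.json.JsonOutput

// Hypothetical, shortened version of the report table (three columns only).
def html = '''
<table border=1>
  <tr>
    <th> TransactionName </th>
    <th> AverageLatency </th>
    <th> TotalCount </th>
  </tr>
  <tr><td>1 /aumentum/</td><td><center>1648.0</center></td><td><center>45</center></td></tr>
  <tr><td>T01_Aumentum_Home</td><td><center>6.0</center></td><td><center>213</center></td></tr>
</table>
'''
def doc = Jsoup.parse(html)

// Header texts in column order; text() already trims surrounding whitespace.
def headers = doc.select('tr > th').collect { it.text() }

// One map per data row; 'tr:has(td)' skips the header row.
def rows = doc.select('tr:has(td)').collect { tr ->
    def cells = tr.select('td').collect { it.text() }
    // transpose() pairs each header with the cell at the same position,
    // and collectEntries() turns those pairs into an ordered map.
    [headers, cells].transpose().collectEntries()
}

println JsonOutput.prettyPrint(JsonOutput.toJson(rows))
```

Note that transpose() stops at the shorter of the two lists, so a row with fewer cells than headers simply yields fewer keys instead of failing.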
Here you can find the full Groovy script I used to experiment with your example: https://gist.github.com/wololock/651a536dff4e104ebba0eef69d4ac3ea
I hope it helps.