Extracting a table nested inside several divs using Jsoup

Date: 2014-06-18 19:52:49

Tags: java html jsoup

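I'm trying to use jsoup to access a table that is nested inside several divs of an HTML page. The table sits inside an outer div with the ID "content-top". The divs enclosing the table, from the outside in, are: content-top -> center -> right-right-col -> result.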

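The table is inside the div result. This is the table I want to access: I need to iterate over its rows and print out the data they contain. Below is the Java code I have been trying to use, without success: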

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ResultTableScraper {

    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("http://www.calculator.com/#")
                .data("express", "sin(x)")
                .data("calculate", "submit")
                .post();

        // give the application time to calculate result before retrieving result from results table
        // (note: doc has already been downloaded at this point, so sleeping cannot change its contents)
        try {
            Thread.sleep(10000);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }

        Elements content = doc.select("div#result");
        // throws IndexOutOfBoundsException if the fetched HTML contains no div#result
        Element tables = content.get(0);
        Elements tableRows = tables.select("tr");

        for (Element tr : tableRows) {
            Elements tableData = tr.select("td");

            int tdCount = 0;
            String fXValue = null;
            String result = null;

            // process new line: column 1 holds f(x), column 2 holds the result
            for (Element td : tableData) {
                switch (tdCount++) {
                    case 1:
                        fXValue = td.select("a").text();
                        break;
                    case 2:
                        result = td.select("a").text();
                        break;
                }
            }
            System.out.println(fXValue + "   " + result);
        }
    }
}

The code above crashes and does almost nothing of what I want. Can anyone please help me?
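The nesting described in the question does not have to be walked div by div: a CSS descendant selector matches div#result anywhere under div#content-top, whatever wrappers sit in between. The sketch below runs against a hypothetical static HTML string (the intermediate class names and cell values are made up for illustration) and returns an empty list when div#result is missing, rather than failing the way content.get(0) does on an empty selection. It only helps if the table is actually present in the HTML jsoup downloads; the first answer explains why it is not on this site.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

class NestedTableSelect {

    // Collects the text of every <td> in the table under div#result, row by row.
    // Returns an empty list (instead of throwing) when div#result is absent.
    static List<List<String>> readResultTable(String html) {
        Document doc = Jsoup.parse(html);
        // Descendant selector: the intermediate wrapper divs do not need to be named
        Elements rows = doc.select("div#content-top div#result table tr");
        List<List<String>> table = new ArrayList<>();
        for (Element tr : rows) {
            List<String> cells = new ArrayList<>();
            for (Element td : tr.select("td")) {
                cells.add(td.text());
            }
            table.add(cells);
        }
        return table;
    }

    public static void main(String[] args) {
        // Hypothetical markup mirroring the nesting described in the question
        String html = "<div id='content-top'><div class='center'><div class='right-right-col'>"
                + "<div id='result'><table>"
                + "<tr><td>x</td><td>sin(x)</td></tr>"
                + "<tr><td>0</td><td>0</td></tr>"
                + "</table></div></div></div></div>";
        System.out.println(readResultTable(html));
    }
}
```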

2 answers:

Answer 0 (score: 0)

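The page does not give you the table directly inside a div with id "result". The result is loaded with an AJAX call to a PHP file, and Jsoup does not execute JavaScript, so it never appears in the document you download (which is also why the Thread.sleep does not help). What you need to do instead is first build a JSON like this: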
{"expression":"sin(x)","intVar":"x","upperBound":"","lowerBound":"","simplifyExpressions":false,"latex":"\\displaystyle\\int\\limits^{}_{}{\\sin\\left(x\\right)\\, \\mathrm{d}x}"}

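The expression key contains the expression you want to evaluate, and latex contains the corresponding LaTeX expression that MathJax renders. Then POST it to int.php. That takes two parameters: q, which is the JSON above, and v, which seems to be a constant value, 1380119311. I don't know what it means.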
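Instead of hand-writing the escaped JSON literal as in the sample program further down, the q value can be assembled from its parts. The sketch below (stdlib only, Java 10+ for the `URLEncoder.encode(String, Charset)` overload) builds the same JSON as above and shows the application/x-www-form-urlencoded body that a POST with fields q and v carries, which is essentially what Jsoup builds from `.data(...)` calls. `IntegralRequest`, `buildQuery` and `formBody` are hypothetical names, and the JSON escaping only covers backslashes and double quotes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class IntegralRequest {

    // Minimal JSON string escaping: backslashes first, then double quotes
    // (enough for the LaTeX strings used here; control characters are not handled).
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the q payload with the same keys, in the same order, as the JSON above.
    static String buildQuery(String expression, String intVar, String latex) {
        return "{\"expression\":" + jsonString(expression)
                + ",\"intVar\":" + jsonString(intVar)
                + ",\"upperBound\":\"\",\"lowerBound\":\"\",\"simplifyExpressions\":false"
                + ",\"latex\":" + jsonString(latex) + "}";
    }

    // Encodes the fields as key=value pairs joined by '&', each part URL-encoded.
    static String formBody(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        String latex = "\\displaystyle\\int\\limits^{}_{}{\\sin\\left(x\\right)\\, \\mathrm{d}x}";
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("q", buildQuery("sin(x)", "x", latex));
        fields.put("v", "1380119311"); // value observed in the answer; its meaning is unknown
        System.out.println(fields.get("q"));
        System.out.println(formBody(fields));
    }
}
```

Passing the result of `buildQuery` to `.data("q", ...)` avoids the quadruple-backslash literals entirely.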

This returns a response like:
<html>
 <head></head>
 <body>
  <table class="round"> 
   <tbody>
    <tr class="">
     <th>$f(x) =$</th>
     <td>$\sin\left(x\right)$</td>
    </tr> 
    <tr class="sep odd">
     <th>$\displaystyle\int{f(x)}\, \mathrm{d}x =$</th>
     <td>$-\cos\left(x\right)$</td>
    </tr> 
   </tbody>
  </table> 
  <!-- Finished in 155 ms --> 
  <p id="share"> <img src="layout/32x32xshare.png.pagespeed.ic.i3iroHP5fI.png" width="32" height="32" /> <a id="share-link" href="http://www.integral-calculator.com/#expr=sin%28x%29" onclick="window.prompt(&quot;To copy this link to the clipboard, press Ctrl+C, Enter.&quot;, $(&quot;share-link&quot;).href); return false;">Direct link to this calculation (for sharing)</a> </p>
 </body>
</html>
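The sample program below only prints the returned document. To pull the values out of the table, a jsoup sketch like the following could iterate the `table.round` rows and map each `<th>` label to its `<td>` value. It is run here against a trimmed copy of the response above; with a live request the same `select` call would run on the Document returned by `post()`. `IntegralResultParser` and `parseRows` are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

class IntegralResultParser {

    // Maps each row's <th> text (the label) to its <td> text (the TeX result).
    // Rows missing either cell are skipped.
    static Map<String, String> parseRows(String responseHtml) {
        Document doc = Jsoup.parse(responseHtml);
        Map<String, String> rows = new LinkedHashMap<>();
        for (Element tr : doc.select("table.round tr")) {
            Element th = tr.select("th").first(); // null if the row has no <th>
            Element td = tr.select("td").first(); // null if the row has no <td>
            if (th != null && td != null) {
                rows.put(th.text(), td.text());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Trimmed copy of the response shown above
        String html = "<table class=\"round\"><tbody>"
                + "<tr><th>$f(x) =$</th><td>$\\sin\\left(x\\right)$</td></tr>"
                + "<tr class=\"sep odd\"><th>$\\displaystyle\\int{f(x)}\\, \\mathrm{d}x =$</th>"
                + "<td>$-\\cos\\left(x\\right)$</td></tr>"
                + "</tbody></table>";
        for (Map.Entry<String, String> row : parseRows(html).entrySet()) {
            System.out.println(row.getKey() + "  ->  " + row.getValue());
        }
    }
}
```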

The table in this response contains the result, which the site then renders with MathJax.
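The cells hold inline TeX between `$` delimiters (and the labels end in ` =`); that is the source MathJax renders. If you only want the raw TeX string, a small stdlib helper can strip those delimiters. `TexCell` and `stripInlineMath` are hypothetical names, and the helper assumes at most one surrounding `$...$` pair per cell, as in the sample response.

```java
class TexCell {

    // Removes one pair of surrounding $...$ inline-math delimiters and a trailing "="
    // so that what remains is plain TeX (e.g. for logging or re-rendering elsewhere).
    static String stripInlineMath(String cell) {
        String s = cell.trim();
        if (s.length() >= 2 && s.startsWith("$") && s.endsWith("$")) {
            s = s.substring(1, s.length() - 1).trim();
        }
        if (s.endsWith("=")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(stripInlineMath("$-\\cos\\left(x\\right)$"));
        System.out.println(stripInlineMath("$f(x) =$"));
    }
}
```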

Here is a sample program:

import java.io.IOException;

import org.apache.commons.lang3.StringEscapeUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;


public class JsoupParser6 {

    public static void main(String[] args) {
        try {
            // Integral
            String url = "http://www.integral-calculator.com/int.php";
            String q = "{\"expression\":\"sin(4x) * e^(-x)\",\"intVar\":\"x\",\"upperBound\":\"\",\"lowerBound\":\"\",\"simplifyExpressions\":false,\"latex\":\"\\\\displaystyle\\\\int\\\\limits^{}_{}{\\\\sin\\\\left(4x\\\\right){\\\\cdot}{\\\\mathrm{e}}^{-x}\\\\, \\\\mathrm{d}x}\"}";
            Document integralDoc = Jsoup.connect(url).data("q", q).data("v", "1380119311").post();
            System.out.println(integralDoc);
            System.out.println("\n*******************************\n");

            //Differential
            url = "http://www.derivative-calculator.net/diff.php";
            q = "{\"expression\":\"sin(x)\",\"diffVar\":\"x\",\"diffOrder\":1,\"simplifyExpressions\":false,\"showSteps\":false,\"latex\":\"\\\\dfrac{\\\\mathrm{d}}{\\\\mathrm{d}x}\\\\left(\\\\sin\\\\left(x\\\\right)\\\\right)\"}";
            Document differentialDoc = Jsoup.connect(url).data("q", q).data("v", "1380119305").post();
            System.out.println(differentialDoc);
            System.out.println("\n*******************************\n");

            //Calculus: the response appears to be Java-style escaped, so unescape it before re-parsing
            url = "http://calculus-calculator.com/calculation/integrate.php";
            Document calculusDoc = Jsoup.connect(url).data("expression", "sin(x)").data("intvar", "x").post();
            String outStr = StringEscapeUtils.unescapeJava(calculusDoc.toString());
            Document formattedOutPut = Jsoup.parse(outStr);
            // keep only the div.isteps elements in the body
            formattedOutPut.body().html(formattedOutPut.select("div.isteps").toString());
            System.out.println(formattedOutPut);
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

}
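`StringEscapeUtils.unescapeJava` comes from Apache Commons Lang 3. To see what that step does (or to avoid the dependency for this one call), here is a simplified hand-rolled stand-in that only handles a few common two-character escapes. It is not a full replacement for `unescapeJava`: unicode and octal escapes are copied through untouched. `SimpleUnescape` is a hypothetical name and the sample input is made up.

```java
class SimpleUnescape {

    // Simplified stand-in for StringEscapeUtils.unescapeJava. It only handles the
    // two-character sequences listed in the switch; any other escape (including
    // unicode and octal escapes) is copied through unchanged.
    static String unescape(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c != '\\' || i + 1 == in.length()) {
                out.append(c);               // ordinary char, or a lone trailing backslash
                continue;
            }
            char next = in.charAt(++i);      // consume the escaped character
            switch (next) {
                case 'n':  out.append('\n'); break;
                case 't':  out.append('\t'); break;
                case 'r':  out.append('\r'); break;
                case '\\': case '"': case '\'': case '/': out.append(next); break;
                default:   out.append('\\').append(next); // unknown escape: keep as-is
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical escaped fragment as it might appear in a response body
        String raw = "<div class=\\\"isteps\\\">\\\\sin\\\\left(x\\\\right)<\\/div>";
        System.out.println(unescape(raw));
    }
}
```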

Update based on the comments:

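The unescape works perfectly. In MathJax you can right-click a rendered formula and view the TeX commands behind it. So go to your site http://calculus-calculator.com/, try the sin(x) equation, then right-click the result and view its TeX commands.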

You can see that the command is exactly what we get after unescaping. The demo site just doesn't render it; that is probably a limitation of the demo site, nothing more.

Answer 1 (score: 0)

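public static String do_conversion(String str)
{
    // Build the output character by character, replacing operators and
    // parentheses with the tokens below and copying everything else
    StringBuilder output = new StringBuilder("{");

    for (int i = 0; i < str.length(); i++)
    {
        char c = str.charAt(i);

        switch (c)
        {
            case 'e': output.append("{mathrm{e}}"); break;
            case '(': output.append('{'); break;
            case ')': output.append('}'); break;
            case '+': output.append("{cplus}"); break;
            case '-': output.append("{cminus}"); break;
            case '*': output.append("{cdot}"); break;
            case '/': output.append("{cdivide}"); break;
            default:  output.append(c); // else copy the character normally
        }
    }

    output.append(", mathrm{d}x}");
    return output.toString();
}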

@Syam S