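JSoup: how to select and get specific information

Date: 2014-02-18 02:04:27

Tags: android html regex jsoup

I am trying to select specific pieces of HTML and store them in variables. As a first step, I am just trying to display as much of that information as I can. The problem is that the page I want to extract from has no clear class identifiers, or at least I struggle to see how to extract the information.

This is my Jsoup code: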

    public class MainActivity extends Activity {

    private TextView tvmaximo;
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        tvmaximo=(TextView)findViewById(R.id.tvmaximo);

        new BackGroundTask().execute();

    }



    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        // Inflate the menu; this adds items to the action bar if it is present.
        getMenuInflater().inflate(R.menu.main, menu);
        return true;
    }



    class BackGroundTask extends AsyncTask<Void, Void, String> {

        @Override
        protected void onPreExecute() {
            super.onPreExecute();
        }

        @Override
        protected String doInBackground(Void... params) {

            try {
               URL url= new URL("http://www.myweb.com");
               /*Document doc = Jsoup.connect(url.toString()).get();*/
               Document doc = Jsoup.connect(url.toString()).get();
               /*Elements elements = doc.select(".lyrics").first();*/

               //get page title
               /*String title = doc.title();*/

               Elements elements = doc.select("td.cabeceraRutaTexto");


               String maximo=elements.html(); 

                return maximo; 


            } catch (IOException e) {
                e.printStackTrace();
            }

            return null;
        }

        @Override
        protected void onPostExecute(String result) {

            tvmaximo.setText(result);
            System.out.println(result);
            super.onPostExecute(result);
        }




    }


}

Here is the HTML I want to store: "MÍNIMO", "MÁXIMO", "VALOR MEDIO", plus the values 89,99, 47,341 and 17,3. Each value should go into its own variable, 6 variables in total:

<tr>
   <td align="center">
      <table>
      <tr><td align="center" class="cabeceraRutaTexto" colspan="2">MÁXIMO </td>
      <td align="center" class="cabeceraRutaTexto" colspan="2">VALOR MEDIO </td>
      <td class="cabeceraRutaTexto" align="center">MÍNIMO </td>
 </tr>    
 <tr><td align="center" class="cabeceraRutaTexto">89,99 <img        
      SRC="../Diseno/imagenes/euro.gif" WIDTH="7" HEIGHT="8"> /MWh</td>
     <td>&nbsp;&nbsp;</td>
     <td align="center" class="cabeceraRutaTexto">47,341 <img 
           SRC="../Diseno/imagenes/euro.gif" WIDTH="7" HEIGHT="8"> /MWh</td>
     <td>&nbsp;&nbsp;</td>
     <td align="center" class="cabeceraRutaTexto">17,3 <img           
          SRC="../Diseno/imagenes/euro.gif" WIDTH="7" HEIGHT="8"> /MWh</td>
 </tr>
    </table>   
   </td>
 </tr>
</table> 

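As you can see, it is hard to write a regular expression for this because there are no distinctive references to build it on. How can I do this with Jsoup? Thanks in advance for your help and time.

Now, thanks to all your explanations, I have solved the problem: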

public class MainActivity extends Activity {

private TextView tvmaximo;
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        tvmaximo=(TextView)findViewById(R.id.tvmaximo);

        new BackGroundTask().execute();

    }



    @Override
    public boolean onCreateOptionsMenu(Menu menu) {
        // Inflate the menu; this adds items to the action bar if it is present.
        getMenuInflater().inflate(R.menu.main, menu);
        return true;
    }



    class BackGroundTask extends AsyncTask<Void, Void, String> {

        @Override
        protected void onPreExecute() {
            super.onPreExecute();
        }

        @Override
        public String doInBackground(Void... params) {


            try {

               URL url= new URL("http://www.myweb.com");
               Document doc = Jsoup.connect(url.toString()).get();

                       /* get elements table 1 (graphic table) */
                       Elements elementsgraphic = doc.select("div.divns6");
                       elementsgraphic.size(); //2

                       /* get elements table 2 (normal table) */
               Elements elements = doc.select("td.cabeceraRutaTexto");
               elements.size(); // 6

                  String barra1= elementsgraphic.get(0).text();       

               /* text values from table 2 */
                  String titulotxt = elements.get(0).text(); // TÍTULO
              String maximotxt = elements.get(1).text(); // TEXTO VALOR MAXIMO
              String mediotxt = elements.get(2).text(); // TEXTO VALOR MEDIO
              String minimotxt = elements.get(3).text(); // TEXTO VALOR MINIMO

           /* numeric values from table 2 */

                  String maximo = elements.get(4).text(); // NUMERICO VALOR MAXIMO
                  String medio = elements.get(5).text(); // NUMERICO VALOR MEDIO
                  String minimo = elements.get(6).text(); // NUMERICO VALOR MINIMO

                   return maximo;





            } catch (IOException e) {
                e.printStackTrace();
            }

            return null;
        }

        @Override
        protected void onPostExecute(String result) {

            tvmaximo.setText(result);

            /*System.out.println(result);*/
            super.onPostExecute(result);
        }




    }


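}

Now I have two more questions:

  1. How can I extract the "18" and "8" values passed to "activadiv" in the code below? I have tried:

    Elements elementsgraphic = doc.select("div.divns6");
    Elements elementsgraphic = doc.select("div#divns6");
    Elements elementsgraphic = doc.select("changeImage('barra1");

     and so on, but the app crashes on every attempt. I know the error is in the regular expression.

  2. On the other hand, I have been trying to use an array to return all of these variables to "onPostExecute", but the program reports an error because AsyncTask will not let me return an array. Thanks again for your patience.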

     <table>
    
       <tr><td><div name='divns6' id='divns6' style='position:relative;visibility:hidden;'       
            width='400' height='160'><table valign=botton cellpadding='0' cellspacing='0' 
            border='0'><tr     valign='bottom'>
        <td width=15 valign="bottom" height=150><a href="javascript:void(null)" 
                 onMouseOver="changeImage('barra1','','47',2);activadiv('barra0','18');" 
                 onMouseOut="changeImage('barra1','','47',0);desactivadiv('barra1');"><img 
                 NAME="barra1" width="11px" height="47" border="0"></a></td>
    
        <td width=15 valign="bottom" height=150><a href="javascript:void(null)" 
                 onMouseOver="changeImage('barra2','','21',2);activadiv('barra1','8');" 
                 onMouseOut="changeImage('barra2','','21',0);desactivadiv('barra2');"><img 
                 NAME="barra2" width="11px" height="21" border="0"></a></td>
    
          </td></tr></table>
    

2 Answers:

Answer 0 (score: 1)

Select them and extract them by position, for example:

Elements elements = doc.select("td.cabeceraRutaTexto");
elements.size(); // 6
elements.get(0).text(); // MÁXIMO
elements.get(1).text(); // VALOR MEDIO
...
// or just the one you want
doc.select("td:eq(0).cabeceraRutaTexto").get(0).text() // MÁXIMO
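
For the three value cells, `text()` should return the number together with its unit, something like `89,99 /MWh`, since the `<img>` contributes no text. If the numbers are needed as numeric values rather than strings, a minimal sketch along these lines could parse them. `PriceParser` and `parsePrice` are hypothetical names, and treating the comma as the decimal separator is an assumption based on the sample HTML.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical helper, not part of jsoup or of the original answer.
final class PriceParser {

    /**
     * Parses the leading number of a cell text such as "89,99 /MWh".
     * Assumes a comma decimal separator and a dot grouping separator.
     */
    static double parsePrice(String cellText) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setDecimalSeparator(',');
        symbols.setGroupingSeparator('.');
        DecimalFormat format = new DecimalFormat("#,##0.###", symbols);
        try {
            // parse(String) reads from the start of the string and stops at
            // the first character that is not part of the number.
            return format.parse(cellText.trim()).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a price: " + cellText, e);
        }
    }

    public static void main(String[] args) {
        // Sample strings shaped like the text of the value cells.
        System.out.println(parsePrice("89,99 /MWh"));
        System.out.println(parsePrice("47,341 /MWh"));
        System.out.println(parsePrice("17,3 /MWh"));
    }
}
```

With the selection above, the call would be something like `PriceParser.parsePrice(elements.get(3).text())`, assuming the value cells follow the three label cells as in the posted fragment.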

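Update from the comments: getting the 18 out of the given HTML is a different level of complexity, because it is part of JavaScript code. The following code yields the desired value, but keep in mind that there are better ways to parse and extract parts of JavaScript.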

Document doc = Jsoup.parse(xml);
String onMouseOver = doc.select("a").attr("onMouseOver");
// while this will work, there are more robust ways to parse javascript
onMouseOver.split("'")[9];
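
As one possible alternative to counting quote positions, a regular expression can target the `activadiv(...)` call directly, so the result does not depend on how many arguments `changeImage` has. This is only a sketch under assumptions: the handler is assumed to always have the shape `activadiv('<name>','<value>')` with single-quoted arguments, and `ActivadivExtractor` with its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup or of the original answer.
final class ActivadivExtractor {

    // Matches activadiv('<name>','<value>') and captures <value>.
    // The leading \b keeps "desactivadiv(" from matching.
    private static final Pattern ACTIVADIV = Pattern.compile(
            "\\bactivadiv\\(\\s*'[^']*'\\s*,\\s*'([^']*)'\\s*\\)");

    /** Returns the second argument of the first activadiv call, or null if there is none. */
    static String secondArgument(String handler) {
        Matcher matcher = ACTIVADIV.matcher(handler);
        return matcher.find() ? matcher.group(1) : null;
    }

    /** Applies secondArgument to each handler string and skips the ones without a match. */
    static List<String> secondArguments(List<String> handlers) {
        List<String> values = new ArrayList<>();
        for (String handler : handlers) {
            String value = secondArgument(handler);
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Attribute value copied from the HTML in the question.
        String first = "changeImage('barra1','','47',2);activadiv('barra0','18');";
        System.out.println(secondArgument(first));
    }
}
```

The handler strings themselves would still come from jsoup, for example by selecting `a[onmouseover]` and calling `attr("onmouseover")` on each element.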

Answer 1 (score: 1)

Elements elem = doc.select("td.cabeceraRutaTexto");

for(Element el : elem)
{
Log.e("elements :" , el.text());
}

or 

for(int i = 0;i<elem.size();i++)
{
Element el = elem.get(i);
Log.e("element" + i + ":",el.text()); 
}
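
On the question of returning several values to `onPostExecute`: the third type parameter of `AsyncTask` is the result type, so it does not have to be `String`. Declaring the task as `AsyncTask<Void, Void, String[]>` lets `doInBackground` return an array, and a small result class works the same way. The sketch below shows such a class in plain Java. `PriceSummary` and `fromValueCells` are hypothetical names, and the cell order (MÁXIMO, VALOR MEDIO, MÍNIMO) is assumed from the posted HTML.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical result holder, intended as the Result type of the AsyncTask.
final class PriceSummary {
    final String maximo;
    final String medio;
    final String minimo;

    PriceSummary(String maximo, String medio, String minimo) {
        this.maximo = maximo;
        this.medio = medio;
        this.minimo = minimo;
    }

    /** Builds a summary from the texts of the three value cells, in page order. */
    static PriceSummary fromValueCells(List<String> valueCells) {
        if (valueCells.size() < 3) {
            throw new IllegalArgumentException(
                    "Expected 3 value cells but got " + valueCells.size());
        }
        return new PriceSummary(valueCells.get(0), valueCells.get(1), valueCells.get(2));
    }

    public static void main(String[] args) {
        // Sample strings shaped like the text of the value cells.
        PriceSummary summary = fromValueCells(
                Arrays.asList("89,99 /MWh", "47,341 /MWh", "17,3 /MWh"));
        System.out.println(summary.maximo + " | " + summary.medio + " | " + summary.minimo);
    }
}
```

In the Activity, the task would then be declared as `AsyncTask<Void, Void, PriceSummary>`, `doInBackground` would return the summary, and `onPostExecute(PriceSummary result)` would read its fields, after checking for `null` in case the download failed.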