Paginating a Jsoup web scrape in Java Swing

Date: 2018-02-07 19:18:21

Tags: java swing pagination jsoup

private void EducationWorld_Webscrap_jButtonActionPerformed(java.awt.event.ActionEvent evt)
{
     try
     {
         // Fetch the first page of the listing
         Document doc = Jsoup.connect("http://www.educationworld.in/institution/mumbai/schools").userAgent("Mozilla/17.0").get();
         // Select the elements that hold the school names
         Elements links = doc.select("div.instnm.litblue_bg");
         StringBuilder sb1 = new StringBuilder();
         // Append each school name on its own line
         links.forEach(e -> sb1.append(e.text()).append(System.getProperty("line.separator")));
         jTextArea1.setText(sb1.toString());
     }
     catch (Exception e)
     {
         JOptionPane.showMessageDialog(null, e);
     }
}

This displays the data, but the site is paginated. How do I fetch the data from the next five pages?

1 Answer:

Answer 0 (score: 2)

Luckily I've managed to achieve what you're after, as you can see in the code block below. In case you're unsure what's going on, I've added comments that should hopefully describe each step.

I experimented with the site's pagination settings, but it only seems to hand back results in increments of 5 per request, so there isn't much leeway: you have to supply a starting offset each time to retrieve the next 5 results.

Because of that, I had to wrap it in a for loop that runs 32 times: there are 158 schools, and 158 divided by 5 is 31.6, which rounds up to 32. Of course, if you only want the first 5 pages, you can change the loop to run just 5 times.
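
As a quick sanity check on that arithmetic, here is a minimal sketch; the PageCount class is just for illustration, and the 158 total is the figure quoted above, not re-verified against the site:

public class PageCount
{
    public static void main( String[] args )
    {
        // Ceiling division: 158 schools at 5 per request = 31.6, rounded up to 32.
        int totalSchools = 158;   // count quoted in this answer; may have changed since
        int pageSize = 5;         // the site's fixed increment
        int requests = (totalSchools + pageSize - 1) / pageSize;
        System.out.println( requests );   // prints 32
    }
}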

Anyway, on to the juicy bit:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

import java.io.*;
import java.net.*;

public class Loop
{
    public static void main( String[] args )
    {
        final StringBuilder sb1 = new StringBuilder();
        BufferedReader bufferedReader = null;
        OutputStream outputStream = null;

        try
        {
            // Parameter pagination counts
            int startCount = 0;
            int limitCount = 5;

            // Loop 32 times, 158 schools / 5 (pagination amount)
            for( int i = 0; i < 32; i++ )
            {
                // Open a connection to the supplied URL
                final URLConnection urlConnection = new URL( "http://www.educationworld.in/institution/mumbai/schools" ).openConnection();
                // Tell the URL we are sending output
                urlConnection.setDoOutput( true );
                // The stream we will be writing to the URL
                outputStream = urlConnection.getOutputStream();

                // Setup parameters for pagination
                final String params = "qstart=" + startCount + "&limit=" + limitCount;
                // Get the bytes of the pagination parameters
                final byte[] outputInBytes = params.getBytes( "UTF-8" );
                // Write the bytes to the URL
                outputStream.write( outputInBytes );

                // Get and read the URL response
                bufferedReader = new BufferedReader( new InputStreamReader( urlConnection.getInputStream() ) );
                StringBuilder response = new StringBuilder();
                String inputLine;

                // Loop over the response and read each line appending it to the StringBuilder
                while( (inputLine = bufferedReader.readLine()) != null )
                {
                    response.append( inputLine );
                }

                // Do the same as before just with a String instead
                final Document doc = Jsoup.parse( response.toString() );
                Elements links = doc.select( "div.instnm.litblue_bg" );
                links.forEach( e -> sb1.append( e.text() ).append( System.getProperty( "line.separator" ) ) );

                // Increment the pagination parameters
                startCount += 5;
                limitCount += 5;

                // Close this iteration's streams before opening the next connection
                bufferedReader.close();
                outputStream.close();
            }

            System.out.println( sb1.toString() );
            // In your Swing handler, push the result to the text area here instead:
            // jTextArea1.setText( sb1.toString() );
        }
        catch( Exception e )
        {
            e.printStackTrace();
        }
        finally
        {
            try
            {
                // Close the bufferedReader
                if( bufferedReader != null )
                {
                    bufferedReader.close();
                }

                // Close the outputStream
                if( outputStream != null )
                {
                    outputStream.close();
                }
            }
            catch( IOException e )
            {
                e.printStackTrace();
            }
        }
    }
}
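
For what it's worth, the same pagination could also be attempted with Jsoup's own connection API rather than a raw URLConnection, so the request and the parsing stay in one place. This is only a sketch under assumptions taken from the answer above: that the site accepts the same qstart/limit parameters as form data in a POST, and that a fixed limit of 5 is enough to step through the results. The JsoupPaginationSketch class name is just for illustration.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class JsoupPaginationSketch
{
    public static void main( String[] args ) throws Exception
    {
        final StringBuilder sb = new StringBuilder();
        final int totalSchools = 158;   // figure quoted above; may have changed
        final int pageSize = 5;

        // Step through the listing 5 results at a time
        for( int start = 0; start < totalSchools; start += pageSize )
        {
            // Send the pagination parameters as form data and parse the response
            final Document doc = Jsoup.connect( "http://www.educationworld.in/institution/mumbai/schools" )
                    .userAgent( "Mozilla/5.0" )
                    .data( "qstart", String.valueOf( start ) )
                    .data( "limit", String.valueOf( pageSize ) )
                    .post();

            // Same selector as in the question
            final Elements links = doc.select( "div.instnm.litblue_bg" );
            links.forEach( e -> sb.append( e.text() ).append( System.lineSeparator() ) );
        }

        System.out.println( sb );
    }
}

If this runs from the Swing button handler, it is worth doing the requests off the Event Dispatch Thread (for example in a SwingWorker) and only calling jTextArea1.setText(...) once the loop has finished, so the UI stays responsive while the pages are fetched.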

Hopefully this helps you get the results you're after; if you need anything explaining, just ask!
