private void EducationWorld_Webscrap_jButtonActionPerformed(java.awt.event.ActionEvent evt)
{
try
{
Document doc=Jsoup.connect("http://www.educationworld.in/institution/mumbai/schools").userAgent("Mozilla/17.0").get();
Elements links=doc.select("div.instnm.litblue_bg");
StringBuilder sb1 = new StringBuilder ();
links.stream().forEach(e->sb1.append(e.text()).append(System.getProperty("line.separator")));
jTextArea1.setText(sb1.toString());
}
catch(Exception e)
{
JOptionPane.showMessageDialog(null, e);
}
}
This displays the data, but the site is paginated. How do I get the data for the next five pages?
Answer 0 (score: 2)
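Fortunately, I have managed to achieve what you're after, as you can see in the code block below. I have added comments that will hopefully describe each step, in case you are unsure what is going on.
I tried playing with the website's pagination settings, but they only seem to allow increments of 5 results per request, so there isn't much leeway, and you need to pass the starting point in order to retrieve the next 5 results.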
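Because of this, I had to wrap it in a for loop that runs 32 times. That number comes from 158 schools divided by 5, which equals 31.6, or 32 rounded up. Of course, if you only want the first 5 pages, you can change the loop to run only 5 times.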
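The page arithmetic above can be sketched as follows. This is only a minimal illustration and not part of the scraper itself: the names `PaginationSketch`, `pageCount` and `startOffsets` are hypothetical, and it assumes the site hands back 5 results per request beginning at the `qstart` offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only
class PaginationSketch
{
    // Number of requests needed to cover `total` items at `pageSize` items per request (rounds up)
    static int pageCount( int total, int pageSize )
    {
        return (total + pageSize - 1) / pageSize;
    }

    // Start offsets (the qstart values) for the first `pages` requests
    static List<Integer> startOffsets( int pages, int pageSize )
    {
        final List<Integer> offsets = new ArrayList<>();
        for( int i = 0; i < pages; i++ )
        {
            offsets.add( i * pageSize );
        }
        return offsets;
    }

    public static void main( String[] args )
    {
        System.out.println( pageCount( 158, 5 ) );   // 32
        System.out.println( startOffsets( 5, 5 ) );  // [0, 5, 10, 15, 20]
    }
}
```

Computing the count this way avoids hard-coding 32, which would go stale if the number of listed schools changes.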
Anyway, on to the juicy bit:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import java.io.*;
import java.net.*;
public class Loop
{
public static void main( String[] args )
{
final StringBuilder sb1 = new StringBuilder();
BufferedReader bufferedReader = null;
OutputStream outputStream = null;
try
{
// Parameter pagination counts
int startCount = 0;
int limitCount = 5;
// Loop 32 times, 158 schools / 5 (pagination amount)
for( int i = 0; i < 32; i++ )
{
// Open a connection to the supplied URL
final URLConnection urlConnection = new URL( "http://www.educationworld.in/institution/mumbai/schools" ).openConnection();
// Tell the URL we are sending output
urlConnection.setDoOutput( true );
// The stream we will be writing to the URL
outputStream = urlConnection.getOutputStream();
// Setup parameters for pagination
final String params = "qstart=" + startCount + "&limit=" + limitCount;
// Get the bytes of the pagination parameters
final byte[] outputInBytes = params.getBytes( "UTF-8" );
// Write the bytes to the URL
outputStream.write( outputInBytes );
// Get and read the URL response
bufferedReader = new BufferedReader( new InputStreamReader( urlConnection.getInputStream() ) );
StringBuilder response = new StringBuilder();
String inputLine;
// Loop over the response and read each line appending it to the StringBuilder
while( (inputLine = bufferedReader.readLine()) != null )
{
response.append( inputLine );
}
// Do the same as before just with a String instead
final Document doc = Jsoup.parse( response.toString() );
Elements links = doc.select( "div.instnm.litblue_bg" );
links.forEach( e -> sb1.append( e.text() ).append( System.getProperty( "line.separator" ) ) );
// Increment the pagination parameters
startCount += 5;
limitCount += 5;
}
System.out.println( sb1.toString() );
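// jTextArea1 does not exist in this standalone class; inside the Swing handler, call jTextArea1.setText( sb1.toString() ) at this point instead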
}
catch( Exception e )
{
e.printStackTrace();
}
finally
{
try
{
// Close the bufferedReader
if( bufferedReader != null )
{
bufferedReader.close();
}
// Close the outputStream
if( outputStream != null )
{
outputStream.close();
}
}
catch( IOException e )
{
e.printStackTrace();
}
}
}
}
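One thing worth noting about the loop above: it opens a new reader and output stream on every iteration, but the finally block only closes the last pair. A minimal sketch of one way to tidy up the reading side with try-with-resources is below. The names `ResponseReader` and `readAll` are hypothetical, and UTF-8 is an assumption about the page encoding, so adjust it if the site serves something else.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only
class ResponseReader
{
    // Reads the whole stream into a String and always closes it, even on failure
    static String readAll( InputStream in )
    {
        final StringBuilder response = new StringBuilder();
        // Assumption: the response is UTF-8 encoded
        try( BufferedReader reader = new BufferedReader( new InputStreamReader( in, StandardCharsets.UTF_8 ) ) )
        {
            String line;
            while( (line = reader.readLine()) != null )
            {
                response.append( line ).append( '\n' );
            }
        }
        catch( IOException e )
        {
            throw new UncheckedIOException( e );
        }
        return response.toString();
    }

    public static void main( String[] args )
    {
        // Stand-in for urlConnection.getInputStream(), so the sketch runs without a network
        final InputStream fake = new ByteArrayInputStream( "<div>a</div>\n<div>b</div>".getBytes( StandardCharsets.UTF_8 ) );
        System.out.print( readAll( fake ) );
    }
}
```

Inside the loop, something like `Jsoup.parse( ResponseReader.readAll( urlConnection.getInputStream() ) )` could then stand in for the manual read loop, and the reader would be closed on every pass.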
Hope this helps you get the result you're after. If you need anything explained, just ask!