I use ScrapingUtils to parse some URLs. For this, I use the following code:
String link = "Here the link";
Document doc = ScrapingUtils.visit(link, false);
if (doc != null) {
//code
} else {
//code
}
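The problem is that sometimes it does not receive the HTML from the server and cannot fetch the data. I tried using try..catch so that, when a read timeout error occurs, I can assign a specific value to a variable to know that there was an error.
This is what I have tried: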
String link = "Here the link";
Document doc = ScrapingUtils.visit(link, false);
try {
if (doc != null) {
//code
} else {
//code
}
} catch (TimeoutException exception) {
throw new TimeoutException("Timeout exceeded: " + timeout + unit);
}
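But the catch clause for TimeoutException produces a compile error:
Exception TimeoutException is never thrown in the body of the corresponding try statement
I understand that Java knows catching this exception is pointless, because it can never occur there.
The ScrapingUtils class: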
public class ScrapingUtils {
private static final Logger logger = LoggerFactory.getLogger(ScrapingUtils.class);
public static Document visit(String urlStr, boolean useProxy) {
Document doc = null;
try {
if (!useProxy) {
logger.info("Downloading " + urlStr);
doc = Jsoup.connect(urlStr).userAgent("Mozilla/5.0").maxBodySize(0).timeout(Config.CONNECTION_TIMEOUT).get();
} else {
logger.info("downloading " + urlStr);
URL url = new URL(urlStr);
String[] proxyStr = NetUtils.getProxy().split(":");
Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyStr[0], Integer.parseInt(proxyStr[1])));
HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
conn.setConnectTimeout(Config.CONNECTION_TIMEOUT);
conn.connect();
BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
StringBuilder buffer = new StringBuilder();
String str;
while((str = br.readLine()) != null) {
buffer.append(str);
}
doc = Jsoup.parse(buffer.toString());
}
} catch (IOException ex) {
logger.error("Error downloading website " + urlStr + "\n" + ex.getMessage());
}
return doc;
}
public static Document visit(String urlStr) {
return visit(urlStr, false);
}
}
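Answer 0 (score: 0)
OK. As written, your code will never get a TimeoutException, because nothing in that try body can throw it. What you will get is a SocketTimeoutException, which visit currently swallows in its catch (IOException ex) block. It is thrown on this line: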
doc = Jsoup.connect(urlStr).userAgent("Mozilla/5.0").maxBodySize(0).timeout(Config.CONNECTION_TIMEOUT).get();
and
conn.connect();
So you can handle the exception there, like this:
try {
if (!useProxy) {
Jsoup.connect("https://docs.oracle.com").userAgent("Mozilla/5.0").maxBodySize(0).timeout(1000).get();
} else {
URL url = new URL("https://docs.oracle.com");
Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("", 11));
HttpURLConnection conn = (HttpURLConnection) url.openConnection(proxy);
conn.setConnectTimeout(1000);
conn.connect();
BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
StringBuilder buffer = new StringBuilder();
String str;
while ((str = br.readLine()) != null) {
buffer.append(str);
}
}
} catch (SocketTimeoutException a) {
System.out.println("log");
} catch (IOException ex) {
}
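To see this catch ordering against a real timeout without depending on an external site, here is a minimal sketch that uses only the JDK. The names ReadTimeoutDemo and readWithTimeout are made up for illustration, and a local ServerSocket that never sends anything stands in for a slow website. Because SocketTimeoutException is a subclass of IOException, its catch clause has to come before the IOException one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    // Connects to a local server that never sends data and reports what happened.
    static String readWithTimeout(int timeoutMillis) {
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(), silentServer.getLocalPort())) {
            // Read timeout: a blocking read() gives up after this many milliseconds.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            int firstByte = in.read();
            return "read " + firstByte;
        } catch (SocketTimeoutException e) {
            // More specific exception first, otherwise the code does not compile.
            return "timeout";
        } catch (IOException e) {
            return "io error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithTimeout(200));
    }
}
```

Swapping the two catch clauses makes the compiler reject the code, since the SocketTimeoutException clause would then be unreachable.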
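I modified the code so that I could run it myself and trigger the SocketTimeoutException. And if you want callers to always have to handle the SocketTimeoutException, just rethrow it: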
catch (SocketTimeoutException a) {
System.out.println("log");
throw new SocketTimeoutException();
}
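The method then has to declare throws SocketTimeoutException, which forces every caller either to wrap the call in try/catch or to declare the exception in its own method signature: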
try {
visit("test", true);
} catch (SocketTimeoutException e) {
e.printStackTrace();
}
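Putting the two pieces together, here is a minimal sketch of the rethrow approach with a caller that records a status value, which is what the question was after. PageFetcher, Status, statusOf and this simplified visit are hypothetical stand-ins, not part of the original ScrapingUtils; the real Jsoup or HttpURLConnection download would go where the fetcher is invoked.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class VisitStatusDemo {

    // Hypothetical stand-in for the Jsoup/HttpURLConnection download.
    interface PageFetcher {
        String fetch(String url) throws IOException;
    }

    enum Status { OK, TIMEOUT, FAILED }

    // Rethrows timeouts so callers must handle them; other I/O errors yield null,
    // mirroring the original visit(), which logs the error and returns null.
    static String visit(String url, PageFetcher fetcher) throws SocketTimeoutException {
        try {
            return fetcher.fetch(url);
        } catch (SocketTimeoutException e) {
            throw e;
        } catch (IOException e) {
            return null;
        }
    }

    // Caller side: turn the outcome into a value that can be checked later.
    static Status statusOf(String url, PageFetcher fetcher) {
        try {
            return visit(url, fetcher) != null ? Status.OK : Status.FAILED;
        } catch (SocketTimeoutException e) {
            return Status.TIMEOUT;
        }
    }

    public static void main(String[] args) {
        PageFetcher slow = url -> { throw new SocketTimeoutException("Read timed out"); };
        PageFetcher fine = url -> "<html></html>";
        System.out.println(statusOf("test", slow));
        System.out.println(statusOf("test", fine));
    }
}
```

An enum is only one option here; a boolean flag or a small result object would serve the same purpose if the caller needs more detail.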