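I'm currently trying to scrape some data from a few web pages using Scala and Eclipse. My problem is this: when I view the page source in my browser, reading the content with Scala's XML package looks like it should be very simple: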
<!doctype html>
<html lang="de">
<head>
<meta charset="utf-8">
<title>some text</title>
<meta name="keywords" content="some text" />
<meta name="description" content="some text" />
<meta name="robots" content="noodp"/>
<meta name="page-topic" content="some text" />
<meta http-equiv="x-ua-compatible" content="ie=edge"/>
...
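A side note on the browser version above: it is an HTML5 page (lowercase `<!doctype html>`, an unclosed `<meta charset="utf-8">`), which is not well-formed XML, so an XML parser would most likely reject it even if the program did receive it. The minimal sketch below checks well-formedness with the JDK's built-in SAX parser (the same Xerces-based parser that appears in the stack trace) rather than `scala.xml`, so it needs no extra dependencies. The object and method names are hypothetical, and the sample is a shortened copy of the snippet with the elided part replaced by closing tags.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

object WellFormedCheck {
  // Returns None if `doc` is well-formed XML, otherwise Some(description of the first fatal error).
  def xmlError(doc: String): Option[String] = {
    val parser = SAXParserFactory.newInstance().newSAXParser()
    try {
      // DefaultHandler ignores all content but rethrows fatal errors as SAXParseException.
      parser.parse(new InputSource(new StringReader(doc)), new DefaultHandler)
      None
    } catch {
      case e: SAXParseException => Some(s"line ${e.getLineNumber}: ${e.getMessage}")
    }
  }

  def main(args: Array[String]): Unit = {
    // Shortened copy of the browser version; the closing tags are a hypothetical stand-in for "...".
    val html5 =
      """<!doctype html>
        |<html lang="de">
        |<head>
        |<meta charset="utf-8">
        |<title>some text</title>
        |</head>
        |</html>""".stripMargin
    println(xmlError(html5))                  // the parser's complaint about the HTML5 markup
    println(xmlError("<html><head/></html>")) // a well-formed document, for comparison
  }
}
```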
But when my small program requests the page through the same link to read the content, it receives this instead:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de">
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />
<title>some text</title>
<link rel="shortcut icon" type="image/ico" href="/favicon.ico" />
<link href="/res/im.min.css" media="all, handheld" rel="stylesheet" type="text/css" />
</head>
...
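One plausible but unconfirmed explanation for the different page: many sites pick which version to serve based on the `User-Agent` request header, and `HttpURLConnection` (which handles `http:` links, as the stack trace shows) sends something like `Java/<version>` by default, which the site may treat as a mobile or unknown client. Assuming that is the cause, this sketch sends a desktop-browser `User-Agent` instead. The user-agent string, the object and method names, and the placeholder URL are all hypothetical.

```scala
import java.net.{HttpURLConnection, URI}
import scala.io.Source

object FetchWithUserAgent {
  // Hypothetical desktop-browser User-Agent; the one your own browser sends works just as well.
  val DesktopUserAgent =
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"

  // Prepares a GET request that identifies itself as a desktop browser.
  // openConnection() does not contact the server; the request is sent by getInputStream().
  def prepare(pageUrl: String): HttpURLConnection = {
    val conn = URI.create(pageUrl).toURL.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestProperty("User-Agent", DesktopUserAgent)
    conn
  }

  // Downloads the page body as a string (assumes UTF-8, which both versions in the question declare).
  def fetch(pageUrl: String): String = {
    val source = Source.fromInputStream(prepare(pageUrl).getInputStream, "UTF-8")
    try source.mkString finally source.close()
  }

  def main(args: Array[String]): Unit =
    println(fetch("http://example.com/").take(300)) // placeholder URL, not the page from the question
}
```

Only the request header changes here; whether the site actually serves the desktop page in response is up to the server.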
I get the following error:
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://www.wapforum.org/DTD/xhtml-mobile10.dtd
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1625)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:633)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1271)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1238)
at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:260)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1153)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1049)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:962)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:607)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:489)
...
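The stack trace shows where the error originates: while scanning the prolog, the parser reaches the mobile page's `<!DOCTYPE ...>` (`startDTDEntity`) and tries to download the external DTD from `http://www.wapforum.org/DTD/xhtml-mobile10.dtd`; that server answers with HTTP 403, so parsing aborts before any page content is read. A minimal sketch, assuming the JDK's built-in parser is in use, turns off the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd` so the DTD is never fetched, then extracts the `<title>` text. The object and method names are hypothetical, and the sample document is a shortened copy of the mobile snippet.

```scala
import java.io.StringReader
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.{Attributes, InputSource}
import org.xml.sax.helpers.DefaultHandler

object ParseWithoutExternalDtd {
  // Xerces feature understood by the JDK's built-in parser: when false, a non-validating
  // parser does not download the external DTD named in <!DOCTYPE ...>.
  val LoadExternalDtd = "http://apache.org/xml/features/nonvalidating/load-external-dtd"

  // Returns the text of the first <title> element without fetching the DTD.
  def title(xhtml: String): Option[String] = {
    val factory = SAXParserFactory.newInstance()
    factory.setNamespaceAware(true)
    factory.setFeature(LoadExternalDtd, false)
    val parser = factory.newSAXParser()

    val text = new java.lang.StringBuilder
    var inTitle = false
    var result: Option[String] = None

    val handler = new DefaultHandler {
      override def startElement(uri: String, localName: String, qName: String, atts: Attributes): Unit =
        if (localName == "title" && result.isEmpty) inTitle = true
      override def characters(ch: Array[Char], start: Int, length: Int): Unit =
        if (inTitle) text.append(ch, start, length)
      override def endElement(uri: String, localName: String, qName: String): Unit =
        if (inTitle && localName == "title") { inTitle = false; result = Some(text.toString) }
    }
    parser.parse(new InputSource(new StringReader(xhtml)), handler)
    result
  }

  def main(args: Array[String]): Unit = {
    // Shortened copy of the mobile version; closing tags are a hypothetical stand-in for "...".
    val mobile =
      """<?xml version="1.0" encoding="UTF-8"?>
        |<!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
        |<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de">
        |<head><title>some text</title></head>
        |<body></body>
        |</html>""".stripMargin
    println(title(mobile))
  }
}
```

One caveat: without the DTD, named entities such as `&nbsp;` are undefined and will still cause a parse error. If the program uses `scala.xml.XML.load`, the same idea should carry over by passing a parser built from this factory to `scala.xml.XML.withSAXParser(...)`.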
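So why does my program only get a different (mobile?) version of the page instead of the one I can view in the browser? And why do I get this error message?
Thanks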