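I am trying to read HTML from a URL connection. In one case, the HTML file I am trying to read contains 5 line breaks before the actual doctype declaration. In that case, the input reader throws an EOF exception.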
URL pageUrl = new URL(
        "http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
DataInputStream dis = new DataInputStream(getConn.getInputStream());
// some read method here
Has anyone run into this problem?
URL pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
DataInputStream dis = new DataInputStream(getConn.getInputStream());
String urlData = "";
while ((urlData = dis.readUTF()) != null)
    System.out.println(urlData);
// throws an exception
java.io.EOFException
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
    at java.io.DataInputStream.readUTF(DataInputStream.java:572)
    at java.io.DataInputStream.readUTF(DataInputStream.java:547)
With a BufferedReader, it just returns null and does not continue:
pageUrl = new URL("http://www.nytimes.com/2011/03/15/sports/basketball/15nbaround.html");
URLConnection getConn = pageUrl.openConnection();
getConn.connect();
BufferedReader br = new BufferedReader(new InputStreamReader(getConn.getInputStream()));
String urlData = "";
while (true) {
    urlData = br.readLine();
    System.out.println(urlData);
}
Output: null
Answer 0: (score: 1)
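You are using DataInputStream to read data that was not written with DataOutputStream. Check the documented behavior of DataInputStream#readUTF(), the method you are calling: it first reads two bytes to form a 16-bit integer giving the number of bytes that follow, and those bytes hold the UTF-encoded string. The data you read from an HTTP server is not encoded in this format.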
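To make the length prefix concrete, here is a minimal sketch. The class and method names (`ReadUtfDemo`, `roundTrip`, `impliedLength`, `failsWithEof`) are hypothetical, and the sample page is made up to mirror the five leading line breaks described in the question; it is not the actual response from the server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ReadUtfDemo {

    /** Writes a string with writeUTF and reads it back with readUTF. */
    static String roundTrip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(text); // 2-byte length prefix, then the encoded string
        }
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readUTF();
        }
    }

    /** Returns the length readUTF would infer from the first two bytes of plain text. */
    static int impliedLength(String rawText) throws IOException {
        byte[] bytes = rawText.getBytes(StandardCharsets.US_ASCII);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUnsignedShort();
        }
    }

    /** Returns true if readUTF on plain text runs out of input and throws EOFException. */
    static boolean failsWithEof(String rawText) throws IOException {
        byte[] bytes = rawText.getBytes(StandardCharsets.US_ASCII);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readUTF();
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical page: five line breaks, then a tiny document.
        String page = "\n\n\n\n\n<!DOCTYPE html><html></html>";
        System.out.println(roundTrip("hello"));
        System.out.println(impliedLength(page));
        System.out.println(failsWithEof(page));
    }
}
```

Two line-feed bytes (0x0A 0x0A) read as an unsigned 16-bit integer give 2570, so readUTF() would try to consume 2570 more bytes from a page that is only a few dozen bytes long and fail with EOFException.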
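Instead, an HTTP server sends its headers encoded in ASCII, as described in RFC 2616, sections 6.1 and 2.2. You need to read the headers as text and then determine how the message body (the "entity") is encoded.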
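As a rough sketch of that last step: URLConnection already parses the headers, and getContentType() exposes the Content-Type value, so the body's charset can be taken from there. The names `CharsetAwareReader`, `charsetFrom`, `readBody`, and `fetch` are hypothetical, and falling back to ISO-8859-1 when no charset is declared is an assumption based on the RFC 2616 default for text types; real pages may declare their encoding elsewhere (for example in a meta tag).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetAwareReader {

    /** Extracts the charset parameter from a Content-Type value, or returns the fallback. */
    static Charset charsetFrom(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String trimmed = part.trim();
            if (trimmed.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = trimmed.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Reads the whole stream as text, one line at a time, using the given charset. */
    static String readBody(InputStream in, Charset charset) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    /** Fetches a URL and decodes the body with the charset named in the response headers. */
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        // Assumption: ISO-8859-1 when the server declares no charset.
        Charset charset = charsetFrom(conn.getContentType(), StandardCharsets.ISO_8859_1);
        return readBody(conn.getInputStream(), charset);
    }

    public static void main(String[] args) throws IOException {
        for (String address : args) {
            System.out.println(fetch(address));
        }
    }
}
```

Passing the charset to InputStreamReader explicitly avoids depending on the platform default encoding, which is what the single-argument constructor uses.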
Answer 1: (score: 1)
This works fine:
package url;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;

/**
 * UrlReader
 * @author Michael
 * @since 3/20/11
 */
public class UrlReader
{
    public static void main(String[] args)
    {
        UrlReader urlReader = new UrlReader();
        for (String url : args)
        {
            try
            {
                String contents = urlReader.readContents(url);
                System.out.printf("url: %s contents: %s\n", url, contents);
            }
            catch (Exception e)
            {
                e.printStackTrace();
            }
        }
    }

    public String readContents(String address) throws IOException
    {
        StringBuilder contents = new StringBuilder(2048);
        BufferedReader br = null;
        try
        {
            URL url = new URL(address);
            br = new BufferedReader(new InputStreamReader(url.openStream()));
            String line;
            // Check for end of stream (null) before appending, and restore
            // the line break that readLine() strips.
            while ((line = br.readLine()) != null)
            {
                contents.append(line).append('\n');
            }
        }
        finally
        {
            close(br);
        }
        return contents.toString();
    }

    private static void close(Reader br)
    {
        try
        {
            if (br != null)
            {
                br.close();
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }
}
Answer 2: (score: 0)
This:
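import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class Main {
    public static void main(String[] args)
            throws MalformedURLException, IOException
    {
        URL pageUrl = new URL("http://www.google.com");
        URLConnection getConn = pageUrl.openConnection();
        getConn.connect();
        BufferedReader dis = new BufferedReader(
                new InputStreamReader(
                        getConn.getInputStream()));
        String myString;
        // readLine() returns null at end of stream
        while ((myString = dis.readLine()) != null)
        {
            System.out.println(myString);
        }
    }
}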
works perfectly. However, the URL you provided does not return anything.
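One way to investigate why a URL yields an empty body is to look at the status line and headers before reading anything. The sketch below is only a diagnostic aid under stated assumptions: `ResponseProbe` and `probe` are hypothetical names, and the guess that a redirect might be involved is not confirmed anywhere in this thread.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResponseProbe {

    /** Returns the HTTP status code and, if present, the redirect target. */
    static String probe(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        // Do not follow redirects automatically, so a 3xx response stays visible.
        conn.setInstanceFollowRedirects(false);
        try {
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            return status + " " + (location != null ? "-> " + location : "(no redirect)");
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        for (String address : args) {
            System.out.println(address + ": " + probe(address));
        }
    }
}
```

If the probe reports a 3xx status, the body of that first response is typically empty and the content lives at the Location target; a 4xx or 5xx status would instead point to the request being rejected.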