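I can't get the HTML text of this HTML file over FTP. I use Beautiful Soup to read HTML files over http/https, but for some reason I can't download or read the file from FTP. Please help!
Here is the URL (it is the same one used in the code).
This is my code so far.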
BufferedReader reader = null;
String total = "";
String line;
String ur = "ftp://ftp.legis.state.tx.us/bills/832/billtext/html/house_resolutions/HR00001_HR00099/HR00014I.htm";
try {
URL url = new URL(ur);
URLConnection urlc = url.openConnection();
InputStream is = urlc.getInputStream(); // To download
reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
while ((line = reader.readLine()) != null)
total += reader.readLine();
} finally {
if (reader != null)
try { reader.close();
} catch (IOException logOrIgnore) {}
}
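Answer 0 (score: 1)
This code works for me with Java 1.7.0_25. Note that your version stores only one of every two lines, because it calls reader.readLine() both in the condition of the while loop and in its body.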
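import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// The class name is arbitrary; it is only here so the snippet compiles on its own.
public class FtpRead {
    public static void main(String[] args) throws IOException {
        BufferedReader reader = null;
        String total = "";
        String line;
        String ur = "ftp://ftp.legis.state.tx.us/bills/832/billtext/html/house_resolutions/HR00001_HR00099/HR00014I.htm";
        try {
            URL url = new URL(ur);
            URLConnection urlc = url.openConnection();
            InputStream is = urlc.getInputStream(); // To download
            reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
            // Call readLine() once per iteration and append that same line.
            while ((line = reader.readLine()) != null) {
                total += line;
            }
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException logOrIgnore) {
                }
            }
        }
    }
}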
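The skipped-line effect can be reproduced without any network access. The following is a minimal sketch, not part of the original answer: the class and method names (ReadLineDemo, readSkipping, readAll) are made up for illustration, and a StringReader stands in for the FTP stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class; a StringReader replaces the FTP input stream.
class ReadLineDemo {

    // Pattern from the question: readLine() is called in the loop condition
    // AND in the body, so the line read by the condition is discarded.
    static String readSkipping(String text) throws IOException {
        StringBuilder total = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                total.append(reader.readLine());
            }
        }
        return total.toString();
    }

    // Corrected pattern: read once per iteration and append that same line.
    static String readAll(String text) throws IOException {
        StringBuilder total = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                total.append(line);
            }
        }
        return total.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "a\nb\nc\nd\n";
        System.out.println(readSkipping(text)); // prints "bd"
        System.out.println(readAll(text));      // prints "abcd"
    }
}
```

With an odd number of lines the second readLine() call returns null on the last iteration, so the literal text "null" ends up appended to the result.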
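Answer 1 (score: 0)
At first I thought this was related to the incorrect path resolution discussed here, but that did not help.
I don't know exactly what goes wrong here, but I can only reproduce the error with this FTP server on MacOS with Java 1.6.0_33-b03-424. I cannot reproduce it with Java 1.7.0_25. So it may be worth checking for a Java update.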
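Since the behaviour seems to depend on the Java version and platform, it can help to print what the program is actually running on before comparing results. This is a small sketch using standard system properties; the class and method names (JavaVersionInfo, describe) are only illustrative.

```java
// Hypothetical helper; only standard system property keys are used.
class JavaVersionInfo {

    // Builds a one-line summary of the running JVM and operating system.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.vendor=" + System.getProperty("java.vendor")
                + ", os.name=" + System.getProperty("os.name");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```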
Alternatively, you can use the Apache Commons Net FTPClient to retrieve the file:
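import java.io.InputStream;
import org.apache.commons.net.ftp.FTPClient; // from Apache Commons Net

FTPClient client = new FTPClient();
client.connect("ftp.legis.state.tx.us");
client.enterLocalPassiveMode();
client.login("anonymous", "");
client.changeWorkingDirectory("bills/832/billtext/html/house_resolutions/HR00001_HR00099");
InputStream is = client.retrieveFileStream("HR00014I.htm");
// Read and close the stream, then call client.completePendingCommand() to finish the transfer.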
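Whichever way the InputStream is obtained, it still has to be read into a String before it can be handed to an HTML parser. The following is a minimal sketch under that assumption: the helper name readFully and the class StreamToString are made up, and a ByteArrayInputStream stands in for the stream returned by the FTP client. Unlike appending line by line, this keeps the line breaks.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper class, not part of Commons Net or the JDK.
class StreamToString {

    // Reads the whole stream as text in the given charset and closes it.
    static String readFully(InputStream in, String charsetName) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charsetName)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream returned by the FTP client.
        byte[] sample = "<html>\n<body>hi</body>\n</html>\n".getBytes("UTF-8");
        System.out.print(readFully(new ByteArrayInputStream(sample), "UTF-8"));
    }
}
```

With FTPClient, the stream from retrieveFileStream would be passed to such a helper, followed by completePendingCommand() once the stream has been closed.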