Question

I am using Webharvest to download files from a website and want to save them under their original names.

The Java code I am using is:

import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;

HttpClient client = new HttpClient();

BufferedReader br = null;
StringBuffer result = new StringBuffer();
String attachName;

GetMethod method = new GetMethod(attachmentLink.toString());

int returnCode;
returnCode = client.executeMethod(method);
// getResponseHeaders (plural) returns a Header[]; getResponseHeader returns a single Header
Header[] headers = method.getResponseHeaders("Content-Disposition");
attachName = headers[0].getValue();
attachName = new String(attachName.getBytes());

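The result in Webharvest is:

attachment; filename="Resoluci nsobreMesasdeContrataci n.pdf"

I cannot get this letter:

ó

After reading the value of the Content-Disposition header into the variable attachName, I also tried to decode it, but with no luck: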

String attachNamef = URLEncoder.encode(attachName, "ISO-8859-1");
attachNamef = URLEncoder.decode(attachNamef, "UTF-8");

I was able to determine that the response charset is ISO-8859-1, using:

method.getResponseCharSet()

P.S. When I look at the headers in Firebug (Firefox), the Content-Disposition value is fine:

attachment; filename="ResoluciónsobreMesasdeContratación.pdf"

Answer 1

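Apache HttpClient does not support non-ASCII characters in HTTP headers. Taken from the documentation:

The headers of an HTTP request or response must be in US-ASCII format. It is not possible to use non-US-ASCII characters in the header of a request or response. Generally this is not an issue, because HTTP headers are designed to facilitate the transfer of data rather than to actually transfer the data itself. One exception, however, is cookies. Since cookies are transferred as HTTP headers, they are confined to the US-ASCII character set. See the Cookie Guide for more information.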
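To illustrate why the ó disappears, here is a minimal sketch using only the JDK. It assumes the server puts ISO-8859-1 bytes on the wire, as the question suggests, and simulates them with a byte array. The class ContentDispositionDemo and its methods decodeHeader and extractFilename are hypothetical names for this example, not part of HttpClient. Decoding the bytes as US-ASCII replaces the ó with a replacement character, which is why re-encoding attachName afterwards (with getBytes or URLEncoder) cannot bring it back. Decoding the same bytes as ISO-8859-1 keeps it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContentDispositionDemo {

    // Matches the quoted form filename="..." only; other forms are not handled.
    private static final Pattern FILENAME =
            Pattern.compile("filename\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Decodes raw header bytes with the given charset. */
    public static String decodeHeader(byte[] rawHeader, Charset charset) {
        return new String(rawHeader, charset);
    }

    /** Returns the quoted filename parameter, or null if there is none. */
    public static String extractFilename(String headerValue) {
        Matcher m = FILENAME.matcher(headerValue);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Simulated wire bytes (assumption: the server sends ISO-8859-1).
        // \u00f3 is the letter ó, escaped so the source file encoding does not matter.
        byte[] wire = "attachment; filename=\"Resoluci\u00f3n.pdf\""
                .getBytes(StandardCharsets.ISO_8859_1);

        // Lossy: the byte 0xF3 is not valid US-ASCII, so it becomes U+FFFD.
        String asAscii = decodeHeader(wire, StandardCharsets.US_ASCII);
        // Lossless: every byte maps to exactly one ISO-8859-1 character.
        String asLatin1 = decodeHeader(wire, StandardCharsets.ISO_8859_1);

        System.out.println(extractFilename(asAscii));
        System.out.println(extractFilename(asLatin1));
    }
}
```

The practical consequence is that the fix has to happen where the header bytes are first turned into a String. Once the value has been decoded as US-ASCII, the original byte is gone and no later conversion of attachName can recover it.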
