Question

I am writing a URL-to-URI conversion for HTTP methods that only accept a java.net.URI as their argument.

My implementation looks like this:

  new URI(url.getProtocol(), url.getAuthority(), url.getPath(), url.getQuery(), null);

This way, URLs that contain spaces are not rejected as malformed. However, when encoding the following URL:

http://www.****.ca/en-ca/Catalog/Gallery.aspx?ID=Mass%20Spectrometry%20[GC/MS%20and%20ICP-MS]&PID=Gas%20Chromatography%20Mass%20Spectrometry%20Consumables

it converts every %20 into %2520, which produces an invalid address.
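A minimal sketch of the cause, based on the documented behavior of java.net.URI: the multi-argument constructors always quote the percent character, so an existing %20 becomes %2520, while a raw space is quoted only once, as %20. The host www.example.com and the helper name build are hypothetical stand-ins for the masked host and the conversion shown above.

```java
import java.net.URI;
import java.net.URISyntaxException;

public final class DoubleEncodingDemo {
    // Same conversion as in the question: the 5-argument URI constructor.
    // "www.example.com" is a hypothetical stand-in for the masked host.
    public static String build(String path, String query) {
        try {
            return new URI("http", "www.example.com", path, query, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // A raw space is quoted once: ...?ID=Mass%20Spectrometry
        System.out.println(build("/Gallery.aspx", "ID=Mass Spectrometry"));
        // '%' itself is quoted, so an existing escape is encoded again: ...?ID=Mass%2520Spectrometry
        System.out.println(build("/Gallery.aspx", "ID=Mass%20Spectrometry"));
    }
}
```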

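Is there a way in Java to correctly parse all kinds of URLs, including ones that contain both %20 and literal spaces, the way a browser or the wget command does?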
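One common workaround, sketched here under stated assumptions rather than as what browsers or wget do internally: percent-encode only the characters that are illegal in a URI, copy every well-formed %XX escape unchanged, and then parse the result with URI.create. The class LenientUris and its methods are hypothetical. The safe set is based on RFC 3986; '[' and ']' are always escaped (so IPv6 literal hosts such as http://[::1]/ are not supported), and a '%' that is not followed by two hex digits is treated as a literal percent sign.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

public final class LenientUris {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();
    // ASCII punctuation passed through unchanged: RFC 3986 unreserved and
    // reserved characters, minus '[' and ']' (an assumption of this sketch).
    private static final String SAFE_PUNCT = "-._~!$&'()*+,;=:@/?#";

    private LenientUris() {}

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    private static boolean isSafe(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z')
                || (cp >= '0' && cp <= '9') || (cp < 128 && SAFE_PUNCT.indexOf(cp) >= 0);
    }

    /** Percent-encodes illegal characters; well-formed %XX escapes are left as they are. */
    public static String escapeIllegal(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp == '%' && i + 2 < s.length() && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2))) {
                out.append(s, i, i + 3); // existing escape: copy it, do not re-encode
                i += 3;
                continue;
            }
            if (isSafe(cp)) {
                out.appendCodePoint(cp);
            } else {
                // Space, '[', ']', a stray '%', non-ASCII text: encode its UTF-8 bytes.
                for (byte b : new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8)) {
                    out.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Throws IllegalArgumentException if the escaped string is still not a legal URI. */
    public static URI toUri(String s) {
        return URI.create(escapeIllegal(s));
    }
}
```

Because existing escapes are preserved, applying escapeIllegal twice gives the same result, which is the property the 5-argument constructor lacks. The trade-off is that a literal '%' that happens to be followed by two hex digits is kept as an escape and will be decoded later.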

Answer 1

Here is my own solution. It works so far, but I don't know whether it will break on some other odd URI string:

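  import java.net.MalformedURLException;
  import java.net.URI;
  import java.net.URISyntaxException;
  import java.net.URL;

  // Base used only to resolve relative input; this value is hypothetical (its definition was not shown).
  private static final URL dummyURL;
  static {
    try {
      dummyURL = new URL("http://dummy.invalid/");
    } catch (MalformedURLException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  public static URI uri(String s) throws URISyntaxException {
    try {
      // Fast path: the string is already a legal URI, so keep it untouched.
      return new URI(s);
    } catch (URISyntaxException e) {
      try {
        // Absolute URL with illegal characters (e.g. spaces): let the constructor quote them.
        URL url = new URL(s);
        return new URI(url.getProtocol(), url.getAuthority(), url.getPath(), url.getQuery(), null);
      } catch (MalformedURLException ee) {
        // No protocol: resolve against the dummy base just to split path and query.
        URL url;
        try {
          url = new URL(dummyURL, s);
        } catch (MalformedURLException eee) {
          throw new RuntimeException(eee);
        }
        // The input was relative, so build a URI without scheme or authority.
        return new URI(null, null, url.getPath(), url.getQuery(), null);
      }
    }
  }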
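A caveat worth checking, shown as a small sketch that mirrors the first two steps of uri(String): new URI(s) rejects any string containing a raw space, so a URL that mixes spaces and %20 falls through to the 5-argument constructor, and its existing escapes are quoted again. The class name FallbackCheck and the host example.com are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public final class FallbackCheck {
    // Mirrors the first two steps of uri(String): strict parse, then URL -> 5-argument URI.
    public static String viaUrlFallback(String s) {
        try {
            return new URI(s).toString(); // succeeds only if s is already a legal URI
        } catch (URISyntaxException e) {
            try {
                URL url = new URL(s);
                return new URI(url.getProtocol(), url.getAuthority(), url.getPath(),
                        url.getQuery(), null).toString();
            } catch (MalformedURLException | URISyntaxException ex) {
                throw new IllegalArgumentException(ex);
            }
        }
    }

    public static void main(String[] args) {
        // The space in the path is fixed, but the %20 in the query is double-encoded.
        System.out.println(viaUrlFallback("http://example.com/a b?q=x%20y"));
    }
}
```

In this sketch the call prints http://example.com/a%20b?q=x%2520y, so mixed input like the one in the question still ends up with %2520.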

How can I avoid redundant double encoding in java.net.URI?
