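Is "&" an allowed character in the PATH segment of a URL?

Time: 2015-09-11 10:52:51

Tags: java

Is "&" an allowed character in the PATH segment of a URL, or should it be escaped?

According to the Nu W3C validator (https://validator.w3.org/nu/), I get: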

Error: & did not start a character reference. (& probably should have been escaped as &amp;.)
At line 407, column 52
<a href="/Bags-&-Purses/c/wome

However, if I try to encode the URL with the Java URI class, characters such as spaces get encoded, but the & character does not:

URI u = new URI(request.getScheme(), null,
                            request.getServerName(), request.getServerPort(),
                            request.getContextPath() + url,
                            query, null);
u.toURL().toString();

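where the url string is: /Bags-&-Purses/c/womens-accessories-bags

The result is: https://localhost:8112/storefront/Bags-&-Purses/c/womens-accessories-bags (the & is not encoded)

The question is why & is not escaped. Is this valid? I would have guessed it should be escaped as %26, but it is left as-is.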

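1 Answer:

Answer 0: (score: 1)

&, although a reserved character, appears to be a valid character in a path segment of a URI. If you look at the grammar for path segments in RFC3986, section 3.3, & is allowed as part of the sub-delims group: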

  path          = path-abempty    ; begins with "/" or is empty
                / path-absolute   ; begins with "/" but not "//"
                / path-noscheme   ; begins with a non-colon segment
                / path-rootless   ; begins with a segment
                / path-empty      ; zero characters

  path-abempty  = *( "/" segment )
  path-absolute = "/" [ segment-nz *( "/" segment ) ]
  path-noscheme = segment-nz-nc *( "/" segment )
  path-rootless = segment-nz *( "/" segment )
  path-empty    = 0<pchar>

  segment       = *pchar
  segment-nz    = 1*pchar
  segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                ; non-zero-length segment without any colon ":"

  pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

(...)

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                / "*" / "+" / "," / ";" / "="

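Since you ask about URLs rather than the more general URIs: as far as I know, URLs do not impose additional restrictions on path segments. Section 2.2 of the same RFC goes on to state that reserved characters should be percent-encoded unless they are specifically allowed in that component. In this case, however, according to the grammar above, all characters in the sub-delims group (& included) appear to be specifically allowed in path segments. That is consistent with what you see from the Java URI class: it encodes the space, which is not a legal path character, and leaves & alone.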
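A minimal sketch of that behaviour, assuming a path with spaces around the & so that both cases are visible. The class name, the helper names, and the sample path are made up for illustration; only the `java.net.URI` seven-argument constructor comes from the question. The second helper shows one way to force %26 if you want it anyway: it replaces the character after the URI is built, because the constructor always quotes a literal % in its input. That replacement is only safe here because the example has no query string.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathAmpersandDemo {

    // Builds a URI with the multi-argument constructor, which quotes only
    // characters that are not legal in the given component.
    static String build(String path) {
        try {
            URI u = new URI("https", null, "localhost", 8112, path, null, null);
            return u.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical helper: percent-encodes '&' by hand after the URI is built.
    // Only valid when the URI has no query part, as in this example.
    static String buildWithEncodedAmpersand(String path) {
        return build(path).replace("&", "%26");
    }

    public static void main(String[] args) {
        String path = "/storefront/Bags & Purses/c";
        // Spaces are quoted as %20, the '&' is left untouched.
        System.out.println(build(path));
        // Same URI with the '&' additionally written as %26.
        System.out.println(buildWithEncodedAmpersand(path));
    }
}
```

Both forms are valid URIs under the grammar above; whether the server treats `&` and `%26` in a path as the same resource depends on the server.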

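However, the problem you are running into here is not about the URL itself but about its textual representation inside an HTML document. An ampersand cannot appear on its own in HTML and must always be encoded as the character reference &amp;, so the attribute should read href="/Bags-&amp;-Purses/..."; the browser decodes it back to & before resolving the URL. Related question: Do I really need to encode '&' as '&amp;'?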
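To illustrate the distinction, here is a small sketch of escaping a URL for use inside a double-quoted HTML attribute. The class and method names are hypothetical, and the escaper is hand-written only to keep the example self-contained; in a real application you would more likely rely on your template engine or an existing escaping library.

```java
public class HrefEscapeDemo {

    // Minimal escaper for text placed inside a double-quoted HTML attribute.
    // It covers only the characters relevant to this example.
    static String escapeHtmlAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '"': sb.append("&quot;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The URL itself is already valid; only its HTML representation changes.
        String url = "/Bags-&-Purses/c/womens-accessories-bags";
        System.out.println("<a href=\"" + escapeHtmlAttribute(url) + "\">Bags & Purses</a>"
                .replace("Bags & Purses", "Bags &amp; Purses"));
    }
}
```

The URL encoding step (percent-encoding) and the HTML encoding step (character references) are separate: the first is applied when building the URL, the second when writing that URL into markup.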