W3C: Can't read EBNF's SPARQL IRIREF specification?

Asked: 2016-04-25 09:38:13

Tags: regex sparql w3c bnf ebnf

(Specifications: https://www.w3.org/TR/sparql11-query/#rIRIREF)

According to the specification, an IRIREF can be parsed as this:

[139]   IRIREF    ::=   '<' ([^<>"{}|^`\]-[#x00-#x20])* '>'

What is bothering me is this part of the expression:

\]-[

If I consider \ to be an escaping character in the bracketed character class (which would be the case in a Perl regular expression), then it means the \ alone is not a problem in the IRIREF and this is valid: <http://hello\world>

Then there is this big problem with the range: ]-[. The character ] has an ordinal value of 93 and [ has an ordinal value of 91. This means we have an invalid range: 93 to 91. This is not allowed in most regex engines I tested.
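The ordinal values and the engine behaviour are easy to check. A minimal Python sketch, assuming Python's re module as one representative regex engine (the question does not say which engines were tested):

```python
import re

# Code points of '[', '\' and ']' in that order.
print(ord("["), ord("\\"), ord("]"))  # 91 92 93

# Read as a regex range, "]-[" would run from 93 down to 91.
# Python's re module refuses to compile such a reversed range.
try:
    re.compile(r"[\]-\[]")
    reversed_range_ok = True
except re.error as exc:
    reversed_range_ok = False
    print("re.error:", exc)
```

Other engines may report the problem differently, so this only illustrates the general point.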

What does that mean?

  1. Should I consider the - a regular character in the bracketed character class? Then this is an invalid IRIREF: <http://new-example.org>. That makes no sense.
  2. Should I consider the range ]-[ null, so that this IRIREF is valid: <http://hello[world]>?
  3. What I think is more likely is that the range is inverted and that this is not a problem for W3C specifications, which means the characters [, \ and ] are invalid characters. This makes sense.

2 answers:

Answer 0 (score: 2)

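This is shorthand syntactic sugar. To be exact, it is EBNF, a notation that goes beyond the standard capabilities of regexes.

It means "the prior character class without the following character class", which in this particular case is "not certain brackets and quotes, and also none of the control codes from 0x00 (NUL) to 0x20 (SPC), which would otherwise be included".

Relevant reference: the EBNF notation used, in particular the A - B clause. It is mentioned in the first paragraph of the SPARQL grammar.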

Answer 1 (score: 1)

The SPARQL specification says that its grammar is written in the notation defined by the XML 1.1 specification.

In that notation, the right-hand side you quote,

'<' ([^<>"{}|^`\]-[#x00-#x20])* '>'

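denotes a sequence of

  • a '<' character
  • zero or more characters matching the expression [^<>"{}|^`\]-[#x00-#x20]; this denotes the set difference:

    • any character matched by [^<>"{}|^`\], that is, any character other than '<', '>', '"', '{', '}', '|', '^', '`', or '\' (note that '\' is not an escape character in this notation; it has no escape character at all)
    • except those matching [#x00-#x20], that is, the C0 control characters plus the space character

    It is a bit odd to write the pattern this way; it could equally well be written [^<>"{}|^`\#x00-#x20]. I am not sure why the editors wrote it the way they did.

  • a '>' character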
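For readers who want to try this reading in a conventional regex engine, here is a rough Python sketch. Python's re module has no character-class subtraction, so the A - B form is imitated with a negative lookahead and compared against the merged single class; the names A, B, IRIREF_DIFF and IRIREF_MERGED are mine, not the specification's.

```python
import re

# A: the negated class from the rule. In the W3C notation '\' is a literal
# character, so it has to be escaped when moving to Python regex syntax.
A = r'[^<>"{}|^`\\]'
# B: the subtracted class, #x00 through #x20.
B = r'[\x00-\x20]'

# "A - B" spelled with a negative lookahead: a character in A that is not in B.
IRIREF_DIFF = re.compile(rf'<(?:(?!{B}){A})*>')
# The same set of characters folded into one negated class.
IRIREF_MERGED = re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')


def same_on(chars):
    """Return True if both patterns agree on '<c>' for every character c."""
    return all(
        bool(IRIREF_DIFF.fullmatch(f"<{c}>")) == bool(IRIREF_MERGED.fullmatch(f"<{c}>"))
        for c in chars
    )


# Compare the two spellings over a sample of code points.
agree = same_on(map(chr, range(0x3000)))
print(agree)
```

If the parse described above is right, the two patterns should accept exactly the same strings, which is what the comparison checks over the sampled code points.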

So, to answer your questions one by one:

  

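  Should I consider the - a regular character in the bracketed character class? Then this is an invalid IRIREF: <http://new-example.org>. That makes no sense.

No. When A and B are expressions in this notation, A - B denotes the strings in the language of A that are not also in the language of B. Here A and B are each character-class expressions, one negative and one positive.

You are right that it would make no sense for a grammar rule intended to accept IRIs in angle brackets to forbid the hyphen.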

  

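  Should I consider the range ]-[ null, so that this IRIREF is valid: <http://hello[world]>?

The ']-[' does not denote a range here, null or otherwise; the ] ends the first character-class expression, and the [ begins the second.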

  

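  What I think is more likely is that the range is inverted and that this is not a problem for W3C specifications, which means the characters [, \ and ] are invalid characters. This makes sense.

If my parse of the expression is correct, '[' and ']' are legal (the first expression does not exclude them, and neither does the second); '\' is excluded by the first expression.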
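To make the three conclusions concrete, here is a small Python sketch that applies the set-difference reading directly, without a regex. The helper name is_iriref and the EXCLUDED set are my own illustration; it checks only the IRIREF token rule, not whether the content is a meaningful IRI.

```python
# Characters excluded by the first (negated) class: < > " { } | ^ ` \
EXCLUDED = set('<>"{}|^`\\')


def is_iriref(text: str) -> bool:
    """Check text against '<' ([^<>"{}|^`\\]-[#x00-#x20])* '>' read as a set difference."""
    if len(text) < 2 or text[0] != "<" or text[-1] != ">":
        return False
    # Each inner character must be in the first class and outside #x00-#x20.
    return all(ch not in EXCLUDED and ord(ch) > 0x20 for ch in text[1:-1])


print(is_iriref("<http://new-example.org>"))  # True: '-' is an ordinary character
print(is_iriref("<http://hello[world]>"))     # True: '[' and ']' are not excluded
print(is_iriref("<http://hello\\world>"))     # False: '\' is excluded
```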