Call me prehistoric, but I'm trying to use an XHTML document type in a UTF-8 encoded HTML page with a PRE tag containing text that uses the Unicode line separator U+2028 for line breaks.
Firefox, at least, does not seem to honor U+2028 as a line break in a PRE block. Changing the character to U+000D or U+000A produces the line breaks I'm expecting. (Technically, U+2028 is encoded in UTF-8 as a three-byte sequence, but I assume it is decoded back to a single code point when the file is read.) I haven't tested this with other browsers yet.
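As a quick sanity check of the encoding assumption above, here is a minimal Python sketch (Python is only used for illustration; it is not part of the original page or generator). It shows that U+2028 becomes three bytes in UTF-8 and decodes back to a single code point:

```python
# U+2028 LINE SEPARATOR as a Python string of length 1.
LINE_SEPARATOR = "\u2028"

# Encoding to UTF-8 yields three bytes: E2 80 A8.
encoded = LINE_SEPARATOR.encode("utf-8")
print(encoded.hex(" "))  # e2 80 a8

# Decoding gives back the same single code point.
decoded = encoded.decode("utf-8")
print(len(decoded), f"U+{ord(decoded):04X}")  # 1 U+2028
```

So the byte-level encoding itself should not be what prevents the character from being recognized.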
I tried digging through the W3C docs on HTML but could not work out from the section on PRE exactly which characters are treated as line breaks. Where is chapter and verse on exactly what is interpreted as a line break in PRE? Is U+2028 supposed to be treated as one, making Firefox defective, or is the HTML standard brain-dead in not interpreting U+2028 as a line break when found in a Unicode file?
It seems pretty weird to me that a text file (e.g. source code) containing Unicode would not use U+2028 as its standard line break. (I actually have a code generator that produces source code like this, and I'm trying to display that code in an HTML page.) I would therefore expect that placing such code straight into a PRE block produces the behavior I'm after.
Answer 0 (score: 3)
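Despite what the nature of the PRE element might suggest, its rendering behavior is actually specified in CSS, not HTML, because it has to do with white space rendering.
CSS2 says that U+000D and U+000A count as line breaks, and that user agents may recognize and normalize other Unicode characters. However, it does not mention U+2028 anywhere.
css-text-3 covers white space and line break processing more comprehensively. It defines the term "segment break" as follows: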
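"For CSS processing, each document language-defined segment break, CRLF sequence (U+000D U+000A), carriage return (U+000D), and line feed (U+000A) in the text is treated as a segment break, which is then interpreted for rendering as specified by the white-space property."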
Like CSS2, it makes no mention of U+2028.
However, in a later section, it does mention forced break characters (of which U+2028 is one):
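"When determining line breaks:
- Regardless of the white-space value, lines always break at each preserved forced break character: for all values, line-breaking behavior defined for the BK, CR, LF, CM, NL, and SG line breaking classes in [UAX14] must be honored."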
Note that it even says "regardless of the white-space value"; this means that even outside a PRE element, U+2028 must introduce a line break (much like a BR element)!
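To make the idea of forced break characters concrete, here is a small Python sketch that splits text at them. The character list is my own reading of the UAX #14 classes (BK: U+000B, U+000C, U+2028, U+2029; CR: U+000D; LF: U+000A; NL: U+0085) and is an assumption, not something quoted from css-text-3. The name split_forced_breaks is hypothetical.

```python
import re
import unicodedata

# Assumption: characters in the UAX #14 classes BK, CR, LF and NL.
# CRLF is listed first so that it counts as a single break.
FORCED_BREAK = re.compile("\r\n|[\r\n\x0b\x0c\x85\u2028\u2029]")


def split_forced_breaks(text: str) -> list[str]:
    """Split text into lines at every forced break character."""
    return FORCED_BREAK.split(text)


sample = "first\u2028second\r\nthird\nfourth"
print(split_forced_breaks(sample))

# U+2028 has the general category Zl (line separator).
print(unicodedata.category("\u2028"))
```

This only models where a conforming renderer would be expected to break lines under that reading; it says nothing about what any browser actually does.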
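As for implementations, Internet Explorer and Microsoft Edge appear to be the only browsers that render U+2028 as a line break in a PRE element with its default white-space: pre. The only caveat is that they normalize it to U+000A, so it ends up being treated as regular white space outside a PRE element (or an element with white-space: pre / pre-line). This is consistent with what css-text-3 says about preserved forced breaks, but I'm not sure whether normalizing U+2028 to U+000A is itself acceptable behavior or a violation of the Unicode or CSS specs.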
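Chrome on Windows 10 always prints a glyph labeled LSEP, and Firefox always prints a zero-width character.
Whether the document is served as application/xhtml+xml or text/html does not seem to make any difference.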
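Given this inconsistent browser support, one possible workaround is to replace U+2028 with U+000A before the generated code is written into the page. Below is a minimal Python sketch of that idea. The helper name to_pre_block is hypothetical, and it assumes you can post-process the generator's output as a string.

```python
import html


def to_pre_block(source: str) -> str:
    """Hypothetical helper: wrap generated source code in a PRE element."""
    # Replace U+2028 with U+000A, which CSS2 does count as a line break.
    normalized = source.replace("\u2028", "\n")
    # Escape &, < and > so the code is shown as text, not parsed as markup.
    escaped = html.escape(normalized, quote=False)
    return f"<pre>{escaped}</pre>"


print(to_pre_block("if (a < b)\u2028    return a;"))
```

This mirrors the normalization that Internet Explorer and Edge appear to perform, but does it on your side so that every browser sees an ordinary line feed.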