Nokogiri: escaping spaces and newlines inside `<pre>`

Time: 2018-05-06 17:03:09

Tags: ruby nokogiri

A certain downstream sink (that I don't control), being fed w/ html, ecstatically trims down all spaces & newlines into 1 space. Thus I need to 'escape' such chars inside all <pre> tags.

I've managed to 'protect' newlines by replacing them w/ <br>, but the spaces issue stupefies me:

    require 'nokogiri'

    doc = Nokogiri::HTML.fragment "<pre><code>  1\n\n  2</code></pre>"
    doc.css('pre').each do |node|
      text = node.to_s.gsub(/\n/, '<br>').gsub(/\s/) { '&nbsp;' }
      node.replace Nokogiri::HTML.fragment text
    end
    puts doc

still yields:

    <pre><code>  1<br><br>  2</code></pre>

(instead of the expected <pre><code>&nbsp;&nbsp;1<br><br>&nbsp;&nbsp;2</code></pre>)

i.e., Nokogiri re-encodes &nbsp; to spaces again!

Now, here's what's also interesting: those are not actual 'spaces', but (I gather) UTF-8 non-breaking space chars:

    $ ruby 1.rb | hexdump -C
    00000000  3c 70 72 65 3e 3c 63 6f  64 65 3e c2 a0 c2 a0 31  |<pre><code>....1|
    00000010  3c 62 72 3e 3c 62 72 3e  c2 a0 c2 a0 32 3c 2f 63  |<br><br>....2</c|
    00000020  6f 64 65 3e 3c 2f 70 72  65 3e 0a                 |ode></pre>.|
    0000002b

notice c2 a0 c2 a0 31 instead of the more common 20 20 31.
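
To make the byte-level observation concrete, here is a minimal sketch using only plain Ruby strings (no Nokogiri). It shows that U+00A0 (NO-BREAK SPACE) is encoded in UTF-8 as the two bytes c2 a0, while an ordinary space is the single byte 20. The helper name `hex_bytes` is made up for this illustration.

```ruby
# U+00A0 (NO-BREAK SPACE) vs. U+0020 (ordinary space), as UTF-8 bytes.
NBSP_CHAR  = "\u00A0"
SPACE_CHAR = " "

# Hypothetical helper: render a string's bytes the way hexdump -C lists them.
def hex_bytes(str)
  str.bytes.map { |b| format('%02x', b) }.join(' ')
end

puts hex_bytes(NBSP_CHAR * 2 + "1")  # two non-breaking spaces, then "1"
puts hex_bytes(SPACE_CHAR * 2 + "1") # two ordinary spaces, then "1"
```

The first line printed matches the c2 a0 c2 a0 31 run from the hexdump, which suggests the output contains real non-breaking space characters and not the &nbsp; entity text.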

I don't think it's prudent to blindly enforce a non-UTF-8 encoding on my output, so playing w/ various encodings isn't an option.
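
One possible direction that keeps the output in UTF-8 is to post-process the serialized string and turn literal U+00A0 characters back into the &nbsp; entity. This is only a sketch under assumptions: it presumes that editing the string after serialization (e.g. the result of `doc.to_html`) is acceptable, and that every U+00A0 in the output may be rewritten, including ones that were in the input to begin with. The helper name `reescape_nbsp` is hypothetical, and the input string below is a stand-in for Nokogiri's output, built from the bytes seen in the hexdump.

```ruby
# Hypothetical helper: turn literal U+00A0 characters back into the &nbsp; entity.
NBSP = "\u00A0"

def reescape_nbsp(html)
  html.gsub(NBSP, '&nbsp;')
end

# Stand-in for the serialized fragment (assumption: same bytes as in the hexdump).
serialized = "<pre><code>#{NBSP * 2}1<br><br>#{NBSP * 2}2</code></pre>"

result = reescape_nbsp(serialized)
puts result          # the fragment with &nbsp; entities in place of U+00A0
puts result.encoding # still UTF-8, no re-encoding involved
```

In the script from the question this would amount to calling `puts reescape_nbsp(doc.to_html)` in place of `puts doc`; whether the downstream sink leaves &nbsp; alone is not something this sketch can confirm.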

0 answers:

No answers