在构建CLI Google Card查看器时,我偶然发现了渲染 HTML在命令行中的问题,例如浏览器 w3m 或 lynx 。我最接近的是使用Nokogiri的文本吐出:
Nokogiri::HTML::parse(card_snippet).text
但它打印出来如下:
"Albert EinsteinTheoretical PhysicistAlbert Einstein was a German-born theoretical physicist. He developed the general theory of relativity, one of the two pillars of modern physics. Einstein's work is also known for its influence on the philosophy of science. WikipediaBorn: March 14, 1879, Ulm, GermanyDied: April 18, 1955, Princeton, New Jersey, United StatesInfluenced: Satyendra Nath Bose, Wolfgang Pauli, Leo Szilard, moreInfluenced by: Isaac Newton, Mahatma Gandhi, moreBooksThe World as I See It1949Relativity: The Special a...1916Ideas and Opinions2000Out of My Later Years2006The Meaning of Relativity1922The Evolution of Physics1938People also search forIsaac NewtonEduard EinsteinSonStephen HawkingElsa EinsteinSpouseMileva MarićFormer spouseThomas Edison"
但是使用lynx:
cat card_snippet.html | lyx -dump -stdin
Albert Einstein
Theoretical Physicist
Albert Einstein was a German-born theoretical physicist. He
developed the general theory of relativity, one of the two pillars
of modern physics. Einstein's work is also known for its influence
on the philosophy of science. Wikipedia
Born: March 14, 1879, Ulm, Germany
Died: April 18, 1955, Princeton, New Jersey, United States
Influenced: Satyendra Nath Bose, Wolfgang Pauli, Leo
Szilard,
注意:剥掉一些噪音后。但是,行结束是正确的。
Ruby中类似解决方案的任何想法? html代码段:Pastebin Link。
答案 0 :(得分:0)
这对我有用,
require 'nokogiri'
html = `curl http://pastebin.com/raw/pYKwACBp`
doc = Nokogiri::HTML(html)
puts doc.text.gsub(/[\r\n]+/,"\n").strip