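https://github.com/mozilla/readability (readability.js is used to create a reader view of a web page)
How do I use readability.js on the test page below? The problem is that readability.js removes elements of the page that I want to keep, and keeps elements that should be removed. I hope someone can help. Thanks! Is there any documentation on how to use readability.js?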
<html><head>
<title>Reader View shows only the browser in reader view</title>
<script src="https://raw.githack.com/mozilla/readability/master/Readability.js"></script>
</head>
<body>
Everything outside the main div tag vanishes in Reader View<br>
<img class="no-print" src="http://dummyimage.com/1024x100/000/ffffff&text=This+banner+should+vanish+in+print+view">
<div>
<h1>H1 tags outside of a p tag are hidden in reader view</h1>
<img class="no-print" src="http://dummyimage.com/1024x100/000/ffffff&text=This+banner+is resized+in+print+view">
<p>
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789 123456
</p>
</div>
<script>
var article = new Readability(document).parse();
</script>
</body>
</html>
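Answer 0 (score: 2)
Have you tried this?
From their GitHub page:
"Readability's parse() works by modifying the DOM. This removes some elements in the web page. You could avoid this by passing the clone of the document object while creating a Readability object."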
var documentClone = document.cloneNode(true);
var article = new Readability(documentClone).parse();
You can clone the DOM object so that you never actually modify the real DOM.
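A minimal sketch of the cloning approach, wrapped in a function so the live page is never touched. The helper name parseWithoutMutating and the injected ReadabilityCtor parameter are illustrative assumptions, not part of the library. parse() can return null when no article is found, so the sketch checks for that instead of letting a later property access fail.

```javascript
// Hypothetical helper: parse a copy of the document so the live DOM stays intact.
// ReadabilityCtor is passed in (rather than read from a global) only to keep the
// sketch easy to test; in the browser you would pass the real Readability class.
function parseWithoutMutating(doc, ReadabilityCtor) {
  // cloneNode(true) makes a deep copy, including all descendants.
  const documentClone = doc.cloneNode(true);
  const article = new ReadabilityCtor(documentClone).parse();
  if (article === null) {
    throw new Error("Readability could not extract an article from this document");
  }
  return article;
}
```

On the test page from the question, this would be called as parseWithoutMutating(document, Readability).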
Answer 1 (score: 1)
You can use DOMPurify and Readability together, as mentioned in the documentation:
import { Readability } from '@mozilla/readability'
import DOMPurify from 'dompurify';
function readable(doc) {
const reader = new Readability(doc)
const article = reader.parse()
return article
}
let cloneDoc = document.cloneNode(true)
let parsed = readable(cloneDoc)
const markup = DOMPurify.sanitize(parsed.content)
markup will be an HTML string of the readable content.
Try console.log(parsed) to see the available properties.
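As a rough illustration of what that logged object contains, the sketch below picks out a few of the fields listed in the Readability README (title, byline, excerpt, textContent, length). The function name summarize and the maxChars parameter are hypothetical, and the fallbacks are a design choice of this sketch rather than library behavior.

```javascript
// Hypothetical helper: reduce a Readability result to a small plain-text summary.
function summarize(article, maxChars = 200) {
  if (!article) {
    return null; // parse() returns null when no article could be extracted
  }
  const text = (article.textContent || "").trim();
  return {
    title: article.title || "(untitled)",
    byline: article.byline || null,
    // Fall back to the start of the text when no excerpt was detected.
    excerpt: article.excerpt || text.slice(0, maxChars),
    length: article.length ?? text.length,
  };
}
```

With the code above, console.log(summarize(parsed)) would show just these fields.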
Answer 2 (score: 0)
OK...
// Replace the page body with the extracted title and content.
document.body.innerHTML = "<font face='Calibri' size='4'><h1>" + article.title + "</h1>" + article.content + "</font>";
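A slightly more defensive variant of the line above, as a sketch only: it escapes the title (which is plain text) before concatenating, and uses a styled wrapper div instead of the deprecated font tag. The names escapeHtml and buildReaderMarkup are hypothetical. article.content is inserted as-is, so it should already be sanitized (for example with DOMPurify, as in the previous answer) if the source page is untrusted.

```javascript
// Hypothetical helper: escape characters that are special in HTML text.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: build the reader-view markup from a Readability result.
function buildReaderMarkup(article) {
  const title = escapeHtml(article.title || "");
  return (
    '<div style="font-family: Calibri, sans-serif; font-size: 1.1em">' +
    "<h1>" + title + "</h1>" +
    article.content + // assumed to be sanitized HTML
    "</div>"
  );
}
```

In the browser this would be used as document.body.innerHTML = buildReaderMarkup(article).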