Question

I'm using the Instapaper API.

I'm using this line to pull the content of an article.

$Bookmark_Text = $connection->getBookmarkText($Bookmark['bookmark_id']);

Unfortunately, it pulls the entire HTML document, so a complete HTML structure ends up nested inside my own HTML.

Example

<html>
<head></head>
<body>
    <html>
    <head>Instapaper Title</head>
    <body>InstaPaper Article Content</body>
    </html>
</body>
</html>

Any ideas on how to get just the "InstaPaper Article Content"?

Thanks!

Answer 1

Here is some JS code that extracts only the article and strips the Instapaper extras (for example, the top and bottom bars).

// replace() returns a new string, so keep the result.
var article = html.replace(/^[\s\S]*<div id="story">|<\/div>[^<]*<div class="bar bottom">[\s\S]*$/gim, '');

Note that this may stop working if Instapaper changes its HTML output.
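
Since the regex depends on Instapaper's current markup, it may help to wrap it so a markup change is noticed instead of silently returning the whole page. A minimal sketch: the function name extractStory and the null fallback are my own illustrative choices, not part of the Instapaper API, and it assumes the article still sits inside <div id="story">.

```javascript
// Hypothetical helper (name and null fallback are assumptions for illustration).
function extractStory(html) {
  // If the story marker is missing, Instapaper's markup has probably changed.
  if (html.indexOf('<div id="story">') === -1) {
    return null;
  }
  // Same pattern as in the answer above: drop everything up to and including
  // <div id="story">, and everything from the </div> that precedes the bottom bar.
  var pattern = /^[\s\S]*<div id="story">|<\/div>[^<]*<div class="bar bottom">[\s\S]*$/gim;
  return html.replace(pattern, '').trim();
}

// Usage with a small made-up page:
var sample = '<html><body><div class="bar top">Top</div>\n' +
  '<div id="story"><p>Hello</p></div>\n' +
  '<div class="bar bottom">Bottom</div></body></html>';
console.log(extractStory(sample));
```

Callers can check for null and fall back to the raw text or log a warning.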

Answer 2

Use a parser to extract the contents of <body>. PHP has some built in, but there are others that may be easier to use.

If $Bookmark_Text is a valid HTML document, this should do it.

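$dom = new DOMDocument();
$dom->loadHTML($Bookmark_Text);
$body = $dom->getElementsByTagName('body')->item(0);
// saveHTML($body) would include the <body> tag itself, so serialize its children instead.
$content = '';
foreach ($body->childNodes as $child) {
    $content .= $dom->saveHTML($child);
}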

Instapaper API: /api/1/bookmarks/get_text
