Question

I'm using the newsanchor package in R to try to extract full article content via NewsAPI. So far I have done the following:

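library(newsanchor)  # library() fails loudly if the package is missing
# Query NewsAPI's "everything" endpoint for English-language articles
results <- get_everything(query = "Trump +Trade", language = "en")
# The article metadata is stored in the results_df element
test <- results$results_df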

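This gives me a data frame with information on (at most) 100 articles. However, it does not contain the full text of each article. Instead, the content looks like this: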

[1] "Tensions between China and the U.S. ratcheted up several notches over the weekend as Washington sent a warship into the disputed waters of the South China Sea. Meanwhile, Google dealt Huaweis smartphone business a crippling blow and an escalating trade war co… [+5173 chars]"

Is there a way to extract the remaining 5173 characters? I tried reading the documentation, but I'm not sure.

Answer 1

I don't think this is possible, at least not on the free plan. If you look at the documentation at https://newsapi.org/docs/endpoints/everything, the "Response object" section says:

content (string)

The unformatted content of the article, where available. This is truncated to 260 characters for Developer plan users.

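So every content value is limited to 260 characters. However, test$url holds the link to each original article, which you could use to scrape the full text. Because the articles are aggregated from many different sources, each with its own page layout, I don't think there is a single automatic way to do that.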
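
As a rough illustration of the scraping route, here is a minimal sketch using the rvest package, which is not part of the original answer. The helper names extract_paragraphs and fetch_article_text are hypothetical, and the assumption that the article body lives in <p> elements is only a heuristic: it will pick up unrelated text on some sites and miss the article on others, so per-source selectors may still be needed.

```r
library(rvest)

# Hypothetical helper: collect the text of all <p> elements on a parsed page.
# Assumption: the article body is made up of <p> tags, which is not true everywhere.
extract_paragraphs <- function(page) {
  paragraphs <- html_text2(html_elements(page, "p"))
  paragraphs <- paragraphs[nzchar(paragraphs)]  # drop empty paragraphs
  paste(paragraphs, collapse = "\n\n")
}

# Hypothetical helper: download one article and return its text,
# or NA if the page cannot be fetched or parsed.
fetch_article_text <- function(url) {
  tryCatch(
    extract_paragraphs(read_html(url)),
    error = function(e) NA_character_
  )
}
```

With the data frame from the question, this could be applied to every link with something like test$full_text <- vapply(test$url, fetch_article_text, character(1), USE.NAMES = FALSE). Check each site's terms of use before scraping, and expect to clean the results by hand.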
