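How can I get the text of questions using the stackr library for the Stack Exchange API?

Time: 2018-02-17 11:22:06

Tags: r stackexchange-api

With stackr, the following command lists the questions of a specific user.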

library(devtools)
devtools::install_github("dgrtwo/stackr")
library(stackr)
textques <- stack_users(712603, "questions", num_pages=10, pagesize=100)

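Is it also possible to get the text of each question?

There is a body option (https://api.stackexchange.com/docs/types/question), but how can I use it with the previous command?

I also tried setting body to true, but it does not work: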

textques <- stack_users(712603, "questions", body = "true", num_pages=10, pagesize=100)

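2 answers:

Answer 0 (score: 2)

Since both stack_users() and stack_GET() accept (and pass along) the ellipsis argument (...), you can append any parameter to the end of the API call URL. This includes the filter parameter, for which we can use the built-in withbody filter: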

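library(stackr)
# filter = "withbody" is forwarded through ... and adds a body column to the result
questions <- stack_users(9371451, "questions", num_pages=10, pagesize=100, filter="withbody")
# seq_len() avoids looping over 1:0 when no questions are returned
for (i in seq_len(nrow(questions))) {
    qtext <- questions$body[i]
    print(qtext)
}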

Output:

[1] "<p>I would like to use the Stack Exchange API with a specific user id to get the text of a user's badges.</p>\n\n<p>I found <a href=\"https://github.com/dgrtwo/stackr\" rel=\"nofollow noreferrer\">the <em>stackr</em> library for the Stack Exchange API</a>, and tried this:</p>\n\n<pre><code># install.packages(\"devtools\")\nlibrary(devtools)\ndevtools::install_github(\"dgrtwo/stackr\")\nlibrary(stackr)\nstack_users(712603)\n</code></pre>\n\n<p>But that only gives the total number of every kind of badge. How can I take the text from each one?\nExample:</p>\n\n<pre><code>gold silver bronze\nr     r      ggplot2\n</code></pre>\n\n<p>I don't only want the total number of badges but also what the badges are.</p>\n"
[1] "<p>Using stackr it is possible with the following command to take the question from a specific user.</p>\n\n<pre><code>library(devtools)\ndevtools::install_github(\"dgrtwo/stackr\")\nlibrary(stackr)\ntextques &lt;- stack_users(712603, \"questions\", num_pages=10, pagesize=100)\n</code></pre>\n\n<p>How is it possible to have also the text of every question?</p>\n\n<p><a href=\"https://api.stackexchange.com/docs/types/question\">There is</a> a body option but how could I use it with the previous command</p>\n\n<p>Also I tried body equals true but it is not working:</p>\n\n<pre><code>textques &lt;- stack_users(712603, \"questions\", body = \"true\", num_pages=10, pagesize=100)\n</code></pre>\n"
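
To make the pass-through idea concrete, here is a minimal sketch in base R of how named arguments collected in ... can end up in a query string. The helpers build_query() and users_url(), the 2.2 version prefix, and the site value are assumptions for illustration only, not stackr's actual source. It also suggests why body = "true" had no visible effect: it would simply become one more query parameter, whereas the Stack Exchange API selects returned fields through filters.

```r
# Hypothetical helper (not part of stackr): turn named arguments into a query string.
build_query <- function(...) {
  params <- list(...)
  values <- vapply(
    params,
    function(p) utils::URLencode(as.character(p), reserved = TRUE),
    character(1)
  )
  paste0(names(params), "=", values, collapse = "&")
}

# Hypothetical wrapper: everything passed in ... is forwarded to build_query().
users_url <- function(id, type, ...) {
  base <- "https://api.stackexchange.com/2.2"  # API version is an assumption
  paste0(base, "/users/", id, "/", type, "?",
         build_query(site = "stackoverflow", ...))
}

# filter is not a named argument of users_url(), yet it reaches the URL via ...
url <- users_url(712603, "questions", pagesize = 100, filter = "withbody")
print(url)
```

Any other query parameter the API accepts could be forwarded the same way, without the wrapper having to declare it.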

Answer 1 (score: 0)

I'm not sure whether the package provides a function for extracting the question text, but here is an approach based on rvest:

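library(rvest)  # provides read_html(), html_node(), html_text() and %>%
textques$qtext <- NA_character_  # pre-allocate the new column
for (i in seq_len(nrow(textques))) {
  link <- textques$link[i]
  # the CSS selector may need updating if the page markup changes
  textques$qtext[i] <- read_html(link) %>% html_node("#question .post-text") %>% html_text()
}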

This should add another variable to the existing data frame, holding the question text from each link.
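
The selector step can be tried offline before sending any requests. The sketch below runs the same rvest calls on a small inline HTML string. The markup is made up for illustration and only mimics the structure that the "#question .post-text" selector expects. It is not a copy of a real page.

```r
library(rvest)

# Made-up HTML that mimics a question block followed by an answer block.
html <- paste0(
  '<div id="question"><div class="post-text"><p>How do I get the body?</p></div></div>',
  '<div id="answers"><div class="post-text"><p>Use a filter.</p></div></div>'
)

page <- read_html(html)

# html_node() returns the first match, so only the question block is selected.
qtext <- page %>% html_node("#question .post-text") %>% html_text()
print(qtext)
```

If the selector matches nothing, html_text() returns NA, which is a quick way to notice that the page structure differs from what the loop assumes.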