How to get the most popular Facebook posts in R

Date: 2013-02-23 21:46:22

Tags: facebook r https rcurl geturl

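I am trying to fetch the posts from a Facebook page with the code below. The request fails with an error, even though the same query works when I type it into a browser. This is the error I get: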

WWW-Authenticate: OAuth "Facebook Platform" "invalid_request" "Unknown path components: 

Any ideas are much appreciated!

# go to 'https://developers.facebook.com/tools/explorer' to get your access token
access_token <- "### token ###"

require(RCurl)
require(rjson)

# CA bundle shipped with RCurl, used to verify the HTTPS certificate
cafile <- system.file("CurlSSL", "cacert.pem", package = "RCurl")

# set the default curl options once; a second options(RCurlOptions = ...) call
# would replace the first rather than add to it
options(RCurlOptions = list(cainfo = cafile, ssl.verifypeer = TRUE, verbose = TRUE,
                            followlocation = TRUE, timeout = 100,
                            cookiejar = 'my_cookies.txt', cookiefile = 'my_cookies.txt',
                            useragent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3'))

# optional proxy: 'proxyserver:port' is a placeholder, and this handle only
# takes effect if it is passed on, as in getURL(..., curl = curl)
curl <- getCurlHandle()
curlSetOpt(.opts = list(proxy = 'proxyserver:port'), curl = curl)


# Facebook json function copied from original (Romain Francois) post
facebook <- function(path = "me", access_token, options) {
  if (!missing(options)) {
    options <- sprintf("?%s", paste(names(options), "=", unlist(options), collapse = "&", sep = ""))
  } else {
    options <- ""
  }
  data <- getURL(sprintf("https://graph.facebook.com/%s%s&access_token=%s", path, options, access_token))
  fromJSON(data)
}


### TED FACEBOOK PAGE
# http://www.facebook.com/TED
# TED's Facebook ID 29092950651 can be found on http://graph.facebook.com/TED

ted <- list()
i<-0
next.path <- "29092950651/posts"

# download all TED posts
while(length(next.path)!=0) {
  i<-i+1
  ted[[i]] <- facebook( path=next.path , access_token=access_token)
  next.path <- sub("https://graph.facebook.com/","",ted[[i]]$paging$'next')
}
ted[[i]] <- NULL

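# parse just video links posted by TED
# parse.master is not defined anywhere in this post; it is assumed here to
# apply f to every post in one page of results
parse.master <- function(x, f) sapply(x$data, f)
parse.count.ted <- function(x) 
  if (x$type == "link" && x$from$id == "29092950651") x$likes$count else NA
parse.link.ted <- function(x) 
  if (x$type == "link" && x$from$id == "29092950651") x$link else NA
ted.counts <- unlist(sapply(ted, parse.master, f = parse.count.ted))
ted.links <- unlist(sapply(ted, parse.master, f = parse.link.ted))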

# see three most popular talks
ted.links[order(ted.counts,decreasing=TRUE)][1:3]

1 Answer:

Answer 0 (score: 3)

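This is probably a URL formatting problem. If the options argument is not specified, the resulting URL looks like /me/photos&access_token=.... Because there is no "?" to start the query string, the path is taken to be /me/photos&access_token, which is probably not a valid path component for the Facebook API.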
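To see the malformed URL without sending a request, the string formatting from the question's function can be run on its own. This is only a sketch: the token value "TOKEN" and the path "me/photos" are placeholders, and nothing here touches the network.

```r
# Reproduces only the sprintf() call from the question's facebook() function.
# "TOKEN" is a placeholder, not a real access token.
path <- "me/photos"
options <- ""            # what the question's code uses when options is missing
access_token <- "TOKEN"

url <- sprintf("https://graph.facebook.com/%s%s&access_token=%s",
               path, options, access_token)
print(url)
# [1] "https://graph.facebook.com/me/photos&access_token=TOKEN"

# The query string never starts: the URL contains no "?" at all
has_query <- grepl("?", url, fixed = TRUE)
print(has_query)
# [1] FALSE
```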

I think the following change to the facebook function will fix this:

require(RCurl)
require(rjson)

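# access_token defaults to a variable named token, which must already exist
# in the calling environment
facebook <- function(path = "me", access_token = token, options) {
    if (!missing(options)) {
        # "?key=value&key=value&", leaving a trailing "&" for access_token
        options <- sprintf(
            "?%s&",
            paste(names(options), "=", unlist(options), collapse = "&", sep = "")
        )
    } else {
        # no extra parameters: just open the query string
        options <- "?"
    }

    urlTemplate <- "https://graph.facebook.com/%s%saccess_token=%s"
    data <- getURL(sprintf(urlTemplate, path, options, access_token))
    fromJSON(data)
}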

Now, even when the options argument is missing, the resulting URL will look like this: /me/photos?access_token=...
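One thing to watch in the question's paging loop: the path taken from paging$next may already contain a query string (and possibly its own access_token), in which case always prepending "?" would produce a second "?". Below is a minimal sketch of a URL builder that handles both cases. The name build_graph_url is hypothetical, not part of RCurl or of the original post, and parameter values are not URL-escaped here.

```r
# Hypothetical helper: builds the request URL and picks "?" or "&" depending on
# whether the path already carries a query string. Values are not URL-escaped.
build_graph_url <- function(path, access_token, options = list()) {
  params <- options
  # only add the token if the path does not already carry one
  if (!grepl("access_token=", path, fixed = TRUE)) {
    params <- c(params, list(access_token = access_token))
  }
  url <- sprintf("https://graph.facebook.com/%s", path)
  if (length(params) == 0) return(url)
  query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
  separator <- if (grepl("?", path, fixed = TRUE)) "&" else "?"
  paste0(url, separator, query)
}

# first request: no query string yet, so "?" is used
first_url <- build_graph_url("29092950651/posts", "TOKEN")
# later request: the path already has a query string, so "&" is used
next_url <- build_graph_url("29092950651/posts?limit=25", "TOKEN")
```

The result could then be passed to getURL() and fromJSON() in the same way as in the function above.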