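How to extract information from the New York Times API?

Posted: 2016-06-19 15:41:50

Tags: r

I want to extract information from the New York Times API using library(rtimes). The API call returns a list of 3 elements. It seems to contain the information I need, but in a form that is hard to access for an R newbie like me.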

install.packages("rtimes")
library(rtimes)
# Here I use the key provided by the New York Times
api <- "[redacted]"

# I create an empty vector to append the required information to
mylist <- c()
# The default article API call for "Crisis"
NY_terror <- as_search(q = "Crisis",
                       begin_date = '20110101',
                       end_date = '20110201',
                       fl = c("pub_date", "headline", "keywords", "abstract", "_id"),
                       facet_field = c("section_name"),
                       key = api)

# Here I extract the data (at least, I believe so)
mylist <- append(mylist, unlist(NY_terror$data))
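A nested list is usually easier to navigate one level at a time than with unlist(). unlist() flattens every level into one long vector, so the fields lose their per-article grouping. The sketch below uses a small hypothetical stand-in for the as_search() result. Its element names (copyright, meta, data) are assumptions; only the data$docs path is taken from the answer below. Run names() and str() on your own result to check the real layout.

```r
# Hypothetical stand-in for the as_search() result; element names are
# assumptions, check them with names()/str() on the real object.
NY_mock <- list(
  copyright = "Copyright (c) The New York Times Company.",
  meta = list(hits = 2L, offset = 0L),
  data = list(docs = list(
    list(`_id` = "a1", pub_date = "2011-01-30T20:14:04Z",
         headline = list(main = "First headline")),
    list(`_id` = "a2", pub_date = "2011-01-29T09:00:03Z",
         headline = list(main = "Second headline"))
  ))
)

# Look at the top level only
names(NY_mock)                 # "copyright" "meta" "data"
str(NY_mock, max.level = 1)

# unlist() flattens all articles into one vector, losing the grouping
flat <- unlist(NY_mock$data)
length(flat)                   # 6 values from 2 articles

# Step down one level at a time to the per-article records instead
docs <- NY_mock$data$docs
length(docs)                   # one element per article
names(docs[[1]])               # fields available for a single article
docs[[1]]$headline$main
```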

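But I only end up with one of the required columns, "pub_date", plus freq., the frequency counts of the corresponding keywords. How can I generate a data frame that contains the columns defined in fl and facet_field?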

So the desired output should look something like this:

 id   section_name   pub_date     headline               keywords   abstract
 ...  Politics       2011-01-01   MAMBA posted API Key   ...        ...
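One way to get this shape is to pull each field out of every article with vapply() and use NA wherever a field is missing. The sketch below runs on hypothetical records shaped the way the answer below accesses NY_terror$data$docs[[i]]. The field names (headline$main, keywords[[k]]$value, section_name) are assumptions about the response. Also, facet_field in the Article Search API appears to return aggregate counts per section (probably the freq column above) rather than one value per article, so section_name most likely has to be listed in fl as well to appear as a column.

```r
# Hypothetical per-article records; field names are assumptions.
docs <- list(
  list(`_id` = "a1", section_name = "Business Day",
       pub_date = "2011-01-30T20:14:04Z",
       headline = list(main = "First headline"),
       keywords = list(list(name = "subject", value = "Banking"),
                       list(name = "organizations", value = "Justice Department")),
       abstract = "Short summary."),
  list(`_id` = "a2", pub_date = "2011-01-29T09:00:03Z",
       headline = list(main = "Second headline"),
       keywords = list(),
       abstract = NULL)
)

# One string for a field, or NA when the field is missing or empty
field_or_na <- function(x) {
  if (is.null(x) || length(x) == 0) NA_character_ else as.character(x[[1]])
}

# All keyword values of one article collapsed into a "; "-separated string
keywords_string <- function(kw) {
  vals <- vapply(kw, function(k) field_or_na(k$value), character(1))
  if (length(vals) == 0) NA_character_ else paste(vals, collapse = "; ")
}

articles <- data.frame(
  id           = vapply(docs, function(d) field_or_na(d[["_id"]]), character(1)),
  section_name = vapply(docs, function(d) field_or_na(d$section_name), character(1)),
  pub_date     = vapply(docs, function(d) field_or_na(d$pub_date), character(1)),
  headline     = vapply(docs, function(d) field_or_na(d$headline$main), character(1)),
  keywords     = vapply(docs, function(d) keywords_string(d$keywords), character(1)),
  abstract     = vapply(docs, function(d) field_or_na(d$abstract), character(1)),
  stringsAsFactors = FALSE
)
# Keep only the date part of the timestamp, e.g. 2011-01-30
articles$pub_date <- as.Date(substr(articles$pub_date, 1, 10))
articles
```

vapply() is used instead of growing a result inside a loop because it allocates the column once and errors if any article yields something other than a single string.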

1 Answer:

Answer 0 (score: 0)

I think this should get you started, and you can keep adding more fields in the same way:

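# Assumes NY_terror was created by the as_search() call in the question.

# b: first author's last name (if given); can be expanded for multiple authors.
# The [5, 1] position depends on how byline$person unlists, so check it with str().
b <- list()
for (i in seq_along(NY_terror$data$docs)) {
  a <- as.data.frame(as.character(unlist(NY_terror$data$docs[[i]]$byline$person)))[5, 1]
  b <- rbind(b, as.character(a))
}
b <- unlist(b)
b # first author's last name (if given)

# c: publication dates (assigning to c only creates a variable; c() still works)
c <- list()
for (i in seq_along(NY_terror$data$docs)) {
  a <- as.data.frame(as.character(unlist(NY_terror$data$docs[[i]]$pub_date)))[[1]]
  c <- rbind(c, as.character(a))
}
c <- unlist(c)
c # dates

# d: first keyword of each article; NA when an article has no keywords
# (otherwise kw[[1]] fails with "subscript out of bounds")
d <- list()
for (i in seq_along(NY_terror$data$docs)) {
  kw <- NY_terror$data$docs[[i]]$keywords
  a <- if (length(kw) > 0) as.character(unlist(kw[[1]]$value)) else NA_character_
  d <- rbind(d, a)
}
d <- unlist(d)
d # keywords

# Combine into a character matrix and replace "reported" in the author column.
# Note that gsub() writes the string "NA" here, not a real missing value.
res <- cbind(b, c, d)
res[, 1] <- gsub("reported", "NA", res[, 1])
res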
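The loops above can be extended to several authors by collecting every last name in the byline instead of taking position [5, 1]. The sketch below is one possible version. It runs on hypothetical records, and the byline$person entries with a lastname field are assumptions about the response, so confirm them with str() first. It returns a real NA for articles without a named author, not the string "NA" that gsub() produces.

```r
# Hypothetical byline records; field names (person, lastname) are assumptions.
docs <- list(
  list(byline = list(person = list(
    list(firstname = "Ann", lastname = "SMITH"),
    list(firstname = "Bob", lastname = "JONES")))),
  list(byline = list(original = "By Reuters")),   # no person entries
  list(byline = NULL)                             # no byline at all
)

# All authors' last names for one article joined with ", "; NA if none given
authors_of <- function(doc) {
  persons <- doc$byline$person
  if (length(persons) == 0) return(NA_character_)
  last <- vapply(persons, function(p) {
    if (is.null(p$lastname)) NA_character_ else as.character(p$lastname)
  }, character(1))
  last <- last[!is.na(last)]
  if (length(last) == 0) NA_character_ else paste(last, collapse = ", ")
}

authors <- vapply(docs, authors_of, character(1))
authors   # "SMITH, JONES" NA NA
```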

      b           c                      d                                        
 [1,] "BOSMAN"    "2011-01-30T20:14:04Z" "Financial Crisis Inquiry Commission"    
 [2,] "CHAN"      "2011-01-29T09:00:03Z" "Regulation and Deregulation of Industry"
 [3,] NA          "2011-01-25T17:20:36Z" "Financial Crisis Inquiry Commission"    
 [4,] "CRAIG"     "2011-01-27T14:17:32Z" "Financial Crisis Inquiry Commission"    
 [5,] "MORGENSON" "2011-01-30T00:00:00Z" "Banking and Financial Institutions"     
 [6,] "BOSMAN"    "2011-01-31T00:00:00Z" "FINANCIAL CRISIS INQUIRY COMMISSION"    
 [7,] "CHAN"      "2011-01-25T00:00:00Z" "Subprime Mortgage Crisis"               
 [8,] "NA"        "2011-01-28T09:30:54Z" "Securities and Commodities Violations"  
 [9,] NA          "2011-01-25T02:15:29Z" "Justice Department"                     
[10,] "NOCERA"    "2011-01-29T00:00:00Z" "Banking and Financial Institutions"