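I want to extract information from the New York Times API using library(rtimes).
The API call returns a list of 3, which seems to contain the information I need, but in a form that is hard to access for an R newbie.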
install.packages("rtimes")
require(rtimes)
# Here I use the key provided by the New York Times
api <- "[redacted]"
# I create an empty vector to append the required information to
mylist <- c()
# The default article API call for "Crisis"
NY_terror <- as_search(q = "Crisis",
                       begin_date = '20110101',
                       end_date = '20110201',
                       fl = c("pub_date", "headline", "keywords", "abstract", "_id"),
                       facet_field = c("section_name"),
                       key = api)
# Here I extract the data, or at least I believe so
mylist <- append(mylist, unlist(NY_terror$data))
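But I end up with only one of the required columns, "pub_date", together with freq, the count of the corresponding keyword. How can I generate a data frame containing the columns defined in fl and facet_field?
So the desired output should look something like: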
id section_name pub_date headline keywords abstract
... Politics 2011-01-01 MAMBA ... ...
Answer 0: (score: 0)
I think this should help you get started, and you can keep adding more fields in the same way:
# Collect one value per document; seq_along() is safe even if docs is empty
b <- list()
for (i in seq_along(NY_terror$data$docs)) {
  # element 5 of the flattened byline is taken as the first author's last name
  a <- as.data.frame(as.character(unlist(NY_terror$data$docs[[i]]$byline$person)))[5, 1]
  b <- rbind(b, as.character(a))
}
b <- unlist(b)
b # first author's last name (if given), can be expanded for multiple authors
c <- list()
for (i in seq_along(NY_terror$data$docs)) {
  a <- as.data.frame(as.character(unlist(NY_terror$data$docs[[i]]$pub_date)))[[1]]
  c <- rbind(c, as.character(a))
}
c <- unlist(c)
c # dates
d <- list()
for (i in seq_along(NY_terror$data$docs)) {
  # only the first keyword of each document is kept
  a <- as.character(unlist(NY_terror$data$docs[[i]]$keywords[[1]]$value))
  d <- rbind(d, a)
}
d <- unlist(d)
d # keywords
res <- cbind(b, c, d)
# replaces the value "reported" with the string "NA" (not a true missing value)
res[, 1] <- gsub("reported", "NA", res[, 1])
res
b c d
[1,] "BOSMAN" "2011-01-30T20:14:04Z" "Financial Crisis Inquiry Commission"
[2,] "CHAN" "2011-01-29T09:00:03Z" "Regulation and Deregulation of Industry"
[3,] NA "2011-01-25T17:20:36Z" "Financial Crisis Inquiry Commission"
[4,] "CRAIG" "2011-01-27T14:17:32Z" "Financial Crisis Inquiry Commission"
[5,] "MORGENSON" "2011-01-30T00:00:00Z" "Banking and Financial Institutions"
[6,] "BOSMAN" "2011-01-31T00:00:00Z" "FINANCIAL CRISIS INQUIRY COMMISSION"
[7,] "CHAN" "2011-01-25T00:00:00Z" "Subprime Mortgage Crisis"
[8,] "NA" "2011-01-28T09:30:54Z" "Securities and Commodities Violations"
[9,] NA "2011-01-25T02:15:29Z" "Justice Department"
[10,] "NOCERA" "2011-01-29T00:00:00Z" "Banking and Financial Institutions"
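The loops above can also be folded into a single step that builds the data frame the question asks for. The following is only a minimal sketch under stated assumptions: `docs` is a hypothetical stand-in for `NY_terror$data$docs`, and the field layout (`headline$main`, a list of keywords each with a `value`) is assumed from the code above rather than confirmed against the rtimes return value. The helpers `or_na` and `keyword_string` are made up for this illustration.

```r
# Hypothetical stand-in for NY_terror$data$docs: one list element per article.
# The field layout is an assumption modelled on the answer above.
docs <- list(
  list(`_id` = "a1", pub_date = "2011-01-30T20:14:04Z",
       headline = list(main = "First headline"),
       keywords = list(list(value = "Financial Crisis Inquiry Commission"),
                       list(value = "Banking and Financial Institutions")),
       abstract = "Some abstract"),
  list(`_id` = "a2", pub_date = "2011-01-25T17:20:36Z",
       headline = list(main = "Second headline"),
       keywords = list(),
       abstract = NULL)
)

# Return a single string, or NA when the field is missing or empty.
or_na <- function(x) {
  if (is.null(x) || length(x) == 0) NA_character_ else as.character(x[[1]])
}

# Collapse all keyword values of one document into a single string.
keyword_string <- function(doc) {
  values <- vapply(doc$keywords, function(k) or_na(k$value), character(1))
  if (length(values) == 0) NA_character_ else paste(values, collapse = "; ")
}

# One row per article, one column per requested field.
articles <- data.frame(
  id       = vapply(docs, function(d) or_na(d[["_id"]]), character(1)),
  pub_date = as.Date(substr(
    vapply(docs, function(d) or_na(d$pub_date), character(1)), 1, 10)),
  headline = vapply(docs, function(d) or_na(d$headline$main), character(1)),
  keywords = vapply(docs, keyword_string, character(1)),
  abstract = vapply(docs, function(d) or_na(d$abstract), character(1)),
  stringsAsFactors = FALSE
)
articles
```

With real data you would replace the mock `docs` with `NY_terror$data$docs` and adjust the accessors to whatever `str(NY_terror$data$docs[[1]])` shows. Using `vapply` with `character(1)` makes a malformed field fail loudly instead of silently producing a ragged result, and missing fields become true `NA` values rather than the string "NA".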