I'm on 64-bit Windows and would like to scrape Google News data. I want a time series of news articles (headlines) per month or per day from 2004 to 2016, in order to conduct some analysis. I have tried several approaches in R: GoogleNewsSource(), getURIAsynchronous() and read_html(). First,
library("tm")
library("tm.plugin.webmining")
googlenews <- GoogleNewsSource("yen", since="1-2-2015", until="31-2-2015")
(An answer to a similar question suggested adding the option as_drrb=b, but that did not work.)
Second,
library(RCurl)  # provides getURIAsynchronous()
url <- "https://www.google.co.kr/search?q=yen&num=100&hl=en&tbm=nws&tbs=cdr:1,cd_min:4/20/2014,cd_max:1/14/2015"
uris = c(url)
txt = getURIAsynchronous(uris)
When I run this code, the news I get back is the newest, e.g. 'Dec 9, 2016', not from the period I requested. The URL in the result is also different from the one I sent.
I think gbv=1 causes the search period to be ignored, but I can't find out why the link gets changed.
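To make the goal concrete, here is a minimal sketch of how I would build one search URL per month for 2004-2016. It assumes the tbs=cdr:1,cd_min:...,cd_max:... format above is the right way to request a period, which is exactly what I'm unsure about. month_urls is a helper name I made up; it only builds the URLs and does not fetch anything.

```r
# Hypothetical helper: build one Google News search URL per calendar month.
# Assumption: the tbs=cdr:1,cd_min:M/D/YYYY,cd_max:M/D/YYYY format is honored.
month_urls <- function(query, from = "2004-01-01", to = "2016-12-01") {
  # First day of every month in the range
  starts <- seq(as.Date(from), as.Date(to), by = "month")
  # Last day of each month = first day of the following month minus one day
  ends <- seq(starts[1], by = "month", length.out = length(starts) + 1)[-1] - 1

  # Format as M/D/YYYY without leading zeros, as in the URL above
  fmt <- function(d) {
    paste(as.integer(format(d, "%m")),
          as.integer(format(d, "%d")),
          format(d, "%Y"),
          sep = "/")
  }

  data.frame(
    month = format(starts, "%Y-%m"),
    url = sprintf(
      "https://www.google.co.kr/search?q=%s&num=100&hl=en&tbm=nws&tbs=cdr:1,cd_min:%s,cd_max:%s",
      utils::URLencode(query, reserved = TRUE), fmt(starts), fmt(ends)
    ),
    stringsAsFactors = FALSE
  )
}

urls <- month_urls("yen")
head(urls$url, 3)
```

Each row could then be passed to getURIAsynchronous() or read_html(), provided the date range is actually respected.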
Third,
library(rvest)
headlines = read_html("https://www.google.co.kr/search?q=yen&num=100&hl=en&tbm=nws&output=rss&tbs=cdr:1,cd_min:4/20/2014,cd_max:1/14/2015") %>%
html_nodes(".r") %>%
html_text()
This has the same gbv=1 problem.
From what I found, gbv=1 means the page is served without JavaScript and gbv=2 means with JavaScript.
Any method that solves this would be welcome.