Error from the getURL function in R

Asked: 2017-11-02 17:35:49

Tags: r web-crawler google-translate

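I am trying to automatically translate text in different languages into English.

I am adapting the code from another question: Google translate via web scraping r

But I get this error: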

Error in function (type, msg, asError = TRUE)  :  Illegal characters found in URL

My code is:

 getParam <- as.character(db$text) 
 translateFrom <- as.character(db$language)

 translateTo <- "en"  
 search <- gsub(" ", "%20", getParam) 
 URL <- paste("https://translate.google.pl/m?hl=",translateFrom,"&sl=",translateFrom,"&tl=",translateTo,"&ie=UTF-8&prev=_m&q=",search,sep="", ssl.verifypeer = FALSE)

 page <- getURL(URL)  

 tree <- htmlTreeParse(page)

 body <- tree$children$html$children$body 

1 answer:

Answer 0 (score: 0)

library(XML)
library(RCurl)

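# Example data: one row per text to translate, with its source language code.
db <- data.frame(text = c("traduire", "tradurre"), language = c("fr", "it"))

# Split the data frame into a list holding one character vector per row.
rows <- unlist(apply(db, 1, list), recursive = FALSE)

# Build and fetch one URL per row instead of passing whole columns at once.
lapply(rows, function(x) {

  getParam <- as.character(x[1])       # text to translate
  translateFrom <- as.character(x[2])  # source language code

  translateTo <- "en"
  search <- gsub(" ", "%20", getParam)  # escape spaces in the query
  URL <- paste("https://translate.google.pl/m?hl=",translateFrom,"&sl=",translateFrom,"&tl=",translateTo,"&ie=UTF-8&prev=_m&q=",search,sep="")
  page <- getURL(URL)
  tree <- htmlTreeParse(page)
  body <- tree$children$html$children$body
  # Pick the node that holds the translation; the index depends on the page layout.
  body_text <- body$children[[5]]$children[[1]]
  body_text

})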
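
The code above escapes only spaces with gsub. If the text can also contain line breaks, accented letters, or characters such as & and =, those reach the URL unescaped, which is one plausible cause of the "Illegal characters found in URL" error. Below is a minimal sketch of building the URL with full percent-encoding. It assumes the same endpoint and parameters as the code above, and build_translate_url is a hypothetical helper name, not part of RCurl or of the original answer. The sketch also shows why ssl.verifypeer = FALSE should not sit inside paste(), as it does in the question.

```r
# Hypothetical helper (an illustration, not part of the original answer):
# builds the request URL for one text with the query fully percent-encoded.
build_translate_url <- function(text, from, to = "en") {
  # URLencode() from the utils package escapes spaces, line breaks and
  # non-ASCII bytes; reserved = TRUE also escapes characters such as & and =.
  query <- utils::URLencode(enc2utf8(text), reserved = TRUE)
  paste0("https://translate.google.pl/m?hl=", from,
         "&sl=", from,
         "&tl=", to,
         "&ie=UTF-8&prev=_m&q=", query)
}

# paste() treats an argument it does not recognise as more text to join,
# so an option placed there is appended to the string itself.
pasted <- paste("a", "b", sep = "", ssl.verifypeer = FALSE)
pasted  # "abFALSE"

# A text with spaces and a trailing line break, encoded without any request.
url <- build_translate_url("bonjour le monde\n", "fr")
url

# The option belongs in the request call instead (not run here, needs network):
# page <- RCurl::getURL(url, ssl.verifypeer = FALSE)
```

The helper takes a single text and language code, so it can replace the gsub and paste lines inside the lapply call above.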