Google Translate via web scraping in R

Time: 2017-10-03 18:39:09

Tags: r rvest

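I have a list of 1,000 texts in Russian and want to translate them into English in R. I know there are R packages for Google Translate, but they require an API key, and access to the Google API is now paid. In Excel VBA, I have a macro that opens the Google Translate website and translates the text. See the URL and parameters below: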

getParam = "Прием (осмотр, консультация) врача-инфекциониста первичный"
translateFrom = "ru"
translateTo = "en"

URL = "https://translate.google.pl/m?hl=" & translateFrom & "&sl=" & translateFrom & "&tl=" & translateTo & "&ie=UTF-8&prev=_m&q=" & getParam

Can the same thing be done in R?

2 answers:

Answer 0: (score: 1)

Here is one solution:

library(RCurl)
library(XML)

getParam = "Прием (осмотр, консультация) врача-инфекциониста первичный"
translateFrom = "ru"
translateTo = "en"

# Percent-encode the whole query (spaces, punctuation and Cyrillic), not only the spaces
search <- curlEscape(enc2utf8(getParam))

URL <- paste("https://translate.google.pl/m?hl=",translateFrom,"&sl=",translateFrom,"&tl=",translateTo,"&ie=UTF-8&prev=_m&q=",search,sep="")

page <- getURL(URL)

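tree <- htmlTreeParse(page)

body <- tree$children$html$children$body

# The translation is read from the fifth child of <body>; this position depends on the page layout
body_text <- body$children[[5]]$children[[1]]

print(body_text)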

You can find more information about web parsing in this question.
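Replacing only spaces leaves punctuation and Cyrillic characters unescaped in the query string. As a minimal sketch, assuming the mobile endpoint from the question still accepts the same parameters, the URL could be built for a whole vector of texts with base R alone. The helper name `build_translate_url` is hypothetical and not part of the answer above.

```r
# Hypothetical helper (not from the answer): build the mobile Google Translate
# URL used in the question, percent-encoding the full query string.
build_translate_url <- function(text, from = "ru", to = "en") {
  # Encode each text separately; reserved = TRUE also escapes characters such
  # as "&", "(", ")" and ",", and non-ASCII characters are escaped byte by byte.
  encoded <- vapply(
    enc2utf8(text),
    function(x) utils::URLencode(x, reserved = TRUE),
    character(1),
    USE.NAMES = FALSE
  )
  paste0("https://translate.google.pl/m?hl=", from, "&sl=", from,
         "&tl=", to, "&ie=UTF-8&prev=_m&q=", encoded)
}

# One URL per input text
urls <- build_translate_url(c("a b (c)", "x,y"))
```

Each resulting URL can then be passed to getURL() as in the answer. Whether the page still returns the translation in the same place is not something this sketch checks.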

Answer 1: (score: 0)

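Here is an approach that works for French to English. To translate from Russian instead, change the source language in the URL (fr to ru) and use the same method.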


library(pdftools)

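send_Text_To_Google_Translage_French_To_English <- function(text_To_Translate)
{
  # Packages are loaded inside the function because callr::r() runs it in a fresh R session
  library(stringr)
  library(pagedown)
  library(pdftools)
  # Replace whitespace with %20 so the text can be placed in the URL
  text_To_Translate <- str_replace_all(string = text_To_Translate, pattern = "[:space:]", replacement = "%20")
  url <- paste0('https://translate.google.com/?hl=fr&sl=fr&tl=en&text=', text_To_Translate, '&op=translate')
  temp_PDF <- tempfile(fileext = ".pdf")
  # Print the rendered page to a PDF with headless Chrome
  tryCatch(pagedown::chrome_print(input = url, output = temp_PDF, wait = 2), error = function(e) NA)
  # If printing failed, no PDF exists, so return NA instead of raising an error
  if (!file.exists(temp_PDF)) return(NA_character_)
  # Extract the text of the first PDF page and split it into lines
  translated_Text <- pdf_text(temp_PDF)
  translated_Text <- strsplit(translated_Text, split = "\r?\n")[[1]]
  return(translated_Text)
}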

text_To_Translate <- "La tutela de Vieux-la-Romaine est une "
text_Translated <- callr::r(func = send_Text_To_Google_Translage_French_To_English, 
                             args = list(text_To_Translate = text_To_Translate))

text_Translated
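
The question involves about 1,000 texts, while both answers translate a single string. One possible way to bridge that gap is a small loop that calls a single-text translator for every element, pauses between requests, and records NA when a request fails. This is only a sketch: `translate_all`, its `translate_one` argument and the one-second default pause are assumptions for illustration, not part of either answer.

```r
# Hypothetical helper: apply a single-text translator function to many texts.
# translate_one is any function that takes one string and returns its
# translation as a character vector (for example, a wrapper around one of the
# approaches above).
translate_all <- function(texts, translate_one, pause = 1) {
  vapply(texts, function(txt) {
    Sys.sleep(pause)  # wait between requests to avoid hammering the site
    tryCatch(
      # Collapse multi-line results into one string per input text
      paste(translate_one(txt), collapse = " "),
      # A failed request yields NA instead of stopping the whole run
      error = function(e) NA_character_
    )
  }, character(1), USE.NAMES = FALSE)
}

# Offline demonstration with a stand-in "translator"
demo <- translate_all(c("a", "b"), toupper, pause = 0)
```

With a real translator function in place of toupper, the result is a character vector of the same length as the input, so it can be stored next to the original texts in a data frame.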