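I am using R Hadoop. I have a string that I map so that each word becomes the key and its length becomes the associated value. How can I find the longest word in MapReduce?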
Sys.setenv("HADOOP_CMD"="/home/hadoop/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/home/hadoop/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.8.1.jar")
Sys.setenv("HADOOP_HOME"="/home/hadoop/hadoop")
Sys.setenv(JAVA_HOME="/usr/java/latest")
library(rhdfs)
library(rmr2)
hdfs.init()
line = "It's Supercalifragilisticexpialidocious!
Even though the sound of it
Is something quite atrocious
If you say it loud enough
You'll always sound precocious
Supercalifragilisticexpialidocious!"
to.dfs(line, output = '/home/m072040031/small_doc.txt', format = "text")
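# Emit (word, length) pairs for every word in the input text.
wordcount = function(input,
                     output,
                     pattern = '[[:punct:][:space:][:digit:]]+') {
  mapreduce(input = input,
            output = output,
            input.format = "text",
            # map: split each line into words and emit (word, nchar(word))
            map = function(k, lines) {
              v = unlist(strsplit(lines, split = pattern))
              keyval(v, nchar(v))
            },
            # reduce: passes each word and its length(s) through unchanged
            reduce = function(word, count) {
              keyval(word, count)
            })
}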
wordcount("/home/m072040031/small_doc.txt",
          output = "/home/m072040031/small_doc_wc.RData")
info=from.dfs("/home/m072040031/small_doc_wc.RData")
info
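
One possible direction, offered as a rough sketch rather than a confirmed solution: because the reducer above is called once per word, it never sees two different words together, so it cannot compare lengths. A common workaround is to emit every word under a single constant key so that one reduce call receives all candidates. The helper names `split_words`, `longest_words`, and `longest_word_job`, and the key `"longest"`, are hypothetical and not part of rmr2. The sketch also assumes that `keyval` recycles a length-one key across a vector of values.

```r
# Same splitting pattern as in the question.
pattern <- "[[:punct:][:space:][:digit:]]+"

# Plain R helper: split lines into words and drop empty strings
# (strsplit can return "" when a line starts with a separator).
split_words <- function(lines, pattern) {
  words <- unlist(strsplit(lines, split = pattern))
  words[nchar(words) > 0]
}

# Plain R helper: keep the unique word(s) that have the maximum length.
longest_words <- function(words) {
  words <- unique(words)
  if (length(words) == 0) return(character(0))
  words[nchar(words) == max(nchar(words))]
}

# Hypothetical wrapper around rmr2::mapreduce (requires rmr2 only when called).
longest_word_job <- function(input, output = NULL) {
  rmr2::mapreduce(
    input = input,
    output = output,
    input.format = "text",
    map = function(k, lines) {
      # Pre-filter locally so only the best candidates of each chunk are shuffled.
      cand <- longest_words(split_words(lines, pattern))
      # Assumption: the single key "longest" is recycled over all candidates.
      if (length(cand) > 0) rmr2::keyval("longest", cand)
    },
    reduce = function(key, words) {
      # All candidates arrive under one key, so they can be compared here.
      best <- longest_words(words)
      rmr2::keyval(best, nchar(best))
    }
  )
}

# The helpers can be checked without Hadoop:
sample_text <- c("Even though the sound of it", "Is something quite atrocious")
longest_words(split_words(sample_text, pattern))
```

If this fits your setup, something like `from.dfs(longest_word_job("/home/m072040031/small_doc.txt"))` should return the longest word(s) as keys and their length as values. Setting `rmr.options(backend = "local")` first lets you try it without a cluster. Note that a single key sends everything to one reducer, which is fine here because the map step already discards all but the longest words of each chunk.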