RHadoop - wordcount produces output, but not in a readable format

Time: 2017-11-18 07:09:54

Tags: r hadoop rhadoop

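I followed the wordcount program from the link titled "Rhadoop - wordcount using rmr". I get output, but it is not in a readable format. I want the output as key-value pairs. How can I achieve this, and what changes should I make to the code? Please help.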

Here is the output:

hadoop@hadoop-vm:~/apache/hadoop-1.2.1$ bin/hadoop fs -cat /user/hadoop/out10/part*
SEQ/org.apache.hadoop.typedbytes.TypedBytesWritable/org.apache.hadoop.typedbytes.TypedBytesWritablea�9D_5X��&�\
hadoop@hadoop-vm:~/apache/hadoop-1.2.1$
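
The `SEQ` header and the `TypedBytesWritable` class names in the output above suggest that the job wrote a binary SequenceFile, which as far as I know is what rmr2 produces with its default `output.format = "native"`. A minimal sketch of one possible change, assuming rmr2's `"text"` output format is acceptable for this job: the `map` and `reduce` functions mirror the ones in the question, and `wordcount_text` is a hypothetical name introduced only for illustration.

```r
library(rmr2)

# Same map and reduce logic as in the question's code
map <- function(k, lines) {
  words <- unlist(strsplit(lines, '\\s'))
  keyval(words, 1)
}

reduce <- function(word, counts) {
  keyval(word, sum(counts))
}

# Hypothetical variant of wordcount(): the only change is output.format = "text",
# which asks rmr2 to write plain text lines instead of its native binary format
wordcount_text <- function(input, output = NULL) {
  mapreduce(input = input, output = output,
            input.format = "text", output.format = "text",
            map = map, reduce = reduce)
}
```

With output written this way, `hadoop fs -cat` on the part files should show text rather than binary data. Reading the result back with `from.dfs` would then presumably need a matching `format = "text"` argument, since its default also expects the native format.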

Here is the code:

Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")

# load libraries
library(rmr2)
library(rhdfs)

# initiate rhdfs package
hdfs.init()

map <- function(k,lines) {
  words.list <- strsplit(lines, '\\s')
  words <- unlist(words.list)
  return( keyval(words, 1) )
}

reduce <- function(word, counts) {
  keyval(word, sum(counts))
}

wordcount <- function (input, output=NULL) {
  mapreduce(input=input, output=output, input.format="text", map=map, reduce=reduce)
}

## read text files from folder example/wordcount/data
hdfs.root <- 'example/wordcount'
hdfs.data <- file.path(hdfs.root, 'data')

## save result in folder example/wordcount/out
hdfs.out <- file.path(hdfs.root, 'out')

## Submit job
out <- wordcount(hdfs.data, hdfs.out) 

## Fetch results from HDFS
results <- from.dfs(out)
results.df <- as.data.frame(results, stringsAsFactors=FALSE)
colnames(results.df) <- c('word', 'count')

head(results.df)
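
Once the results are in R, they can be printed as key-value pairs without touching the HDFS files at all. The sketch below uses plain base R and assumes that `from.dfs` returns a list with `key` and `val` components. The `results` object here is a hand-built stand-in with made-up words and counts, not real job output.

```r
# Hand-built stand-in for the object returned by from.dfs(out);
# the words and counts are made up for illustration
results <- list(key = c("hadoop", "rmr", "wordcount"), val = c(2, 1, 3))

# Build a two-column data frame of key-value pairs
results.df <- data.frame(word = results$key, count = results$val,
                         stringsAsFactors = FALSE)

# Sort by descending count and format each pair as "word<TAB>count"
results.df <- results.df[order(-results.df$count), ]
pairs <- paste(results.df$word, results.df$count, sep = "\t")
writeLines(pairs)
```

The same two steps (build the data frame, then `paste` the columns) should apply unchanged to the real `results` object, as long as its keys and values have equal length.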

0 answers:

No answers yet