Hadoop input data problem

Time: 2015-03-23 04:17:58

Tags: hadoop hadoop-streaming

I'm having a problem with my map function. The raw data is stored in a TSV file, and I only want to keep the last two columns: the first field is the source node (383), the second is the target (4575), and the third is the weight (1).

383 4575 1

383 4764 1

383 5458 1

383 5491 1

    public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {
        String line = value.toString();
        String[] tokens = line.split("t");

        int weight = Integer.parseInt(tokens[2]);
        int target = Integer.parseInt(tokens[0]);
    }

Here is my code:

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        // split the tsv file
        String[] tokens = line.split("/t");
        // save the weight and target
        private Text target = Integer.parsetxt(tokens[0]);
        int weight = Integer.parseInt(tokens[2]);
        context.write(new Text(target), new Intwritable(weight));
    }
    }


    public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException
        {
            // initialize the count variable
            int weightsum = 0;
            for (IntWritable value : values) {
                weightsum += value.get();
            }
            context.write(key, new IntWritable(weightsum));
        }
    }
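For reference, the reduce step above just sums the weights grouped under each target key. The same aggregation can be sketched in plain Java collections (no Hadoop types; the input pairs are made-up sample data), which mimics what happens after the shuffle groups values by key:

```java
import java.util.*;

public class WeightSum {
    // Sum weights per target key, the same aggregation the reducer performs
    // once the shuffle has grouped all (target, weight) pairs by target.
    static Map<String, Integer> sumWeights(List<String[]> pairs) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String[] p : pairs) {
            totals.merge(p[0], Integer.parseInt(p[1]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
            new String[]{"4575", "1"},
            new String[]{"4575", "1"},
            new String[]{"4764", "1"});
        System.out.println(sumWeights(pairs)); // {4575=2, 4764=1}
    }
}
```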

2 Answers:

Answer 0 (score: 0)

    String[] tokens = line.split("t");

should be

    String[] tokens = line.split("\t");
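The difference matters: in a Java string literal, `"\t"` is the tab character, while `"t"` is just the letter t, so `split("t")` never splits a tab-separated line. A quick check (plain Java, with a made-up sample line) shows this:

```java
public class SplitDemo {
    public static void main(String[] args) {
        String line = "383\t4575\t1";       // tab-separated, as in a TSV file
        String[] wrong = line.split("t");   // splits on the letter 't'
        String[] right = line.split("\t");  // splits on the tab character
        // The line contains no letter 't', so split("t") returns the whole
        // line as a single token, while split("\t") yields the three fields.
        System.out.println(wrong.length);   // 1
        System.out.println(right.length);   // 3
    }
}
```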

Answer 1 (score: 0)

Split on whitespace.

    String[] tokens = line.split("\\s+");

    Text target = new Text(tokens[1]);
    int weight = Integer.parseInt(tokens[2]);
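Splitting on the regex `\s+` is more forgiving than splitting on a single tab, since any run of spaces and/or tabs counts as one delimiter. A small check (sample line made up for illustration):

```java
public class WhitespaceSplit {
    public static void main(String[] args) {
        String line = "383 \t 4575\t1";        // mixed spaces and tabs
        String[] tokens = line.split("\\s+");  // any whitespace run is one delimiter
        System.out.println(tokens.length);     // 3
        System.out.println(tokens[1]);         // 4575 (the target node)
    }
}
```

Note that with this split, `tokens[1]` is the target and `tokens[2]` the weight, which is why the answer changed the index from the question's `tokens[0]`.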