Hadoop input data problem

Time: 2015-03-23 04:17:58

Tags: hadoop hadoop-streaming

I'm having a problem with my map function. The raw data is stored in a TSV file, and I only want to keep the last two columns: the first field is the source node (383), the second is the target (4575), and the third is the weight (1).

383 4575 1

383 4764 1

383 5458 1

383 5491 1

    public void map(LongWritable key, Text value, OutputCollector output, Reporter reporter) throws IOException {
        String line = value.toString();
        String[] tokens = line.split("t");

        int weight = Integer.parseInt(tokens[2]);
        int target = Integer.parseInt(tokens[0]);
    }

Here is my code:

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
    {
        String line = value.toString();
        // split the tsv file
        String[] tokens = line.split("/t");
        // save the weight and target
        private Text target = Integer.parsetxt(tokens[0]);
        int weight = Integer.parseInt(tokens[2]);
        context.write(new Text(target), new Intwritable(weight));
    }
    }


    public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException
        {
            // initialize the count variable
            int weightsum = 0;
            for (IntWritable value : values) {
                weightsum += value.get();
            }
            context.write(key, new IntWritable(weightsum));
        }
    }
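For reference, the reduce step above just sums the weights grouped under each target key. The same aggregation can be sketched in plain Java collections (no Hadoop types; the input pairs are made-up sample data), which mimics what happens after the shuffle groups values by key:

```java
import java.util.*;

public class WeightSum {
    // Sum weights per target key, the same aggregation the reducer performs
    // once the shuffle has grouped all (target, weight) pairs by target.
    static Map<String, Integer> sumWeights(List<String[]> pairs) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String[] p : pairs) {
            totals.merge(p[0], Integer.parseInt(p[1]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String[]> pairs = Arrays.asList(
            new String[]{"4575", "1"},
            new String[]{"4575", "1"},
            new String[]{"4764", "1"});
        System.out.println(sumWeights(pairs)); // {4575=2, 4764=1}
    }
}
```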

2 Answers:

Answer 0 (score: 0)

    String[] tokens = line.split("t");

should be

    String[] tokens = line.split("\t");
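The difference matters: in a Java string literal, `"\t"` is the tab character, while `"t"` is just the letter t, so `split("t")` never splits a tab-separated line. A quick check (plain Java, with a made-up sample line) shows this:

```java
public class SplitDemo {
    public static void main(String[] args) {
        String line = "383\t4575\t1";       // tab-separated, as in a TSV file
        String[] wrong = line.split("t");   // splits on the letter 't'
        String[] right = line.split("\t");  // splits on the tab character
        // The line contains no letter 't', so split("t") returns the whole
        // line as a single token, while split("\t") yields the three fields.
        System.out.println(wrong.length);   // 1
        System.out.println(right.length);   // 3
    }
}
```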

Answer 1 (score: 0)

Split on whitespace.

    String[] tokens = line.split("\\s+");

    Text target = new Text(tokens[1]);
    int weight = Integer.parseInt(tokens[2]);
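Splitting on the regex `\s+` is more forgiving than splitting on a single tab, since any run of spaces and/or tabs counts as one delimiter. A small check (sample line made up for illustration):

```java
public class WhitespaceSplit {
    public static void main(String[] args) {
        String line = "383 \t 4575\t1";        // mixed spaces and tabs
        String[] tokens = line.split("\\s+");  // any whitespace run is one delimiter
        System.out.println(tokens.length);     // 3
        System.out.println(tokens[1]);         // 4575 (the target node)
    }
}
```

Note that with this split, `tokens[1]` is the target and `tokens[2]` the weight, which is why the answer changed the index from the question's `tokens[0]`.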