Custom data type writes unknown data in map output

Time: 2016-06-08 05:15:47

Tags: java hadoop mapreduce

Can someone help me understand why I am getting strange behavior with a custom data type? I am referring to this, and my mapper code is:

public class customDataMapper extends Mapper<LongWritable, Text,Text,customText > {

Text url = new Text();
Text date = new Text();
Text ip = new Text();
customText ctext = new customText();

public void map (LongWritable key , Text value , Context context) throws IOException , InterruptedException{

    String words[] = value.toString().split("|");
    url.set(words[1]);
    date.set(words[2]);
    ip.set(words[4]);
    ctext.set(date,ip);
    context.write(url, ctext);
}   
}

And the customText data type code is:

public class customText implements WritableComparable<customText>{

private Text url , ip;

public customText(){
    this.url=new Text();
    this.ip=new Text();

}

public customText(Text URL , Text IP){
    this.url=URL;
    this.ip=IP;


}


public void set (Text URL , Text IP){
    this.url=URL;
    this.ip=IP;

}


public void readFields(DataInput in) throws IOException{
    url.readFields(in);
    ip.readFields(in);

}

public void write(DataOutput out ) throws IOException{
    url.write(out);
    ip.write(out);

}


public int compareTo(customText o){
    if(url.compareTo(o.ip)==0){

        return (ip.compareTo(o.ip));

    }
    else return (url.compareTo(o.ip));
}


public boolean equals(Object o){


    if (o instanceof customText){
    customText other = (customText)o;   
    return (url.equals(other.ip)) && ip.equals(other.ip);
    }
    return false;
}

public int hashCode(){
    return url.hashCode();

 }
}

The output I am getting is:

  

hduser@pradeep-VirtualBox:~/builds$ hadoop fs -cat /user/hadoop/dir8_customData/output/part-m-00000
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51
1 customData.customDataSample1.customText@51

My input file is:

127248|/rr.html|2014-03-10|12:32:08|42.416.153.181
12|/rr12.html|2014-03-11|12:00:00|42.416.153.182
127241|/rr3232.html|2014-03-12|13:32:00|42.416.153.183
1272|/rrw33232.html|2014-03-15|14:32:08|42.416.153.184
121|/rr21212.html|2015-12-10|16:32:08|42.416.153.185

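Can someone help me understand why I am getting this output? Secondly, I am not sure how compareTo works, that is, when a new group is created in the reducer. I am new to Hadoop and Java programming.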

Thanks

2 Answers:

Answer 0: (score: 3)

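You are splitting on | with split("|"). This should be split("\\|"): String.split() takes a regular expression, and an unescaped | is the regex alternation operator, so it does not match a literal pipe. See this SO answer on why escaping a pipe is needed.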
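As a minimal sketch of the escaped split, using the first record from the input file above (the class name PipeSplitDemo and the helper splitRecord are made up for illustration and are not part of the original code):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSplitDemo {
    // Split one record on a literal pipe; "\\|" escapes the regex metacharacter.
    static String[] splitRecord(String line) {
        return line.split("\\|");
    }

    public static void main(String[] args) {
        String line = "127248|/rr.html|2014-03-10|12:32:08|42.416.153.181";
        String[] words = splitRecord(line);
        System.out.println(Arrays.toString(words));

        // Pattern.quote builds an equivalent literal pattern without manual escaping.
        String[] quoted = line.split(Pattern.quote("|"));
        System.out.println(Arrays.equals(words, quoted));

        // The indices used by the mapper now refer to whole fields.
        System.out.println(words[1] + " " + words[2] + " " + words[4]);
    }
}
```

With the fields split correctly, words[1], words[2] and words[4] are the URL, the date and the IP address rather than single characters.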

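Your customText class needs to override toString() so that the text output knows how to print the data contained in the object. Without it, the inherited Object.toString() is used, which prints the class name followed by @ and the hash code in hex, which is the customText@51 you are seeing. For example: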

@Override
public String toString() {
    return url + "," + ip;
}
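The difference can be seen without Hadoop. The sketch below uses two made-up classes (WithoutToString and WithToString are illustrative names, with plain String fields standing in for Text) to compare the inherited and the overridden behavior:

```java
class ToStringDemo {
    // Inherits Object.toString(): prints ClassName@hexHashCode.
    static class WithoutToString {
        final String url = "/rr.html";
        final String ip = "42.416.153.181";
    }

    // Overrides toString() to print the field contents instead.
    static class WithToString {
        final String url = "/rr.html";
        final String ip = "42.416.153.181";

        @Override
        public String toString() {
            return url + "," + ip;
        }
    }

    public static void main(String[] args) {
        System.out.println(new WithoutToString()); // class name, '@', hex hash code
        System.out.println(new WithToString());    // the two fields, comma separated
    }
}
```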

You are also setting the Text objects incorrectly:

public void set (Text URL , Text IP){
    this.url=URL;
    this.ip=IP;
}

This should be:

public void set(Text URL , Text IP){
    this.url.set(URL);
    this.ip.set(IP);
}
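The reason this matters is aliasing: assigning the reference makes the custom object share the caller's mutable object, so a later change by the caller (for example a mapper that reuses its Text fields for the next record) changes the stored value too. The following sketch illustrates this with a hypothetical MutableText class standing in for Hadoop's Text; all class names here are assumptions for illustration:

```java
class AliasingDemo {
    // Hypothetical stand-in for a mutable holder such as Hadoop's Text.
    static class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        void set(MutableText other) { value = other.value; }
        @Override public String toString() { return value; }
    }

    // Keeps the caller's reference, as in the original set().
    static class AliasingHolder {
        MutableText url = new MutableText();
        void set(MutableText u) { this.url = u; }
    }

    // Copies the content into its own object, as in the corrected set().
    static class CopyingHolder {
        final MutableText url = new MutableText();
        void set(MutableText u) { this.url.set(u); }
    }

    public static void main(String[] args) {
        MutableText shared = new MutableText();
        shared.set("/rr.html");

        AliasingHolder aliasing = new AliasingHolder();
        CopyingHolder copying = new CopyingHolder();
        aliasing.set(shared);
        copying.set(shared);

        // The caller reuses its object for the next record.
        shared.set("/rr12.html");

        System.out.println(aliasing.url); // follows the caller's change
        System.out.println(copying.url);  // keeps the value it was given
    }
}
```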

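If your custom Writable object is only used as a value, it just needs to implement the Writable interface rather than WritableComparable. The WritableComparable interface is only required for keys, because Hadoop needs to sort and group by key.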
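To show what a value type actually has to provide, here is a sketch of the write/readFields round trip using only java.io. The class UrlIpValue is a hypothetical stand-in that mirrors the two methods of Hadoop's Writable without depending on Hadoop, and it uses writeUTF/readUTF rather than Text serialization:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ValueRoundTripDemo {
    // Hypothetical value type: serialization only, no ordering required.
    static class UrlIpValue {
        String url = "";
        String ip = "";

        void write(DataOutput out) throws IOException {
            out.writeUTF(url);
            out.writeUTF(ip);
        }

        // Fields must be read back in the same order they were written.
        void readFields(DataInput in) throws IOException {
            url = in.readUTF();
            ip = in.readUTF();
        }
    }

    // Serialize to a byte array and deserialize into a fresh instance.
    static UrlIpValue roundTrip(UrlIpValue value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            value.write(new DataOutputStream(bytes));
            UrlIpValue copy = new UrlIpValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        UrlIpValue original = new UrlIpValue();
        original.url = "/rr.html";
        original.ip = "42.416.153.181";
        UrlIpValue copy = roundTrip(original);
        System.out.println(copy.url + "," + copy.ip);
    }
}
```

Nothing in this round trip needs compareTo(); ordering only comes into play when the type is used as a key.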

Your compareTo() method does not make sense (you are comparing the URL with the IP):

public int compareTo(customText o){
    if(url.compareTo(o.ip)==0){
        return (ip.compareTo(o.ip));
    }
    else return (url.compareTo(o.ip));
}

It should look like this:

@Override
public int compareTo(customText o) {

    int result = url.compareTo(o.url);
    if (result != 0) {
        return result;
    }
    return ip.compareTo(o.ip);
}
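On the question of how compareTo() relates to grouping: as far as I understand the default behavior, keys are sorted with the key comparator before the reduce phase, and consecutive keys that compare as equal (result 0) are handed to the same reduce() call. The sketch below shows the same url-then-ip ordering on a plain Java class; UrlIp is a made-up name with String fields standing in for Text:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CompareToDemo {
    static class UrlIp implements Comparable<UrlIp> {
        final String url;
        final String ip;

        UrlIp(String url, String ip) {
            this.url = url;
            this.ip = ip;
        }

        // Order by url first; fall back to ip only when the urls are equal.
        @Override
        public int compareTo(UrlIp o) {
            int result = url.compareTo(o.url);
            if (result != 0) {
                return result;
            }
            return ip.compareTo(o.ip);
        }

        @Override
        public String toString() {
            return url + "," + ip;
        }
    }

    public static void main(String[] args) {
        List<UrlIp> keys = new ArrayList<>();
        keys.add(new UrlIp("/rr12.html", "42.416.153.182"));
        keys.add(new UrlIp("/rr.html", "42.416.153.185"));
        keys.add(new UrlIp("/rr.html", "42.416.153.181"));

        // Sorting uses compareTo(); entries with the same url end up adjacent.
        Collections.sort(keys);
        for (UrlIp key : keys) {
            System.out.println(key);
        }
    }
}
```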

Your hashCode() should look like this:

@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((ip == null) ? 0 : ip.hashCode());
    result = prime * result + ((url == null) ? 0 : url.hashCode());
    return result;
}

Currently it only uses url and ignores ip.
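A shorter way to write a hash over both fields is java.util.Objects.hash. The sketch below uses a made-up UrlIp class with String fields; the partitionFor helper is also an illustration, reproducing the arithmetic of Hadoop's default HashPartitioner, which is where hashCode() matters when a type is used as a key:

```java
import java.util.Objects;

class HashCodeDemo {
    static class UrlIp {
        final String url;
        final String ip;

        UrlIp(String url, String ip) {
            this.url = url;
            this.ip = ip;
        }

        // equals() and hashCode() use the same two fields, so they stay consistent.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof UrlIp)) {
                return false;
            }
            UrlIp other = (UrlIp) o;
            return url.equals(other.url) && ip.equals(other.ip);
        }

        @Override
        public int hashCode() {
            return Objects.hash(url, ip);
        }
    }

    // Same formula as the default hash partitioner: a non-negative hash modulo the reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        UrlIp first = new UrlIp("/rr.html", "42.416.153.181");
        UrlIp same = new UrlIp("/rr.html", "42.416.153.181");
        UrlIp otherIp = new UrlIp("/rr.html", "42.416.153.182");

        System.out.println(first.hashCode() == same.hashCode());    // equal objects, equal hashes
        System.out.println(first.hashCode() == otherIp.hashCode()); // ip now contributes to the hash
        System.out.println(partitionFor(first, 4));
    }
}
```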

You are also passing date to ctext.set(date, ip), while the corresponding field in the custom object is called url.

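Style-wise, your variable names should start with a lowercase letter (URL = url) and class names should start with an uppercase letter (customText = CustomText).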

Answer 1: (score: 1)

Since the toString() method is already available in the class you inherit from, you have to @Override toString().

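Your IDE should have flagged this before you ran the program: not as an error, but at least with a yellow warning saying that it should be overridden. Or am I confusing this with Android Studio?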