This is my first time using a custom data type in Hadoop. Here is my code:
Custom data type:
public class TwitterData implements Writable {
    private Long id;
    private String text;
    private Long createdAt;

    public TwitterData(Long id, String text, Long createdAt) {
        super();
        this.id = id;
        this.text = text;
        this.createdAt = createdAt;
    }

    public TwitterData() {
        this(new Long(0L), new String(), new Long(0L));
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        System.out.println("In readFields...");
        id = in.readLong();
        text = in.readLine();
        createdAt = in.readLong();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        System.out.println("In write...");
        out.writeLong(id);
        out.writeChars(text);
        out.writeLong(createdAt);
    }

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getText() {
        return text;
    }

    public void setText(String text) {
        this.text = text;
    }

    public Long getCreatedAt() {
        return createdAt;
    }

    public void setCreatedAt(Long createdAt) {
        this.createdAt = createdAt;
    }
}
Mapper:
public class Map extends Mapper<Object, BSONObject, Text, TwitterData> {
    @Override
    public void map(Object key, BSONObject value, Context context) throws IOException, InterruptedException {
        BSONObject user = (BSONObject) value.get("user");
        String location = (String) user.get("location");
        TwitterData twitterData = new TwitterData((Long) value.get("id"),
                (String) value.get("text"), (Long) value.get("createdAt"));
        if (location.toLowerCase().indexOf("india") != -1) {
            context.write(new Text("India"), twitterData);
        } else {
            context.write(new Text("Other"), twitterData);
        }
    }
}
Main job code:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(TwitterData.class);
This exception is thrown after the map phase. I can't figure out why this error appears. Can anyone help me? Thanks in advance.
Answer 0: (score: 2)
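You write the text with writeChars but read it back with readLine. Those are two different serialization formats: writeChars emits two bytes per character with no length prefix or terminator, while readLine consumes one byte per character until it reaches a line terminator or the end of the input. The reader therefore cannot tell where the text ends, and the fields after it are misread.
What you need is a matching pair of calls, such as writeUTF and readUTF: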
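    @Override
    public void readFields(DataInput in) throws IOException {
        // Read the fields in exactly the order write() emits them.
        id = in.readLong();
        text = in.readUTF();   // counterpart of writeUTF: length-prefixed string
        createdAt = in.readLong();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(text);    // writes a 2-byte length followed by the encoded characters
        out.writeLong(createdAt);
    }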
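
To check that the write and read sides agree, the pair above can be exercised outside Hadoop with plain java.io streams. The following is only a minimal sketch: `RoundTripSketch` and its nested `Record` class are hypothetical stand-ins for TwitterData (they do not implement Writable, so the example needs no Hadoop dependency), and the sample values are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripSketch {

    // Hypothetical stand-in for TwitterData with the same three fields.
    static class Record {
        long id;
        String text;
        long createdAt;

        Record() {
            this(0L, "", 0L);
        }

        Record(long id, String text, long createdAt) {
            this.id = id;
            this.text = text;
            this.createdAt = createdAt;
        }

        // Same field order and calls as the write() in the answer.
        void write(DataOutput out) throws IOException {
            out.writeLong(id);
            out.writeUTF(text);
            out.writeLong(createdAt);
        }

        // Mirrors write(): same order, matching read calls.
        void readFields(DataInput in) throws IOException {
            id = in.readLong();
            text = in.readUTF();
            createdAt = in.readLong();
        }
    }

    // Serializes a record into an in-memory byte array.
    static byte[] serialize(Record record) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            record.write(out);
        }
        return buffer.toByteArray();
    }

    // Rebuilds a record from bytes produced by serialize().
    static Record deserialize(byte[] bytes) throws IOException {
        Record record = new Record();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            record.readFields(in);
        }
        return record;
    }

    public static void main(String[] args) throws IOException {
        // The embedded newline is deliberate: a readLine-based reader would stop there.
        Record original = new Record(42L, "hello\nworld", 1234567890L);
        Record copy = deserialize(serialize(original));
        boolean same = copy.id == original.id
                && copy.text.equals(original.text)
                && copy.createdAt == original.createdAt;
        System.out.println("round trip preserved all fields: " + same);
    }
}
```

A round-trip check like this catches asymmetric write/readFields pairs before the job runs. Note that writeUTF stores the encoded length in two bytes, so it throws UTFDataFormatException for strings whose encoded form exceeds 65535 bytes; that is unlikely to matter for tweet text but is worth knowing for longer fields.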