This is my first time using a custom data type in Hadoop. Here is my code:

The custom data type accumulates per-day counts over a week. Each cell represents one day:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class Days implements Writable {
    private int[] days;

    public Days() {
        days = new int[7];
    }

    public int[] getDays() {
        return days;
    }

    public void updateDayCount(int day, int value) {
        days[day] += value;
    }

    public void addDays(int[] other) {
        for (int i = 0; i < 7; i++)
            days[i] += other[i];
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        days = new int[7];
        for (int i = 0; i < 7; i++)
            days[i] = in.readInt();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        for (int i = 0; i < 7; i++)
            out.write(days[i]);
    }

    @Override
    public String toString() {
        String ans = "";
        for (int i = 0; i < 7; i++)
            ans += days[i] + ",";
        ans = ans.substring(0, ans.length() - 1);
        return ans;
    }
}
The Mapper extracts the date, converts it to a day of the week, and increments the corresponding cell:
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    if (key.get() > 0) {
        String line = value.toString();
        String[] rowData = line.split(",");
        if (rowData.length < 6) {
            return;
        }
        Days outDays = new Days();
        outUserID = getUser(rowData);
        String time = rowData[timeData].replaceAll("\"", "");
        int day = -1;
        try {
            day = getTheDay(time);
        } catch (ParseException e) {
            e.printStackTrace();
        }
        // error
        if (day == -1)
            return;
        outDays.updateDayCount(day, 1);
        context.write(outUserID, outDays);
    }
}
When I run this code I get the following EOFException - it points at the readFields method:
ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201412070957_3019_r_000000" TASK_ATTEMPT_ID="attempt_201412070957_3019_r_000000_1" TASK_STATUS="FAILED" FINISH_TIME="1422358846237" HOSTNAME="bgu-bi-server-02.haifa.ibm.com" ERROR="java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:403)
at myPack.Days.readFields(Days.java:35)
I really don't know why. Can anyone spot the cause and suggest a fix? Please help.
Answer 0 (score: 0)
In your custom data type, the write method should use writeInt:
public void write(DataOutput out) throws IOException {
    for (int i = 0; i < 7; i++)
        out.writeInt(days[i]);
}
Thank you very much!