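I have a list of items in a JavaRDD, where each item is a date (a Java Calendar). I want to filter out every date that is earlier than a given date. This is my code:
Main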
public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("Date comparison test")
            .setMaster("local[4]").set("spark.executor.memory", "1g");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // initializes a filter date to 01/01/2016 at 10:00:00
    Calendar filterDate = Calendar.getInstance();
    filterDate.clear();
    filterDate.setTimeInMillis(1451642400000L);

    // initializes an array of 40 calendars, in which every date
    // is 1 hour later than the previous, starting from
    // 01/01/2016 at 08:00:00
    ArrayList<Calendar> calendarArray = new ArrayList<>();
    // milliseconds corresponding to 01/01/2016 at 08:00:00
    long initial = 1451635200000L;
    for (int i = 0; i < 40; ++i) {
        Calendar one = Calendar.getInstance();
        one.clear();
        one.setTimeInMillis(initial);
        calendarArray.add(one);
        initial += 3600000;
    }

    JavaRDD<Calendar> rdd = sc.parallelize(calendarArray);
    // keeps only the dates that are not before filterDate
    JavaRDD<Calendar> rddFiltered = rdd.filter(new FilterTest(filterDate));
    System.out.println("RDD SIZE " + rddFiltered.count());
    sc.close();
}
FilterTest code
import java.util.Calendar;
import org.apache.spark.api.java.function.Function;

public class FilterTest implements Function<Calendar, Boolean> {
    private static final long serialVersionUID = -3134317182912968444L;
    private final Calendar filteringDate;

    public FilterTest(Calendar filteringDate) {
        super();
        this.filteringDate = filteringDate;
    }

    @Override
    public Boolean call(Calendar arg0) throws Exception {
        // getStandardFormattedDate just prints a date in a given format
        System.out.println(TimeUtils.getStandardFormattedDate(arg0) + " - " + TimeUtils.getStandardFormattedDate(filteringDate));
        // keep the element unless it is strictly before the filtering date
        return !arg0.before(filteringDate);
    }
}
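What I can't really understand is the output I get. It looks as if the fixed Calendar that I pass as the parameter to compare against sometimes changes (for example, when it shows Sat, 01 Jan 2016 22:00:00).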
Output
Sat, 01 Jan 2016 08:00:00 - Fri, 01 Jan 2016 10:00:00
Sat, 01 Jan 2016 08:00:00 - Fri, 01 Jan 2016 10:00:00
Fri, 01 Jan 2016 08:00:00 - Fri, 01 Jan 2016 10:00:00
Sat, 02 Jan 2016 15:00:00 - Fri, 01 Jan 2016 09:00:00
Fri, 01 Jan 2016 09:00:00 - Fri, 01 Jan 2016 10:00:00
Sat, 02 Jan 2016 15:00:00 - Fri, 01 Jan 2016 09:00:00
Fri, 01 Jan 2016 10:00:00 - Fri, 01 Jan 2016 10:00:00
Sat, 02 Jan 2016 20:00:00 - Fri, 01 Jan 2016 10:00:00
Fri, 01 Jan 2016 10:00:00 - Fri, 01 Jan 2016 10:00:00
Fri, 01 Jan 2016 10:00:00 - Fri, 01 Jan 2016 10:00:00
Fri, 01 Jan 2016 10:00:00 - Fri, 02 Jan 2016 07:00:00
Sat, 02 Jan 2016 17:00:00 - Fri, 01 Jan 2016 10:00:00
Fri, 01 Jan 2016 11:00:00 - Fri, 01 Jan 2016 10:00:00
Sat, 02 Jan 2016 21:00:00 - Fri, 01 Jan 2016 10:00:00
Fri, 01 Jan 2016 10:00:00 - Sat, 01 Jan 2016 22:00:00
Fri, 01 Jan 2016 22:00:00 - Fri, 01 Jan 2016 10:00:00
Sat, 01 Jan 2016 22:00:00 - Fri, 01 Jan 2016 12:00:00
Fri, 01 Jan 2016 23:00:00 - Fri, 01 Jan 2016 10:00:00
Fri, 01 Jan 2016 23:00:00 - Fri, 01 Jan 2016 10:00:00
Fri, 01 Jan 2016 23:00:00 - Fri, 01 Jan 2016 10:00:00
Fri, 01 Jan 2016 10:00:00 - Fri, 02 Jan 2016 00:00:00
Sat, 02 Jan 2016 00:00:00 - Fri, 01 Jan 2016 19:00:00
Fri, 01 Jan 2016 13:00:00 - Fri, 02 Jan 2016 10:00:00
Sat, 01 Jan 2016 10:00:00 - Sat, 01 Jan 2016 14:00:00
Sat, 02 Jan 2016 01:00:00 - Fri, 01 Jan 2016 10:00:00
Fri, 01 Jan 2016 10:00:00 - Fri, 02 Jan 2016 11:00:00
Sat, 01 Jan 2016 14:00:00 - Fri, 02 Jan 2016 11:00:00
Sat, 02 Jan 2016 11:00:00 - Fri, 01 Jan 2016 15:00:00
Fri, 01 Jan 2016 15:00:00 - Sat, 02 Jan 2016 10:00:00
Sat, 02 Jan 2016 02:00:00 - Fri, 01 Jan 2016 10:00:00
Sat, 02 Jan 2016 15:00:00 - Sat, 02 Jan 2016 10:00:00
Fri, 01 Jan 2016 10:00:00 - Fri, 01 Jan 2016 10:00:00
Fri, 01 Jan 2016 10:00:00 - Fri, 01 Jan 2016 10:00:00
Fri, 01 Jan 2016 12:00:00 - Fri, 01 Jan 2016 22:00:00
Sat, 01 Jan 2016 17:00:00 - Fri, 02 Jan 2016 10:00:00
Fri, 01 Jan 2016 22:00:00 - Fri, 01 Jan 2016 10:00:00
Sat, 02 Jan 2016 13:00:00 - Fri, 01 Jan 2016 10:00:00
Sat, 01 Jan 2016 10:00:00 - Fri, 01 Jan 2016 10:00:00
Sat, 02 Jan 2016 23:00:00 - Fri, 01 Jan 2016 10:00:00
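What is actually happening to that variable during the computation? I ask because the result is apparently correct, but I am having trouble debugging this code in more complex scenarios.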
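For comparison, here is a minimal sketch of the same check without Spark. It may help to separate the comparison itself from whatever happens while the lines are printed. The class and method names (LocalFilterCheck, hourlyCalendars, countNotBefore) are hypothetical and not part of my project. The sketch pins the time zone to UTC, which is an assumption, and prints raw epoch milliseconds instead of going through TimeUtils.

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;
import java.util.TimeZone;

public class LocalFilterCheck {

    // Builds `count` calendars, one hour apart, starting at `startMillis`.
    // UTC is an assumption made here to keep the sketch deterministic.
    static List<Calendar> hourlyCalendars(long startMillis, int count) {
        List<Calendar> result = new ArrayList<>();
        for (int i = 0; i < count; ++i) {
            Calendar c = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
            c.setTimeInMillis(startMillis + i * 3_600_000L);
            result.add(c);
        }
        return result;
    }

    // Same predicate as FilterTest.call: keep every date that is not
    // strictly before the filter date.
    static long countNotBefore(List<Calendar> dates, Calendar filterDate) {
        return dates.stream().filter(d -> !d.before(filterDate)).count();
    }

    public static void main(String[] args) {
        Calendar filterDate = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        filterDate.setTimeInMillis(1451642400000L); // 01/01/2016 at 10:00:00 UTC

        List<Calendar> dates = hourlyCalendars(1451635200000L, 40); // from 08:00:00 UTC

        // Raw milliseconds involve no formatter, so the right-hand value
        // should be identical on every line.
        for (Calendar d : dates) {
            System.out.println(d.getTimeInMillis() + " - " + filterDate.getTimeInMillis());
        }
        System.out.println("LOCAL SIZE " + countNotBefore(dates, filterDate));
    }
}
```

Only the 08:00 and 09:00 entries are before the filter date, so this prints LOCAL SIZE 38, which is the count the Spark job should report as well.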