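I have a JavaPairRDD that I need to iterate over, perform some operations on, and store the output in Hive. Currently I am trying to create a DataFrame inside foreach, which throws an exception because a DataFrame cannot be created inside foreach. What is the alternative?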
```java
JavaPairRDD<Long, Iterable<EmployeeDetail>> employeeDetailPairList = fetchEmployeeDetailData();
List<EmployeeZone> employeeZoneList = fetchEmployeeZoneData();
employeeDetailPairList.foreach(employeeDetailPair -> {
    // Collect the zip codes of all employees in this group.
    Iterable<EmployeeDetail> employeeDetailList = employeeDetailPair._2;
    Set<String> zipCodeSet = StreamSupport.stream(employeeDetailList.spliterator(), false)
            .map(EmployeeDetail::getZipCode)
            .collect(Collectors.toSet());
    // Keep only the zones whose location matches one of those zip codes.
    List<EmployeeZone> employeeZoneFilteredList = employeeZoneList.stream()
            .filter(e -> zipCodeSet.contains(String.valueOf(e.getLoc())))
            .collect(Collectors.toList());
    List<Output> outputListList = processEmployeeData(employeeZoneFilteredList);
    outputListList = addWeekStartDay(outputListList, weekStartDay);
    if (outputListList != null && this.getSession() != null) {
        // Fails: the foreach body runs on the executors, where the
        // SparkSession cannot be used to create a DataFrame.
        Dataset<Row> recordsDF = this.getSession().sqlContext().createDataFrame(outputListList, Output.class);
        recordsDF.write().insertInto(SHIPCODE_PREFERRED_FC_HIVE_TABLE);
    }
});
```
Answer 0 (score: 0)
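You cannot create a DataFrame inside a transformation or inside an action's closure such as foreach: that code runs on the executors, while the SparkSession (and its SQLContext) can only be used on the driver. Instead, join the Hive table with the RDD. That removes the per-group lookup and lets you perform the required operations, and the write, from the driver. Hope this answers your question.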
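A minimal sketch of the join approach, assuming Spark 2.x or later, that `EmployeeDetail` is a serializable Java bean with a `zipCode` property (stubbed below), and that the zone data is available as a DataFrame with a numeric `loc` column (for example read from Hive with `spark.table(...)`). The class `EmployeeZoneJoin`, the method `joinWithZones` and the stub bean are hypothetical. `processEmployeeData` and `addWeekStartDay` are not reproduced; their logic would have to be re-expressed as Dataset operations on the joined result.

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import java.io.Serializable;

public class EmployeeZoneJoin {

    // Hypothetical stand-in for the question's EmployeeDetail. It must be a
    // serializable Java bean (public no-arg constructor, getter/setter) so that
    // createDataFrame can infer a "zipCode" column from it.
    public static class EmployeeDetail implements Serializable {
        private String zipCode;

        public String getZipCode() { return zipCode; }
        public void setZipCode(String zipCode) { this.zipCode = zipCode; }
    }

    // Called on the driver. Only the flatMap lambda is shipped to the executors,
    // and it never touches the SparkSession.
    public static Dataset<Row> joinWithZones(SparkSession spark,
                                             JavaPairRDD<Long, Iterable<EmployeeDetail>> employeeDetailPairList,
                                             Dataset<Row> zoneDF) {
        // Flatten the grouped RDD back to one EmployeeDetail per element.
        JavaRDD<EmployeeDetail> employees =
                employeeDetailPairList.values().flatMap(details -> details.iterator());

        // Build the DataFrame once, on the driver.
        Dataset<Row> employeeDF = spark.createDataFrame(employees, EmployeeDetail.class);

        // One distributed join replaces the per-group zipCodeSet lookup; loc is
        // cast to string to mirror String.valueOf(e.getLoc()) in the question.
        Dataset<Row> joined = employeeDF.join(zoneDF,
                employeeDF.col("zipCode").equalTo(zoneDF.col("loc").cast("string")));

        // Keep only the zone columns; distinct() drops zones matched by several employees.
        return joined.select(zoneDF.col("*")).distinct();
    }
}
```

The result can then be written once from the driver, e.g. `EmployeeZoneJoin.joinWithZones(spark, employeeDetailPairList, spark.table("employee_zone")).write().insertInto(SHIPCODE_PREFERRED_FC_HIVE_TABLE);`, where `employee_zone` is a hypothetical table name. `insertInto` resolves columns by position, so the selected columns must match the target table's column order.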