Question

I have a Dataset<Tuple2<String,DeviceData>> and want to convert it to an Iterator<DeviceData>.

Below is my code, which calls collectAsList() and then obtains an Iterator<DeviceData> from the result.

Dataset<Tuple2<String,DeviceData>> ds = ...;
List<Tuple2<String, DeviceData>> listTuple = ds.collectAsList();

ArrayList<DeviceData> myDataList = new ArrayList<DeviceData>();
for(Tuple2<String, DeviceData> tuple : listTuple){
    myDataList.add(tuple._2());
}

Iterator<DeviceData> myitr = myDataList.iterator();

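I can't use collectAsList() because my data is huge and collecting it all would hurt performance. I looked through the Dataset API but couldn't find a solution, and a Google search didn't turn up an answer either. Can someone guide me? A solution in Java would be great. Thanks.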

Edit:

The DeviceData class is a simple JavaBean. Here is the printSchema() output for ds:

root
 |-- value: string (nullable = true)
 |-- _2: struct (nullable = true)
 |    |-- deviceData: string (nullable = true)
 |    |-- deviceId: string (nullable = true)
 |    |-- sNo: integer (nullable = true)

Answer 1

You can extract DeviceData directly from ds instead of collecting the rows and rebuilding a list.

Java:

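import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Encoders;

// In the Java API, Dataset.map takes a MapFunction plus an Encoder for the result type.
MapFunction<Tuple2<String, DeviceData>, DeviceData> mapDeviceData =
    new MapFunction<Tuple2<String, DeviceData>, DeviceData>() {
      @Override
      public DeviceData call(Tuple2<String, DeviceData> tuple) {
        return tuple._2();
      }
    };

// Extracts DeviceData from each record; DeviceData is a JavaBean, so a bean encoder is used.
Dataset<DeviceData> ddDS = ds.map(mapDeviceData, Encoders.bean(DeviceData.class));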

Scala:

val ddDS = ds.map(_._2) //ds.map(row => row._2)
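
To end up with an actual Iterator<DeviceData>, one option not shown in the snippets above is Dataset.toLocalIterator(), which returns a java.util.Iterator and brings partitions to the driver one at a time rather than all at once (so the driver needs roughly enough memory for the largest partition). If you call it on the original ds, you still need to unwrap each tuple lazily. The sketch below shows that unwrapping step in plain Java so it runs without Spark: MappingIterator and LazyIteratorDemo are hypothetical names, Map.Entry stands in for Tuple2, and String stands in for DeviceData.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not part of Spark): wraps an iterator and applies a
// function to each element on demand, so no intermediate list is built.
final class MappingIterator<A, B> implements Iterator<B> {
    private final Iterator<A> source;
    private final Function<? super A, ? extends B> fn;

    MappingIterator(Iterator<A> source, Function<? super A, ? extends B> fn) {
        this.source = source;
        this.fn = fn;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public B next() {
        // The mapping runs only when the caller asks for the next element.
        return fn.apply(source.next());
    }
}

class LazyIteratorDemo {
    // Assumption: Map.Entry<String, String> plays the role of
    // Tuple2<String, DeviceData>; getValue() corresponds to tuple._2().
    static Iterator<String> secondElements(Iterator<Map.Entry<String, String>> pairs) {
        return new MappingIterator<Map.Entry<String, String>, String>(
            pairs, entry -> entry.getValue());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<String, String>("k1", "device-1"));
        pairs.add(new SimpleEntry<String, String>("k2", "device-2"));

        Iterator<String> it = secondElements(pairs.iterator());
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

With Spark on the classpath, the same idea would apply to ds.toLocalIterator() with tuple -> tuple._2() as the mapping function. If you have already mapped to a Dataset<DeviceData>, calling toLocalIterator() on that dataset should give the Iterator<DeviceData> directly, with no wrapper needed.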
