I have a JavaRDD containing Person details, and I want to sort its elements first by the Age field and then by the Name field.
Sample input:
Age, Name, Country
33,Jack,USA
24,Sam,USA
31,Jack,USA
The expected output is:
Age, Name, Country
24,Sam,USA
31,Jack,USA
33,Jack,USA
How can I achieve this with the sortBy transformation?
Regards, Shankar
Answer 0 (score: 2)
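It is rather ugly in Java (Scala's case classes are very convenient here), but you can do it by creating a bean for the record and implementing Comparable on it. Then just use the sortBy method with an identity key function.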
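The approach described in this answer can be sketched roughly as follows. This is a minimal illustration, not the answerer's original code: the class names `PersonSortSketch` and `Person`, the `parse` helper, and the `sortPeople` method are hypothetical. To keep the example runnable without a Spark cluster, the sort itself is done on a local list, and the corresponding `JavaRDD.sortBy` call is shown in a comment.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PersonSortSketch {

    // Hypothetical record class. It is Serializable because Spark ships
    // RDD elements between nodes, and Comparable so that it has a natural order.
    static class Person implements Comparable<Person>, Serializable {
        private static final long serialVersionUID = 1L;
        final int age;
        final String name;
        final String country;

        Person(int age, String name, String country) {
            this.age = age;
            this.name = name;
            this.country = country;
        }

        // Parses one "Age,Name,Country" line (assumes no header line).
        static Person parse(String line) {
            String[] parts = line.split(",");
            return new Person(Integer.parseInt(parts[0].trim()),
                    parts[1].trim(), parts[2].trim());
        }

        // Order by age first; break ties by name.
        @Override
        public int compareTo(Person other) {
            int byAge = Integer.compare(age, other.age);
            return byAge != 0 ? byAge : name.compareTo(other.name);
        }

        @Override
        public String toString() {
            return age + "," + name + "," + country;
        }
    }

    // Local stand-in for the Spark call. Given a JavaRDD<Person> named
    // people, the equivalent with an identity key function would be:
    //   people.sortBy(p -> p, true, people.partitions().size());
    static List<Person> sortPeople(List<Person> people) {
        List<Person> sorted = new ArrayList<>(people);
        Collections.sort(sorted); // uses Person.compareTo
        return sorted;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        for (String line : Arrays.asList("33,Jack,USA", "24,Sam,USA", "31,Jack,USA")) {
            people.add(Person.parse(line));
        }
        for (Person p : sortPeople(people)) {
            System.out.println(p);
        }
        // prints:
        // 24,Sam,USA
        // 31,Jack,USA
        // 33,Jack,USA
    }
}
```

Because the ordering lives in `compareTo`, the key function passed to `sortBy` can simply return the element itself.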
Answer 1 (score: 1)
The following code performs the task as required:
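import java.util.ArrayList;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Assumes an existing JavaSparkContext `sc` and SQLContext `sqlContext` (Spark 1.x API),
// and an input file without a header line.
JavaRDD<String> people = sc.textFile("/home/hduser/input");

// Build the schema. Age is an integer so that it sorts numerically rather than as text.
List<StructField> fields = new ArrayList<StructField>();
fields.add(DataTypes.createStructField("Age", DataTypes.IntegerType, true));
fields.add(DataTypes.createStructField("Name", DataTypes.StringType, true));
fields.add(DataTypes.createStructField("Country", DataTypes.StringType, true));
StructType schema = DataTypes.createStructType(fields);

// Convert records of the RDD (people) to Rows.
JavaRDD<Row> rowRDD = people.map(new Function<String, Row>() {
    public Row call(String record) throws Exception {
        String[] parts = record.split(",");
        return RowFactory.create(Integer.valueOf(parts[0].trim()),
                parts[1].trim(), parts[2].trim());
    }
});

// Apply the schema to the RDD.
DataFrame peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema);

// Register the DataFrame as a table.
peopleDataFrame.registerTempTable("people");

// SQL can be run over RDDs that have been registered as tables.
// Sort by Age first, then by Name.
DataFrame results = sqlContext.sql("SELECT * FROM people").sort("Age", "Name");
results.show();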
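One point worth keeping in mind about the type of the Age column: when ages are held as strings, sorting compares them character by character, which only matches numeric order while all values have the same number of digits. The following plain-Java sketch illustrates the difference. The class name `AgeSortPitfall`, its methods, and the sample values are made up for illustration and are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class AgeSortPitfall {

    // Sorts the values as text (lexicographic order).
    static List<String> sortAsText(List<String> ages) {
        List<String> copy = new ArrayList<>(ages);
        Collections.sort(copy);
        return copy;
    }

    // Sorts the same values by their numeric value.
    static List<String> sortAsNumbers(List<String> ages) {
        List<String> copy = new ArrayList<>(ages);
        copy.sort(Comparator.comparingInt((String s) -> Integer.parseInt(s)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> ages = Arrays.asList("33", "9", "101", "24");
        System.out.println(sortAsText(ages));    // [101, 24, 33, 9]
        System.out.println(sortAsNumbers(ages)); // [9, 24, 33, 101]
    }
}
```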