I'm having trouble with groupByKey in Scala and Spark.

I have two case classes:
case class Employee(id_employee: Long, name_emp: String, salary: String)

Currently I am using this second case class:
case class Company(id_company: Long, employee: Seq[Employee])

But I would like to replace it with this new one:
case class Company(id_company: Long, name_comp: String, employee: Seq[Employee])

I use groupByKey to create a parent Dataset (df1) of Company objects:
val companies = df1.groupByKey(v => v.id_company)
  .mapGroups {
    case (k, iter) => Company(k, iter.map(x => Employee(x.id_employee, x.name_emp, x.salary)).toSeq)
  }
  .collect()

This code works; it returns objects like this one:

Company(1234,List(Employee(0987, John, 30000),Employee(4567, Bob, 50000)))

But I have not found how to also add the company name_comp to these objects (this field does exist in df1), so that I can get back objects of the new case class with the company name included.
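For context, here is a minimal sketch of what the flat input Dataset df1 could look like, based only on the fields referenced above; the case class name EmployeeRow, the company name "Acme", and the sample values are illustrative assumptions, not details from the original post.

// Spark-shell style sketch of the assumed flat input (hypothetical names and values).
import org.apache.spark.sql.{Dataset, SparkSession}

case class EmployeeRow(id_company: Long, name_comp: String,
                       id_employee: Long, name_emp: String, salary: String)

val spark = SparkSession.builder().master("local[*]").appName("CompanyGrouping").getOrCreate()
import spark.implicits._

// Two employees belonging to the same (assumed) company.
val df1: Dataset[EmployeeRow] = Seq(
  EmployeeRow(1234L, "Acme", 987L, "John", "30000"),
  EmployeeRow(1234L, "Acme", 4567L, "Bob", "50000")
).toDS()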
Answer (score: 2):
Since you need both the company id and the company name, what you can do is use a tuple as the key when grouping the data. This makes both values readily available when constructing the Company objects:
df1.groupByKey(v => (v.id_company, v.name_comp))
  .mapGroups { case ((id, name), iter) =>
    Company(id, name, iter.map(x => Employee(x.id_employee, x.name_emp, x.salary)).toSeq)
  }
  .collect()