Get distinct rows from an RDD[type] in Scala Spark

Date: 2019-03-28 07:09:20

Tags: scala apache-spark apache-spark-sql

Suppose I have an RDD of some type, say RDD[Employee], with sample data like the following:

FName,LName,Department,Salary
dubert,tomasz ,paramedic i/c,91080.00,
edwards,tim p,lieutenant,114846.00,
edwards,tim p,lieutenant,234846.00,
edwards,tim p,lieutenant,354846.00,
elkins,eric j,police,104628.00,
estrada,luis f,police officer,96060.00,
ewing,marie a,clerk,53076.00,
ewing,marie a,clerk,13076.00,
ewing,marie a,clerk,63076.00,
finn,sean p,firefighter,87006.00,
fitch,jordan m,law clerk,14.51
fitch,jordan m,law clerk,14.51

Expected output:

dubert,tomasz ,paramedic i/c,91080.00,
edwards,tim p,lieutenant,354846.00,
elkins,eric j,police,104628.00,
estrada,luis f,police officer,96060.00,
ewing,marie a,clerk,63076.00,
finn,sean p,firefighter,87006.00,
fitch,jordan m,law clerk,14.51

I want one row for each distinct FName.
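Since the question starts from an RDD[Employee] while the answer below works on a DataFrame, here is a minimal sketch of the conversion, assuming a hypothetical Employee case class with the four fields from the header (the RDD contents are abbreviated for illustration):

import org.apache.spark.sql.SparkSession

case class Employee(FName: String, LName: String, Department: String, Salary: Double)

val spark = SparkSession.builder().appName("distinct-rows").getOrCreate()
import spark.implicits._ // brings in toDF() for RDDs of case classes

// A hypothetical RDD[Employee] holding a few of the sample rows above
val employeeRdd = spark.sparkContext.parallelize(Seq(
  Employee("dubert", "tomasz", "paramedic i/c", 91080.00),
  Employee("edwards", "tim p", "lieutenant", 114846.00),
  Employee("edwards", "tim p", "lieutenant", 354846.00)
))

// The groupBy/agg answer below operates on this DataFrame
val df = employeeRdd.toDF()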

1 Answer:

Answer 0 (score: 1):

I think you want to do something like this:

// Requires: import spark.implicits._ (for the 'ColumnName symbol syntax)
import org.apache.spark.sql.functions.first

df
  .groupBy('FName)
  .agg(
    first('LName).as("LName"),
    first('Department).as("Department"),
    first('Salary).as("Salary")
  )
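One caveat on this approach: first() returns an arbitrary value from each group, so it is not guaranteed to reproduce the expected output above, which keeps the highest Salary per name. If the maximum is what is actually wanted, a max aggregation (a sketch against the same df, assuming LName and Department are constant within an FName, as in the sample data) would match it:

import org.apache.spark.sql.functions.max

df
  .groupBy('FName, 'LName, 'Department)
  .agg(max('Salary).as("Salary"))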