Joining multiple DataFrames

Time: 2016-06-16 02:27:05

Tags: scala apache-spark apache-spark-sql

I have received several CSV files containing historical customer information:

  • CustomerInfo.csv: ssn, name
  • CustomerAddresses.csv: ssn, addr_type, street, state, zipcode
  • CustomerPhoneNumbers.csv: ssn, phone_type, phone_num
  • CustomerCreditHistory.csv: ssn, from_date, to_date, score

I read each of these files as a DataFrame and need to join them to build the following object model:

// Assumption: Date is java.sql.Date, the type Spark SQL maps to DateType.
import java.sql.Date

case class Address(addressType: String, street: String, state: String, zipCode: String)
case class Phone(phoneType: String, number: String)
case class CreditHistory(fromDate: Date, toDate: Date, score: Double)
case class Customer(ssn: String, name: String, addresses: Seq[Address], phones: Seq[Phone], credits: Seq[CreditHistory])

As you can see, each customer can have more than one address, phone number, or credit history record.
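
To make the one-to-many shape concrete, here is a minimal sketch that assembles Customer objects from in-memory rows using plain Scala collections, not Spark. The sample rows, the `CustomerShape` object, and the `bySsn` and `buildCustomers` helpers are all hypothetical and exist only for illustration; the case classes are repeated so the snippet stands on its own. It shows the result the join is expected to produce, not how to do it efficiently on DataFrames.

```scala
import java.sql.Date

case class Address(addressType: String, street: String, state: String, zipCode: String)
case class Phone(phoneType: String, number: String)
case class CreditHistory(fromDate: Date, toDate: Date, score: Double)
case class Customer(ssn: String, name: String, addresses: Seq[Address], phones: Seq[Phone], credits: Seq[CreditHistory])

object CustomerShape {
  // Hypothetical sample rows keyed by ssn, standing in for the CSV contents.
  val infos: Seq[(String, String)] = Seq(("111", "Alice"), ("222", "Bob"))

  val addressRows: Seq[(String, Address)] = Seq(
    ("111", Address("home", "1 Main St", "NY", "10001")),
    ("111", Address("work", "2 Broad St", "NY", "10004")),
    ("222", Address("home", "3 Oak Ave", "CA", "94016"))
  )

  val phoneRows: Seq[(String, Phone)] = Seq(
    ("111", Phone("mobile", "555-0100"))
  )

  val creditRows: Seq[(String, CreditHistory)] = Seq(
    ("222", CreditHistory(Date.valueOf("2015-01-01"), Date.valueOf("2015-12-31"), 710.0))
  )

  // Collect all child records that share an ssn into one sequence per ssn.
  def bySsn[A](rows: Seq[(String, A)]): Map[String, Seq[A]] =
    rows.groupBy(_._1).map { case (ssn, group) => ssn -> group.map(_._2) }

  // One Customer per row in infos; a customer with no child rows gets empty sequences.
  def buildCustomers(): Seq[Customer] = {
    val addresses = bySsn(addressRows)
    val phones = bySsn(phoneRows)
    val credits = bySsn(creditRows)
    infos.map { case (ssn, name) =>
      Customer(
        ssn,
        name,
        addresses.getOrElse(ssn, Seq.empty),
        phones.getOrElse(ssn, Seq.empty),
        credits.getOrElse(ssn, Seq.empty)
      )
    }
  }
}
```

Note that each child collection is grouped by ssn before it is attached to the customer. Combining the ungrouped rows of all four sources directly would instead yield one row per combination of address, phone, and credit record for each customer.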

What is the best way to join these four DataFrames so that I can build the Customer objects?

Thanks.

0 answers:

No answers