Question

Can I define a schema in Spark for a CSV file that contains the following sub-columns, and then join two files on the basis of KeyFields and NonKeyFields?

KeyFields NonKeyFields
EmpId DOB FirstName LastName Contact Loc1 Loc2 DOJ Comments Supervisor

My sample data is in the following format:

1242569,11-Sep-95,SANDEEP,KUMAR,9010765550,HYDERABAD,OFFSHORE,16-Jan-16,Passsed Due,NAGALAKSHMI CHALLA

Answer 1

Yes, you can do this when reading the CSV file:

df = sqlContext.read.load(<path of the file>, format="csv", schema=schema)
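
To make the idea more concrete, here is a minimal sketch of how the schema and the join might look in PySpark. Several things in it are assumptions rather than facts from the question: the split of columns into KEY_FIELDS and NON_KEY_FIELDS is a guess for illustration, every column is read as a string, the helper names read_employees and join_on_keys are hypothetical, and it assumes a Spark version whose CSV reader accepts a DDL-formatted schema string (to my knowledge, 2.3 or later).

```python
# Column order follows the sample row in the question.
# ASSUMPTION: which columns are "key" fields is a guess made for illustration.
KEY_FIELDS = ["EmpId", "DOB", "FirstName", "LastName"]
NON_KEY_FIELDS = ["Contact", "Loc1", "Loc2", "DOJ", "Comments", "Supervisor"]

# DDL-formatted schema string; all columns are strings to keep the sketch simple.
SCHEMA_DDL = ", ".join(
    "{} STRING".format(name) for name in KEY_FIELDS + NON_KEY_FIELDS
)


def read_employees(spark, path):
    """Read a headerless CSV file using the explicit schema above.

    `spark` is expected to be a SparkSession (or SQLContext) created elsewhere.
    """
    return spark.read.csv(path, schema=SCHEMA_DDL, header=False)


def join_on_keys(left, right):
    """Inner-join two DataFrames on the (assumed) key columns."""
    return left.join(right, on=KEY_FIELDS, how="inner")
```

With a session at hand, usage would be along the lines of join_on_keys(read_employees(spark, path_a), read_employees(spark, path_b)), where path_a and path_b are placeholders. Because both inputs share the same non-key column names, the joined result contains each of those names twice, so one side would probably need renaming before the join.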