How to create a DataFrame: the number of columns doesn't match

Time: 2018-10-02 06:06:55

Tags: scala apache-spark apache-spark-sql

I am getting this error:

java.lang.IllegalArgumentException: requirement failed: The number of columns doesn't match.
Old column names (4): _1, _2, _3, _4
New column names (1): 'srcId', 'srcLabel', 'dstId', 'dstLabel'

in this code:

val columnNames = """'srcId', 'srcLabel', 'dstId', 'dstLabel'"""

import spark.sqlContext.implicits._

var df = Seq.empty[(String, String, String, String)]
  .toDF(columnNames)

2 answers:

Answer 0: (score: 3)

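The problem with your approach is that columnNames is a single String, whereas you defined an empty Seq of Tuple4 of Strings. toDF therefore receives one column name for four columns, which is exactly what the error reports: old column names (4), new column names (1). You have to split the columnNames string into four strings and pass those to toDF.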
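To make the count mismatch concrete, here is a minimal sketch in plain Scala (no Spark required). The object name ColumnNameCount and its two vals are illustrative only and do not appear in the question; the sketch merely counts how many names each form would supply to a varargs method such as toDF.

```scala
object ColumnNameCount {
  val columnNames = """'srcId', 'srcLabel', 'dstId', 'dstLabel'"""

  // Passed as-is, the whole string is one vararg element, i.e. one column name.
  val asSingleName: Seq[String] = Seq(columnNames)

  // Split on commas, it becomes four separate names (quotes and spaces kept).
  val asFourNames: Seq[String] = columnNames.split(",").toSeq

  def main(args: Array[String]): Unit = {
    println(asSingleName.length) // 1
    println(asFourNames.length)  // 4
  }
}
```

Only the second form matches the four fields of the tuple type.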

The correct way is as follows:

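val columnNames = """'srcId', 'srcLabel', 'dstId', 'dstLabel'"""

// Needed for toDF, as in the question; assumes a SparkSession named spark
import spark.sqlContext.implicits._

// split(",") yields four names; each keeps its quotes and any leading space
var df = Seq.empty[(String, String, String, String)]
  .toDF(columnNames.split(","): _*)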

which should give you an empty DataFrame:

+-------+-----------+--------+-----------+
|'srcId'| 'srcLabel'| 'dstId'| 'dstLabel'|
+-------+-----------+--------+-----------+
+-------+-----------+--------+-----------+
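Note that the header above still carries the single quotes and leading spaces from the original string. If plain names are wanted, one possible cleanup, sketched in plain Scala, is to trim each piece and strip the quotes before handing the names to toDF. The object name CleanColumnNames and the val cleaned are hypothetical and not part of the original answer.

```scala
object CleanColumnNames {
  val columnNames = """'srcId', 'srcLabel', 'dstId', 'dstLabel'"""

  // Split on commas, trim whitespace, then drop the surrounding single quotes.
  val cleaned: Seq[String] =
    columnNames.split(",").toSeq.map(_.trim.stripPrefix("'").stripSuffix("'"))

  def main(args: Array[String]): Unit =
    println(cleaned.mkString(", ")) // srcId, srcLabel, dstId, dstLabel
}
```

The result could then be passed as toDF(CleanColumnNames.cleaned: _*), in the same way as the split array above.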

I hope the answer is helpful.

Answer 1: (score: 2)

scala> val columnNames = Seq("srcId", "srcLabel", "dstId", "dstLabel")
columnNames: Seq[String] = List(srcId, srcLabel, dstId, dstLabel)

scala> var d = Seq.empty[(String, String, String, String)].toDF(columnNames: _*)
d: org.apache.spark.sql.DataFrame = [srcId: string, srcLabel: string ... 2 more fields]