Error reading fixed-width files with Spark

Date: 2018-07-25 13:10:57

Tags: scala apache-spark

Input: I have fixed-width files as input.
Output: I need to create a DataFrame from them.

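// build an all-string, nullable schema from the comma-separated field names in schemaString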
val nullable = true
val schema = StructType(
  schemaString.split(",").map(fieldName => StructField(fieldName.trim, StringType, nullable)))


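// slice each fixed-width line into 43 string columns at absolute character offsets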
val rowRDD = sc.textFile(sourcePath).map(p =>
  Row(p.substring(0,11),p.substring(11,22),p.substring(22,33),p.substring(33,65),p.substring(65,73),p.substring(73,75),p.substring(75,95),
      p.substring(95,98),p.substring(98,101),p.substring(101,120),p.substring(120,139),p.substring(139,158),
      p.substring(158,177),p.substring(177,196),p.substring(196,215),p.substring(215,234),p.substring(234,253),
      p.substring(253,272),p.substring(272,291),p.substring(291,310),p.substring(310,329),p.substring(329,348),
      p.substring(348,356),p.substring(356,376),p.substring(376,395),p.substring(395,414),p.substring(414,434),
      p.substring(434,454),p.substring(454,473),p.substring(473,481),p.substring(481,501),p.substring(501,520),
      p.substring(520,539),p.substring(539,558),p.substring(558,577),p.substring(577,596),p.substring(596,615),
      p.substring(615,634),p.substring(634,653),p.substring(653,672),p.substring(672,673),p.substring(673,674),p.substring(674,675)
    )
  ) 
val inputDataWithSchema = sqlContext.createDataFrame(rowRDD, schema)
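
For reference, the slicing above can also be written with an explicit list of column boundaries instead of the long substring chain. This is only a sketch equivalent to the rowRDD definition above; columnBoundaries is an assumed name, and the (start, end) pairs are copied from the substring calls:

import org.apache.spark.sql.Row

// each (start, end) pair marks one fixed-width column in the 675-character record
val columnBoundaries: Seq[(Int, Int)] = Seq(
  (0, 11), (11, 22), (22, 33), (33, 65), (65, 73), (73, 75), (75, 95),
  (95, 98), (98, 101), (101, 120), (120, 139), (139, 158), (158, 177),
  (177, 196), (196, 215), (215, 234), (234, 253), (253, 272), (272, 291),
  (291, 310), (310, 329), (329, 348), (348, 356), (356, 376), (376, 395),
  (395, 414), (414, 434), (434, 454), (454, 473), (473, 481), (481, 501),
  (501, 520), (520, 539), (539, 558), (558, 577), (577, 596), (596, 615),
  (615, 634), (634, 653), (653, 672), (672, 673), (673, 674), (674, 675))

val rowRDD = sc.textFile(sourcePath).map(p =>
  Row(columnBoundaries.map { case (start, end) => p.substring(start, end) }: _*))
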
schemaString and the other values used above are defined elsewhere in the code. I checked the length of every line and each one is exactly 675 characters, but when I collect the results I get an error:

"java.lang.StringIndexOutOfBoundsException: String index out of range: 675"

How can I fix this?
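
For reference, one defensive fix is to pad every line to the full record width before slicing, so a short line can no longer push substring past the end of the string. This is a sketch that reuses columnBoundaries from above; recordWidth is an assumed name:

val recordWidth = 675  // full width of one fixed-width record

// padTo appends spaces to lines shorter than recordWidth and leaves
// full-width lines unchanged, so every substring call stays in range
val paddedRowRDD = sc.textFile(sourcePath)
  .map(line => line.padTo(recordWidth, ' '))
  .map(p => Row(columnBoundaries.map { case (start, end) => p.substring(start, end) }: _*))

val inputDataWithSchema = sqlContext.createDataFrame(paddedRowRDD, schema)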

0 Answers:

No answers