Input: I have a fixed-width file as input.
Output: I need to create a DataFrame from it.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Every column is read as a nullable string; names come from schemaString.
val nullable = true
val schema = StructType(
  schemaString.split(",").map(fieldName => StructField(fieldName.trim, StringType, nullable)))

// Slice each line into fields by fixed character offsets (last offset is 675).
val rowRDD = sc.textFile(sourcePath).map(p =>
  Row(p.substring(0,11),p.substring(11,22),p.substring(22,33),p.substring(33,65),p.substring(65,73),p.substring(73,75),p.substring(75,95),
    p.substring(95,98),p.substring(98,101),p.substring(101,120),p.substring(120,139),p.substring(139,158),
    p.substring(158,177),p.substring(177,196),p.substring(196,215),p.substring(215,234),p.substring(234,253),
    p.substring(253,272),p.substring(272,291),p.substring(291,310),p.substring(310,329),p.substring(329,348),
    p.substring(348,356),p.substring(356,376),p.substring(376,395),p.substring(395,414),p.substring(414,434),
    p.substring(434,454),p.substring(454,473),p.substring(473,481),p.substring(481,501),p.substring(501,520),
    p.substring(520,539),p.substring(539,558),p.substring(558,577),p.substring(577,596),p.substring(596,615),
    p.substring(615,634),p.substring(634,653),p.substring(653,672),p.substring(672,673),p.substring(673,674),p.substring(674,675)
  )
)
val inputDataWithSchema = sqlContext.createDataFrame(rowRDD, schema)
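schemaString and the other inputs used above are defined elsewhere. I checked the length of each line and it appears to be 675, but when I collect the result I get this error: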
"java.lang.StringIndexOutOfBoundsException: String index out of range: 675"
How can I fix this?
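For what it's worth, the message "String index out of range: 675" is what substring reports when the end index is larger than the string, so at least one line is probably shorter than 675 characters (for example a record whose trailing space was stripped, or a short last line). Since the map is lazy, this only shows up on collect. Below is a minimal sketch of a defensive slicer, assuming that padding short lines with spaces is acceptable for this data. The object FixedWidth, the method slice, and the bounds parameter are hypothetical names for illustration, not part of the original code.

```scala
// Hypothetical helper (not from the original post): slice a fixed-width line
// by (start, end) offsets, padding short lines so substring cannot overrun.
object FixedWidth {
  def slice(line: String, bounds: Seq[(Int, Int)]): Seq[String] = {
    // The record width is the largest end offset among the fields.
    val width = bounds.map(_._2).max
    // Assumption: a short line is missing trailing blanks, so pad with spaces.
    val padded =
      if (line.length >= width) line
      else line + (" " * (width - line.length))
    bounds.map { case (start, end) => padded.substring(start, end) }
  }
}
```

In the Spark job this could be used as sc.textFile(sourcePath).map(p => Row.fromSeq(FixedWidth.slice(p, bounds))), where bounds holds the same offsets as the substring calls, e.g. Seq((0,11), (11,22), (22,33), ...). To confirm the cause before changing anything, counting the offending lines with sc.textFile(sourcePath).filter(_.length < 675).count() should show whether short records exist.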