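I am trying to read a CSV file and print its schema. My problem is that the file has no header row, and I want to be able to query it like SQL. I tried this code: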
val logFile = "../resouces/cells.csv"
val dfCells = spark.read
  .format("csv")
  .option("header", "false")
  .option("mode", "DROPMALFORMED")
  .option("delimiter", "|")
  .csv(logFile)
dfCell.printSchema;
The input file is:
ES|15032017|25100|54600||3G|FIBRE|OUTDOOR|COMPANY|MAST|MACRO||47001|DU|41.651834|-4.728534||||||||||||||||
ES|15032017|25101|54601||3G|FIBRE|OUTDOOR|COMPANY|ROOFTOP|MACRO||47001|DU|41.651994|-4.724693||||||||||||||||
ES|15032017|25102|54602||4G|FIBRE|OUTDOOR|COMPANY|ROOFTOP|MICRO||47001|U|41.650912|-4.720648||||||||||||||||
ES|15032017|25103|54603||3G|MICROWAVES|OUTDOOR|COMPANY|ROOFTOP|MACRO||47001|U|41.647312|-4.717118||||||||||||||||
The output is:
|
|
|
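Answer 0 (score: 2)
It looks like you have a typo: the DataFrame is defined as dfCells, but the last line calls dfCell. Use dfCells.printSchema instead.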
Answer 1 (score: 0)
I used Spark 1.5.0 with the load function instead of csv.
val logFile = "../input.csv"
val dfCells = sqlContext.read
  .format("csv")
  .option("header", "false")
  .option("mode", "DROPMALFORMED")
  .option("delimiter", "|")
  .load(logFile)
dfCells.show()
+---+--------+-----+-----+---+---+----------+-------+-------+-------+-----+---+-----+---+---------+---------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| C0| C1| C2| C3| C4| C5| C6| C7| C8| C9| C10|C11| C12|C13| C14| C15|C16|C17|C18|C19|C20|C21|C22|C23|C24|C25|C26|C27|C28|C29|C30|C31|
+---+--------+-----+-----+---+---+----------+-------+-------+-------+-----+---+-----+---+---------+---------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| ES|15032017|25100|54600| | 3G| FIBRE|OUTDOOR|COMPANY| MAST|MACRO| |47001| DU|41.651834|-4.728534| | | | | | | | | | | | | | | | |
| ES|15032017|25101|54601| | 3G| FIBRE|OUTDOOR|COMPANY|ROOFTOP|MACRO| |47001| DU|41.651994|-4.724693| | | | | | | | | | | | | | | | |
| ES|15032017|25102|54602| | 4G| FIBRE|OUTDOOR|COMPANY|ROOFTOP|MICRO| |47001| U|41.650912|-4.720648| | | | | | | | | | | | | | | | |
| ES|15032017|25103|54603| | 3G|MICROWAVES|OUTDOOR|COMPANY|ROOFTOP|MACRO| |47001| U|41.647312|-4.717118| | | | | | | | | | | | | | | | |
+---+--------+-----+-----+---+---+----------+-------+-------+-------+-----+---+-----+---+---------+---------+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
The schema is:
dfCells.printSchema()
root
|-- C0: string (nullable = true)
|-- C1: string (nullable = true)
|-- C2: string (nullable = true)
|-- C3: string (nullable = true)
|-- C4: string (nullable = true)
|-- C5: string (nullable = true)
|-- C6: string (nullable = true)
|-- C7: string (nullable = true)
|-- C8: string (nullable = true)
|-- C9: string (nullable = true)
|-- C10: string (nullable = true)
|-- C11: string (nullable = true)
|-- C12: string (nullable = true)
|-- C13: string (nullable = true)
|-- C14: string (nullable = true)
|-- C15: string (nullable = true)
|-- C16: string (nullable = true)
|-- C17: string (nullable = true)
|-- C18: string (nullable = true)
|-- C19: string (nullable = true)
|-- C20: string (nullable = true)
|-- C21: string (nullable = true)
|-- C22: string (nullable = true)
|-- C23: string (nullable = true)
|-- C24: string (nullable = true)
|-- C25: string (nullable = true)
|-- C26: string (nullable = true)
|-- C27: string (nullable = true)
|-- C28: string (nullable = true)
|-- C29: string (nullable = true)
|-- C30: string (nullable = true)
|-- C31: string (nullable = true)
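
Since the file has no header, one way to make it easier to query like SQL is to give some columns readable names after loading and then register a temporary view. The following is only a sketch, assuming Spark 2.x (where `spark` is a `SparkSession` and unnamed columns default to `_c0`, `_c1`, ...). The names `country`, `technology`, `latitude` and `longitude` are guesses made from the sample rows, not something the file defines, and `renameByIndex` is a hypothetical helper written for this example.

```scala
import org.apache.spark.sql.SparkSession

object CellsWithNames {
  // Hypothetical names, guessed from the sample rows; adjust to the real meaning of each field.
  val guessedNames: Map[Int, String] = Map(
    0  -> "country",
    5  -> "technology",
    14 -> "latitude",
    15 -> "longitude"
  )

  // Replace the column name at each listed position; keep the default name everywhere else.
  def renameByIndex(columns: Array[String], names: Map[Int, String]): Array[String] =
    columns.zipWithIndex.map { case (col, i) => names.getOrElse(i, col) }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cells")
      .master("local[*]") // assumption: local run for experimenting
      .getOrCreate()

    // Same read options as in the question; path copied from the question as-is.
    val raw = spark.read
      .option("header", "false")
      .option("mode", "DROPMALFORMED")
      .option("delimiter", "|")
      .csv("../resouces/cells.csv")

    // toDF needs exactly one name per column, which renameByIndex guarantees.
    val dfCells = raw.toDF(renameByIndex(raw.columns, guessedNames): _*)
    dfCells.printSchema()

    // Register a temporary view so the data can be queried with SQL.
    dfCells.createOrReplaceTempView("cells")
    spark.sql("SELECT technology, COUNT(*) AS n FROM cells GROUP BY technology").show()

    spark.stop()
  }
}
```

On Spark 1.x, as used in this answer, the default names are C0, C1, ... instead, and the temporary table would be registered with `registerTempTable` rather than `createOrReplaceTempView`.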