I'm trying to read a text file in Spark 2.3 using Python, but I get the error below. This is the format of the text file:
name marks
amar 100
babul 70
ram 98
krish 45
Code:
df=spark.read.option("header","true")\
.option("delimiter"," ")\
.option("inferSchema","true")\
.schema(
StructType(
[
StructField("Name",StringType()),
StructField("marks",IntegerType())
]
)
)\
.text("file:/home/maria_dev/prac.txt")
Error:
java.lang.AssertionError: assertion failed: Text data source only produces a single data column named "value"
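When I try reading the text file into an RDD instead, each line is collected as a single column.
Should I change the data file, or should I change the code?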
Answer 0 (score: 3)
Use .csv instead of .text (.text only produces a single column named "value"):
>>> from pyspark.sql.types import *
>>> df=spark.read.option("header","true")\
.option("delimiter"," ")\
.option("inferSchema","true")\
.schema(
StructType(
[
StructField("Name",StringType()),
StructField("marks",IntegerType())
]
)
)\
.csv('file:///home/maria_dev/prac.txt')
>>> df
DataFrame[Name: string, marks: int]
>>> df.show(10,False)
+-----+-----+
|Name |marks|
+-----+-----+
|amar |100 |
|babul|70 |
|ram |98 |
|krish|45 |
+-----+-----+
This loads the file into a DataFrame.
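If you would rather keep reading the file as plain lines, as in the RDD attempt from the question, each line can be split by hand before building the DataFrame. The following is only a rough sketch, not something from the original answer: the helper names parse_line and load_marks are made up for illustration, and it assumes every data line contains exactly one name and one integer separated by whitespace.

```python
def parse_line(line):
    """Split one 'name marks' line into a (str, int) tuple."""
    # split() with no argument splits on any run of whitespace.
    name, marks = line.split()
    return name, int(marks)


def load_marks(spark, path):
    """Build a DataFrame from a whitespace-delimited text file read as an RDD.

    `spark` is an existing SparkSession; `path` is the file location.
    """
    # Imported here so parse_line stays usable without a Spark installation.
    from pyspark.sql.types import (
        StructType, StructField, StringType, IntegerType,
    )

    schema = StructType([
        StructField("Name", StringType()),
        StructField("marks", IntegerType()),
    ])
    lines = spark.sparkContext.textFile(path)
    header = lines.first()  # the first line holds the column names
    # Drop the header line, then turn each remaining line into a tuple.
    rows = lines.filter(lambda l: l != header).map(parse_line)
    return spark.createDataFrame(rows, schema)
```

It would be called as load_marks(spark, "file:///home/maria_dev/prac.txt"). For a simple delimited file like this one, the .csv reader is the shorter route; the manual split is mainly useful when lines need custom cleanup before parsing.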