How to load a CSV file without any schema into a Spark RDD and a DataFrame, and assign a schema to it
Sample row:
AA,19970101,47.82,47.82,47.82,47.82,0
Columns:
stockname,date,highprice,lowprice,openprice,closeprice,volume
Answer 0: (score: 0)
You can first create an RDD from the input data, and then build a DataFrame on top of that RDD by supplying a schema.
from pyspark.sql.types import StructType, StructField, IntegerType, TimestampType, StringType
from datetime import datetime

# Read the raw text file; each record is one comma-separated line
rdd = sc.textFile("//path/to/textfile/file.txt")

schema = StructType([
    StructField("id", IntegerType(), True),
    StructField("created_at", TimestampType(), True),
    StructField("updated_at", StringType(), True)
])

# Split each line and cast the fields to the types declared in the schema
# (the timestamp format below is only an example; adjust it to your data)
rows = rdd.map(lambda line: line.split(",")) \
          .map(lambda p: (int(p[0]), datetime.strptime(p[1], "%Y-%m-%d %H:%M:%S"), p[2]))

df = sqlContext.createDataFrame(rows, schema)
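
Applying the same approach to the stock data shown in the question, a minimal sketch could look like the following. It assumes sc and sqlContext already exist (as in the answer above), keeps the date column as a plain string, and uses a placeholder file path.

from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, LongType)

# Schema matching the columns listed in the question
stock_schema = StructType([
    StructField("stockname", StringType(), True),
    StructField("date", StringType(), True),      # kept as a string, e.g. "19970101"
    StructField("highprice", DoubleType(), True),
    StructField("lowprice", DoubleType(), True),
    StructField("openprice", DoubleType(), True),
    StructField("closeprice", DoubleType(), True),
    StructField("volume", LongType(), True),
])

# "/path/to/stocks.csv" is a placeholder; point it at the actual file
lines = sc.textFile("/path/to/stocks.csv")

# Split each line and convert the numeric fields to match the schema
rows = lines.map(lambda l: l.split(",")) \
            .map(lambda p: (p[0], p[1], float(p[2]), float(p[3]),
                            float(p[4]), float(p[5]), int(p[6])))

stock_df = sqlContext.createDataFrame(rows, stock_schema)
stock_df.show()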