I am trying to create a DataFrame from an RDD, and I want to specify the schema explicitly. Below is the code snippet I tried.
from pyspark.sql.types import StructField, StructType , LongType, StringType
stringJsonRdd_new = sc.parallelize(('{"id": "123", "name": "Katie", "age": 19, "eyeColor": "brown" }',\
'{ "id": "234","name": "Michael", "age": 22, "eyeColor": "green" }',\
'{ "id": "345", "name": "Simone", "age": 23, "eyeColor": "blue" }'))
mySchema = StructType([StructField("id", LongType(), True), StructField("age", LongType(), True), StructField("eyeColor", StringType(), True), StructField("name", StringType(),True)])
new_df = sqlContext.createDataFrame(stringJsonRdd_new,mySchema)
new_df.printSchema()
root
|-- id: long (nullable = true)
|-- age: long (nullable = true)
|-- eyeColor: string (nullable = true)
|-- name: string (nullable = true)
When I call new_df.show(), I get this error message:
ValueError: Unexpected tuple '{"id": "123", "name": "Katie", "age": 19, "eyeColor": "brown" }' with StructType
Can anyone help me?
PS: I can cast the types explicitly and create a new df from an existing df using:
casted_df = stringJsonDf.select(stringJsonDf.age,stringJsonDf.eyeColor, stringJsonDf.name,stringJsonDf.id.cast('int').alias('new_id'))
Answer 0 (score: 1)
You are giving the DataFrame strings as input instead of dictionaries, so it cannot map them to the types you defined.
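If you modify your code as shown below, it works. Note that you also need to change "id" in the data to a number instead of a string, or alternatively change the struct type of "id" from LongType to StringType: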
from pyspark.sql.types import StructField, StructType , LongType, StringType
# give dictionaries instead of strings:
stringJsonRdd_new = sc.parallelize((
{"id": 123, "name": "Katie", "age": 19, "eyeColor": "brown" },\
{ "id": 234,"name": "Michael", "age": 22, "eyeColor": "green" },\
{ "id": 345, "name": "Simone", "age": 23, "eyeColor": "blue" }))
mySchema = StructType([StructField("id", LongType(), True), StructField("age", LongType(), True), StructField("eyeColor", StringType(), True), StructField("name", StringType(),True)])
new_df = sqlContext.createDataFrame(stringJsonRdd_new,mySchema)
new_df.printSchema()
root
|-- id: long (nullable = true)
|-- age: long (nullable = true)
|-- eyeColor: string (nullable = true)
|-- name: string (nullable = true)
new_df.show()
+---+---+--------+-------+
| id|age|eyeColor|   name|
+---+---+--------+-------+
|123| 19|   brown|  Katie|
|234| 22|   green|Michael|
|345| 23|    blue| Simone|
+---+---+--------+-------+
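If the input has to stay as JSON strings, one option is to parse each string before applying the schema. The sketch below is only an illustration: `parse_record`, `build_dataframe`, and `FIELDS` are hypothetical names that do not appear in the question, and it assumes every record contains all four fields and that "id" always holds a numeric string.

```python
import json

# Field order matches mySchema from the code above (assumed fixed).
FIELDS = ("id", "age", "eyeColor", "name")


def parse_record(line):
    """Turn one JSON string into a tuple ordered like the schema.

    "id" arrives as a string such as "123", so it is converted to int
    to be compatible with LongType.
    """
    record = json.loads(line)
    record["id"] = int(record["id"])
    return tuple(record[field] for field in FIELDS)


def build_dataframe(sc, sqlContext, json_lines, schema):
    """Hypothetical wrapper: parse every string, then apply the schema."""
    rows = sc.parallelize(json_lines).map(parse_record)
    return sqlContext.createDataFrame(rows, schema)
```

With the `sc`, `sqlContext`, and `mySchema` from the question, this could be called as `build_dataframe(sc, sqlContext, list_of_json_strings, mySchema)`. Returning tuples in schema order keeps the mapping from values to fields explicit.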
Hope this helps, good luck!