I am downloading the following data from a web service using Python in a Microsoft Azure Databricks notebook:
{
  "Customers" :
  [
    {
      "CustomID" : "106219-891457",
      "CustomerDateTime" : "0000105910",
      "purchasedItems" :
      [
        {
          "itemId" : "tBNU5awl2Yac",
          "state" : "OBSOLETE",
          "materialNumber" : "0000werqw4603100",
          "materialName" : "Licasdr",
          "quantity" : 1,
          "orderType" : "STANDARD",
          "Ingredients" :
          [
            {
              "ingredientId" : "146a00dd036__7e06",
              "ingedrientDesc" : "bla"
            },
            {
              "ingredientId" : "146a234d036__7e06",
              "ingedrientDesc" : "bla2"
            }
          ],
          "lastModificationDate" : "2014-09-30T10:13:46.8Z"
        }
      ]
    }
  ]
}
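This part works fine: the notebook prints the response exactly as shown above.
I need to convert this data and write it to a Parquet file. I am trying to do that with the following lines: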
conn = httplib.HTTPSConnection('companyhost.com')
conn.request("POST", "/public/api/customers/purchases/findByDate", request, headers)
response = conn.getresponse()
data = response.read()
print(data)
conn.close()
from pyspark.sql.types import *
df = spark.createDataFrame(data)
df.show()
df.write.format('parquet').save(mypath)
However, at the line
df = spark.createDataFrame(data)
I get the following error message:
TypeError: Can not infer schema for type: <type 'str'>
What is going on here? What am I doing wrong?
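
A likely cause, offered as a sketch rather than a confirmed diagnosis: response.read() returns the whole body as a single string, while spark.createDataFrame expects a collection of rows (for example a list of tuples, dicts or Row objects, an RDD, or a pandas DataFrame). Given a plain string, Spark appears to treat each character as a row and cannot infer a schema for a str. One common workaround is to hand the JSON text to spark.read.json as a one-element RDD so that Spark infers the nested schema itself. In the code below, the sample `data` string and the helper names `customers_from_response` and `write_parquet` are hypothetical and not part of the original code; `spark` and `mypath` are assumed to be the notebook's session and target path.

```python
import json

# Hypothetical, shortened stand-in for the body returned by response.read().
data = ('{"Customers": [{"CustomID": "106219-891457", '
        '"purchasedItems": [{"itemId": "tBNU5awl2Yac", "quantity": 1}]}]}')


def customers_from_response(raw):
    """Decode the raw body and return the list stored under "Customers"."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return json.loads(raw)["Customers"]


def write_parquet(spark, raw, path):
    """Let Spark infer the schema from the JSON text, then write Parquet.

    Assumes `spark` is an active SparkSession (as provided in a Databricks
    notebook) and `path` is a writable location.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    # spark.read.json accepts an RDD of JSON strings, one document per element.
    rdd = spark.sparkContext.parallelize([raw])
    df = spark.read.json(rdd)
    df.write.format("parquet").save(path)
    return df


# Plain-Python sanity check that the body parses as expected (no Spark needed).
customers = customers_from_response(data)
```

In the notebook this would be called as write_parquet(spark, data, mypath) in place of the createDataFrame line. The resulting DataFrame would have a single Customers column holding an array of structs; flattening it (for example with pyspark.sql.functions.explode) is a separate step if one row per customer is wanted.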