Azure Databricks: writing JSON data to a Parquet file raises TypeError: Can not infer schema for type

Asked: 2018-09-23 15:16:25

Tags: python azure apache-spark-sql parquet databricks

I am using Python in a Microsoft Azure Databricks notebook to download the following data from a web service:

{
    "Customers" : 
   [
        {
            "CustomID" : "106219-891457",
            "CustomerDateTime" : "0000105910",
            "purchasedItems" : 
            [
                {
                  "itemId" : "tBNU5awl2Yac",
                  "state" : "OBSOLETE",
                  "materialNumber" : "0000werqw4603100",
                  "materialName" : "Licasdr",
                  "quantity" : 1,
                  "orderType" : "STANDARD",
                  "Ingredients" : 
                  [
                    {
                        "ingredientId" : "146a00dd036__7e06",
                        "ingedrientDesc" : "bla"
                    },
                    {
                        "ingredientId" : "146a234d036__7e06",
                        "ingedrientDesc" : "bla2"
                    }
                  ],
                  "lastModificationDate" : "2014-09-30T10:13:46.8Z"
                }
            ]
        }
    ]
}

This works fine, and the notebook prints the result shown above.

I need to convert this data and write it to a Parquet file. I am trying to do that with the following lines:

import httplib  # Python 2 module; on Python 3 the equivalent is http.client

conn = httplib.HTTPSConnection('companyhost.com')
conn.request("POST", "/public/api/customers/purchases/findByDate", request, headers)
response = conn.getresponse()
data = response.read()
print(data)
conn.close()

from pyspark.sql.types import *

df = spark.createDataFrame(data)
df.show()

df.write.format('parquet').save(mypath)

However, at the line

df = spark.createDataFrame(data)

I get the following error message:

TypeError: Can not infer schema for type: <type 'str'>

What is going on here? What am I doing wrong?
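
One likely explanation, offered as a sketch rather than a confirmed answer: `response.read()` returns the raw response body as a single string, whereas `spark.createDataFrame` expects a collection of rows (for example a list, an RDD, or a pandas DataFrame), and a plain string gives it no row structure to infer a schema from. A possible workaround is to hand the JSON text to `spark.read.json`, which accepts an RDD of JSON strings and infers the nested schema itself. The helper names `to_json_lines` and `write_customers_parquet` are hypothetical, and the code assumes that the payload has the top-level `Customers` array shown above and that `spark` is the session the Databricks notebook predefines.

```python
import json


def to_json_lines(raw):
    """Turn the response body into one compact JSON string per customer.

    `raw` is what `response.read()` returned: a str, or bytes on Python 3.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    document = json.loads(raw)
    # One JSON document per element of the top-level "Customers" array
    # (assumed layout, taken from the sample payload in the question).
    return [json.dumps(customer) for customer in document["Customers"]]


def write_customers_parquet(spark, raw, path):
    """Let Spark infer the nested schema from JSON text, then write Parquet.

    `spark` is assumed to be the notebook's existing SparkSession.
    """
    lines = spark.sparkContext.parallelize(to_json_lines(raw))
    df = spark.read.json(lines)  # the reader accepts an RDD of JSON strings
    df.write.format("parquet").save(path)
    return df
```

In the notebook this would be called as `write_customers_parquet(spark, data, mypath)` in place of the `spark.createDataFrame(data)` line. Nested arrays such as `purchasedItems` stay nested in the resulting DataFrame, which Parquet can store directly.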

0 answers:

No answers yet.