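I'm trying to find the people whose favorite snack matches a snack name in the goods table. The problem is that the goods table is nested, and the join fails with an error that makes no sense to me (array<string> does not match string).
I searched online for how to handle nested columns and ran into two problems. 1. Most of the material that makes sense is in Scala, and when I try to do the same in Python I get syntax errors. 2. For some reason I can't find explode in my pyspark.
peopleDF is: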
root
|-- email: string (nullable = true)
|-- fave_snack: string (nullable = true)
|-- first_name: string (nullable = true)
|-- gender: string (nullable = true)
|-- id: long (nullable = true)
|-- ip_address: string (nullable = true)
|-- last_name: string (nullable = true)
goodsDF is:
root
|-- products: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- body_html: string (nullable = true)
| | |-- created_at: string (nullable = true)
| | |-- handle: string (nullable = true)
| | |-- id: long (nullable = true)
| | |-- images: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- created_at: string (nullable = true)
| | | | |-- height: long (nullable = true)
| | | | |-- id: long (nullable = true)
| | | | |-- position: long (nullable = true)
| | | | |-- product_id: long (nullable = true)
| | | | |-- src: string (nullable = true)
| | | | |-- updated_at: string (nullable = true)
| | | | |-- variant_ids: array (nullable = true)
| | | | | |-- element: long (containsNull = true)
| | | | |-- width: long (nullable = true)
| | |-- options: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- name: string (nullable = true)
| | | | |-- position: long (nullable = true)
| | | | |-- values: array (nullable = true)
| | | | | |-- element: string (containsNull = true)
| | |-- product_type: string (nullable = true)
| | |-- published_at: string (nullable = true)
| | |-- tags: array (nullable = true)
| | | |-- element: string (containsNull = true)
| | |-- title: string (nullable = true) ##this is the title I'm using
| | |-- updated_at: string (nullable = true)
| | |-- variants: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- available: boolean (nullable = true)
| | | | |-- compare_at_price: string (nullable = true)
| | | | |-- created_at: string (nullable = true)
| | | | |-- featured_image: struct (nullable = true)
| | | | | |-- alt: string (nullable = true)
| | | | | |-- created_at: string (nullable = true)
| | | | | |-- height: long (nullable = true)
| | | | | |-- id: long (nullable = true)
| | | | | |-- position: long (nullable = true)
| | | | | |-- product_id: long (nullable = true)
| | | | | |-- src: string (nullable = true)
| | | | | |-- updated_at: string (nullable = true)
| | | | | |-- variant_ids: array (nullable = true)
| | | | | | |-- element: long (containsNull = true)
| | | | | |-- width: long (nullable = true)
| | | | |-- grams: long (nullable = true)
| | | | |-- id: long (nullable = true)
| | | | |-- option1: string (nullable = true)
| | | | |-- option2: string (nullable = true)
| | | | |-- option3: string (nullable = true)
| | | | |-- position: long (nullable = true)
| | | | |-- price: string (nullable = true)
| | | | |-- product_id: long (nullable = true)
| | | | |-- requires_shipping: boolean (nullable = true)
| | | | |-- sku: string (nullable = true)
| | | | |-- taxable: boolean (nullable = true)
| | | | |-- title: string (nullable = true)
| | | | |-- updated_at: string (nullable = true)
| | |-- vendor: string (nullable = true)
The code I used to try to join them is:
peopleDF.join(goodsDF, peopleDF.fave_snack == goodsDF.products.title,"leftouter").show()
I expected to get the rows of peopleDF whose fave_snack column matches products.title. The actual result is this error message:
pyspark.sql.utils.AnalysisException: "cannot resolve '(`fave_snack` = `products`.`title`)' due to data type mismatch: differing types in '(`fave_snack` = `products`.`title`)' (string and array<string>).;;\n'Join LeftOuter, (fave_snack#1 = products#14.title)\n:- Relation[email#0,fave_snack#1,first_name#2,gender#3,id#4L,ip_address#5,last_name#6] json\n+- Relation[products#14] json\n"
Any insight would be helpful, thanks.