joindf.printSchema()
root
|-- order_customer_id: string (nullable = true)
|-- order_date: string (nullable = true)
|-- order_id: string (nullable = true)
|-- order_status: string (nullable = true)
|-- order_item_id: string (nullable = true)
|-- order_item_order_id: string (nullable = true)
|-- order_item_product_id: string (nullable = true)
|-- order_item_product_price: string (nullable = true)
|-- order_item_quantity: string (nullable = true)
|-- order_item_subtotal: string (nullable = true)
joindf.show(5)
+-----------------+--------------------+--------+------------+-------------+-------------------+---------------------+------------------------+-------------------+-------------------+
|order_customer_id| order_date|order_id|order_status|order_item_id|order_item_order_id|order_item_product_id|order_item_product_price|order_item_quantity|order_item_subtotal|
+-----------------+--------------------+--------+------------+-------------+-------------------+---------------------+------------------------+-------------------+-------------------+
| 10153|2013-08-17 00:00:...| 4061| COMPLETE| 10153| 4080| 365| 59.99| 4| 239.96|
| 10153|2014-01-12 00:00:...| 27596| PENDING| 10153| 4080| 365| 59.99| 4| 239.96|
| 10153|2014-07-18 00:00:...| 56604| CLOSED| 10153| 4080| 365| 59.99| 4| 239.96|
| 10153|2013-08-14 00:00:...| 58259| COMPLETE| 10153| 4080| 365| 59.99| 4| 239.96|
| 10153|2013-08-14 00:00:...| 58269| PENDING| 10153| 4080| 365| 59.99| 4| 239.96|
+-----------------+--------------------+--------+------------+-------------+-------------------+---------------------+------------------------+-------------------+-------------------+
I am using combineByKey() on this RDD to produce, for each day and each order status, the total number of orders and the total amount. Here is the code:
joindf.map(lambda x: ((str(x[1]),str(x[3])),(float(x[9]),int(x[2]))))
.combineByKey(lambda v: (v[0],set(v[1])) ,
lambda acc,v: (acc[0]+v[0],v[1].add(acc[1])),
lambda acc1,acc2 : (acc1[0]+acc2[0],acc1[1].update(acc2[1])))
This fails with the following error:
TypeError: 'int' object is not iterable
Where am I going wrong? Please help.
Answer 0 (score: 0)
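You already have a DataFrame, so you do not need to convert it to an RDD to perform this operation.
As far as I understand, you can do the following. The code is in Scala, but you can convert it to Python: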
joindf.groupBy(split($"order_date", " ")(0).as("order_date"))
.agg(sum($"order_item_quantity"), sum($"order_item_subtotal"))
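If you do want to stay with combineByKey(), the TypeError most likely comes from set(v[1]): v[1] is an int (the order id), and set() expects an iterable. Note also that set.add() and set.update() return None, so the set in the accumulator would be lost even after that fix. Below is a minimal sketch in plain Python of combiner functions that avoid both problems. The combine_by_key helper is a hypothetical local stand-in for Spark's combineByKey() so that the example runs without a cluster, and the sample rows are made up for illustration.

```python
def combine_by_key(pairs, create_combiner, merge_value, merge_combiners,
                   num_partitions=2):
    """Hypothetical local stand-in for RDD.combineByKey (not Spark code)."""
    # Split the pairs round-robin into "partitions" and combine each separately.
    partitions = [pairs[i::num_partitions] for i in range(num_partitions)]
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key in acc:
                acc[key] = merge_value(acc[key], value)
            else:
                acc[key] = create_combiner(value)
        partials.append(acc)
    # Merge the per-partition accumulators into one result.
    merged = {}
    for acc in partials:
        for key, comb in acc.items():
            merged[key] = merge_combiners(merged[key], comb) if key in merged else comb
    return merged


# Each value is (order_item_subtotal, order_id), as in the question's map step.
def create_combiner(v):
    # {v[1]} builds a one-element set; set(v[1]) would try to iterate over an int.
    return (v[0], {v[1]})


def merge_value(acc, v):
    # The | operator returns a new set, whereas .add() returns None.
    return (acc[0] + v[0], acc[1] | {v[1]})


def merge_combiners(acc1, acc2):
    # Likewise, | is used here instead of .update(), which returns None.
    return (acc1[0] + acc2[0], acc1[1] | acc2[1])


# Made-up sample rows: ((order_date, order_status), (subtotal, order_id)).
rows = [
    (("2013-08-17", "COMPLETE"), (239.96, 4061)),
    (("2013-08-17", "COMPLETE"), (59.99, 4061)),
    (("2013-08-17", "COMPLETE"), (100.0, 4062)),
    (("2014-01-12", "PENDING"), (239.96, 27596)),
]

combined = combine_by_key(rows, create_combiner, merge_value, merge_combiners)

# Turn each (total, set_of_order_ids) into (total, number_of_distinct_orders).
summary = {key: (total, len(ids)) for key, (total, ids) in combined.items()}
```

With Spark, the same three functions could be passed to combineByKey() on the mapped RDD; taking len() of the resulting set then gives the number of distinct orders per (date, status) key.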
Hope this helps!