将pyspark数据帧中的两列转换为一个python字典

时间:2019-04-16 20:20:31

标签: python pyspark

我有一个pyspark数据框,我想在其中使用其两列来输出字典。

输入pyspark数据框:

col1|col2|col3
v   |  3 | a
d   |  2 | b
q   |  9 | g

输出:

  

dict = {'v': 3, 'd': 2, 'q': 9}

我应该如何有效地做到这一点?

3 个答案:

答案 0 :(得分:1)

I believe you can achieve it by converting the DF (with only the two columns you want) to rdd:

data_rdd = data.selet(['col1', 'col2']).rdd

create a rdd containing key, pair with both columns using rdd.map function:

kp_rdd = data_rdd.map(lambda row : (row[0],row[1]))

and then collect as map:

dict = kp_rdd.collectAsMap()

that's the main idea, sorry I don't have an instance of pyspark running right now to test it.

答案 1 :(得分:1)

few different options here depending on the format needed ... check this out ... am using structured api ... if you need to persist then either save as json dict or preserve schema with parquet

from pyspark.sql.functions import to_json
from pyspark.sql.functions import create_map
from pyspark.sql.functions import col

df = spark\
.createDataFrame([\
    ('v', 3, 'a'),\
    ('d', 2, 'b'),\
    ('q', 9, 'g')],\
    ["c1", "c2", "c3"])

mapDF = df.select(create_map(col("c1"), col("c2")).alias("mapper"))
mapDF.show(3)

+--------+
|  mapper|
+--------+
|[v -> 3]|
|[d -> 2]|
|[q -> 9]|
+--------+

dictDF = df.select(to_json(create_map(col("c1"), col("c2")).alias("mapper")).alias("dict"))
dictDF.show()

+-------+
|   dict|
+-------+
|{"v":3}|
|{"d":2}|
|{"q":9}|
+-------+

keyValueDF = df.selectExpr("(c1, c2) as keyValueDict").select(to_json(col("keyValueDict")).alias("keyValueDict"))
keyValueDF.show()

+-----------------+
|     keyValueDict|
+-----------------+
|{"c1":"v","c2":3}|
|{"c1":"d","c2":2}|
|{"c1":"q","c2":9}|
+-----------------+

答案 2 :(得分:0)

以您的示例为例,export const initialState = { basket: [], }; const reducer = (state, action) => { switch (action.type) { case 'ADD_TO_CART': let newCart = [...state.basket] let itemInCart = newCart.find((item) => item.id === action.item.id); if (itemInCart) { itemInCart.quantity++ } else { itemInCart = { ...action.item, quantity: 1 }; newCart.push(itemInCart) }; return { ...state, basket: newCart }; default: return state; 将完成所需的词典,而无需任何其他步骤:

collectAsMap