我想根据 Spark 结构化流 (pyspark) 中数据帧 (localSymbol) 的列值,将取决于字典字段 (localSymbol_aux) 的字典列表的值分配给数据帧
我有以下字典列表
level_1_batch_list_of_dict = [
{
'localSymbol_aux': 'EUR.USD',
'level_1_precio_max_sesion_aux': 4,
'level_1_precio_min_sesion_aux': 0
},
{
'localSymbol_aux': 'USD.JPY',
'level_1_precio_max_sesion_aux': 6,
'level_1_precio_min_sesion_aux': 0
}
]
我有以下数据框:
+------------------------------------------+-----------+-----------------+
|ventana |localSymbol|precio_max_sesion|
+------------------------------------------+-----------+-----------------+
|{2021-05-10 11:33:30, 2021-05-10 11:34:00}|EUR.USD |0 |
|{2021-05-10 11:34:00, 2021-05-10 11:34:30}|EUR.USD |0 |
|{2021-05-10 11:34:30, 2021-05-10 11:35:00}|EUR.USD |0 |
|{2021-05-10 11:35:00, 2021-05-10 11:35:30}|EUR.USD |0 |
|{2021-05-10 11:35:30, 2021-05-10 11:36:00}|EUR.USD |0 |
|{2021-05-10 11:36:00, 2021-05-10 11:36:30}|EUR.USD |0 |
|{2021-05-10 11:33:30, 2021-05-10 11:34:00}|USD.JPY |0 |
|{2021-05-10 11:34:00, 2021-05-10 11:34:30}|USD.JPY |0 |
|{2021-05-10 11:34:30, 2021-05-10 11:35:00}|USD.JPY |0 |
|{2021-05-10 11:35:00, 2021-05-10 11:35:30}|USD.JPY |0 |
|{2021-05-10 11:35:30, 2021-05-10 11:36:00}|USD.JPY |0 |
|{2021-05-10 11:36:00, 2021-05-10 11:36:30}|USD.JPY |0 |
+------------------------------------------+-----------+-----------------+
但我想要以下内容:
+------------------------------------------+-----------+-----------------+
|ventana |localSymbol|precio_max_sesion|
+------------------------------------------+-----------+-----------------+
|{2021-05-10 11:33:30, 2021-05-10 11:34:00}|EUR.USD |4 |
|{2021-05-10 11:34:00, 2021-05-10 11:34:30}|EUR.USD |4 |
|{2021-05-10 11:34:30, 2021-05-10 11:35:00}|EUR.USD |4 |
|{2021-05-10 11:35:00, 2021-05-10 11:35:30}|EUR.USD |4 |
|{2021-05-10 11:35:30, 2021-05-10 11:36:00}|EUR.USD |4 |
|{2021-05-10 11:36:00, 2021-05-10 11:36:30}|EUR.USD |4 |
|{2021-05-10 11:33:30, 2021-05-10 11:34:00}|USD.JPY |6 |
|{2021-05-10 11:34:00, 2021-05-10 11:34:30}|USD.JPY |6 |
|{2021-05-10 11:34:30, 2021-05-10 11:35:00}|USD.JPY |6 |
|{2021-05-10 11:35:00, 2021-05-10 11:35:30}|USD.JPY |6 |
|{2021-05-10 11:35:30, 2021-05-10 11:36:00}|USD.JPY |6 |
|{2021-05-10 11:36:00, 2021-05-10 11:36:30}|USD.JPY |6 |
+------------------------------------------+-----------+-----------------+
有什么想法吗?
谢谢!!
答案 0 :(得分:1)
您可以将上面的字典转换为 PySpark Dataframe 然后做一个简单的连接
# this is the reference dataframe
ref = spark.createDataFrame(level_1_batch_list_of_dict)
ref.show(10, False)
# +-----------------------------+-----------------------------+---------------+
# |level_1_precio_max_sesion_aux|level_1_precio_min_sesion_aux|localSymbol_aux|
# +-----------------------------+-----------------------------+---------------+
# |4 |0 |EUR.USD |
# |6 |0 |USD.JPY |
# +-----------------------------+-----------------------------+---------------+
# this is your main dataframe
df.show(2, False)
# +------------------------------------------+-----------+-----------------+
# |ventana |localSymbol|precio_max_sesion|
# +------------------------------------------+-----------+-----------------+
# |{2021-05-10 11:33:30, 2021-05-10 11:34:00}|EUR.USD |0 |
# |{2021-05-10 11:34:00, 2021-05-10 11:34:30}|EUR.USD |0 |
# +------------------------------------------+-----------+-----------------+
# join them together and get correct values
(df
.join(ref, on=[df.localSymbol == ref.localSymbol_aux])
.select(df.ventana, df.localSymbol, ref.level_1_precio_max_sesion_aux.alias('precio_max_sesion'))
.show(100, False)
)
# +------------------------------------------+-----------+-----------------+
# |ventana |localSymbol|precio_max_sesion|
# +------------------------------------------+-----------+-----------------+
# |{2021-05-10 11:33:30, 2021-05-10 11:34:00}|EUR.USD |4 |
# |{2021-05-10 11:34:00, 2021-05-10 11:34:30}|EUR.USD |4 |
# |{2021-05-10 11:34:30, 2021-05-10 11:35:00}|EUR.USD |4 |
# |{2021-05-10 11:35:00, 2021-05-10 11:35:30}|EUR.USD |4 |
# |{2021-05-10 11:35:30, 2021-05-10 11:36:00}|EUR.USD |4 |
# |{2021-05-10 11:36:00, 2021-05-10 11:36:30}|EUR.USD |4 |
# |{2021-05-10 11:33:30, 2021-05-10 11:34:00}|USD.JPY |6 |
# |{2021-05-10 11:34:00, 2021-05-10 11:34:30}|USD.JPY |6 |
# |{2021-05-10 11:34:30, 2021-05-10 11:35:00}|USD.JPY |6 |
# |{2021-05-10 11:35:00, 2021-05-10 11:35:30}|USD.JPY |6 |
# |{2021-05-10 11:35:30, 2021-05-10 11:36:00}|USD.JPY |6 |
# |{2021-05-10 11:36:00, 2021-05-10 11:36:30}|USD.JPY |6 |
# +------------------------------------------+-----------+-----------------+
答案 1 :(得分:0)
我使用了 udf 函数:
<?xml version="1.0" ?>
<keyboard>
<keybind key="0">
<action name="Execute">
<command>whatToExec 0</command>
</action>
</keybind>
<keybind key="1">
<action name="Execute">
<command>whatToExec 1</command>
</action>
</keybind>
<keybind key="2">
<action name="Execute">
<command>whatToExec 2</command>
</action>
</keybind>
</keyboard>