My DataFrame has the following schema:
scala> x.printSchema()
root
 |-- pangaea_customer_id: string (nullable = true)
 |-- persona_model: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- score: double (nullable = true)
 |    |    |-- tag: string (nullable = true)
 |-- process_date: string (nullable = true)
Here is a sample row from this DataFrame:
x.show(1)
+--------------------+--------------------+-------------+
| pangaea_customer_id| persona_model| process_date|
+--------------------+--------------------+-------------+
|000000E91010441BB...|Map(Tech -> [0.21...|2018-05-16-01|
+--------------------+--------------------+-------------+
I want to create a new DataFrame with two columns: pangaea_customer_id and the corresponding scores, which are nested inside the persona_model map.
Here is what I have tried so far. I am using this command:
val newDF = oldDF.select(col("pangaea_customer_id"), col("persona_model")("Tech")("score"))
But this only returns the score stored under the key "Tech". I want every score value for every customer. What should I replace "Tech" with?
Here is my output:
scala> newDF.show(10,false)
+--------------------------------+-------------------------+
|pangaea_customer_id |persona_model[Tech].score|
+--------------------------------+-------------------------+
|000000E91010441BB122402A45D439E7|0.21678 |
|000000FB2B304F60B244FEAFDE932640|null |
|000003E2565A4C88B9DAADDE5B5ADE71|null |
|000009D9D1B3443E95F21C58D708B196|null |
|000009EB8F6C4BFABA730726DCFE1FEE|null |
|0000119D3561461E96F8BA2B9523579A|null |
|00001296DC394AED93A19BBD79A5533C|null |
|000014D91E6D4A44AA98E0118E349A52|null |
|0000156A2B5D4275980AB9FD4F8C9163|null |
|000015EC31FC426E9A5477FE0A857982|1.23 |
+--------------------------------+-------------------------+
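It shows a score only for the rows whose map contains the key "Tech", which makes sense because "Tech" is the key I passed in the command above. For every other customer the lookup returns null. What I want is all of the scores, not nulls.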
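To make the desired result concrete, here is a minimal sketch of one possible approach: instead of looking up a single key, turn each map entry into its own row with `explode`, then pull `score` out of the struct. This is an assumption about what "all scores" should look like (one row per customer and map key). The `Persona` case class, the object and method names, the extra keys "Sports" and "Travel", and the sample values are all made up for illustration; only the column names come from the schema above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical stand-in for the struct stored as the map value (score, tag).
case class Persona(score: Double, tag: String)

object PersonaScores {

  // Builds a tiny DataFrame with the same column names as the question.
  // The keys "Sports" and "Travel" and all values are invented sample data.
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("000000E91010441BB122402A45D439E7",
        Map("Tech" -> Persona(0.21678, "a")), "2018-05-16-01"),
      ("000000FB2B304F60B244FEAFDE932640",
        Map("Sports" -> Persona(0.5, "b"), "Travel" -> Persona(0.9, "c")), "2018-05-16-01")
    ).toDF("pangaea_customer_id", "persona_model", "process_date")
  }

  // explode on a map column emits one row per entry, in columns "key" and "value".
  // The score is then read from the "value" struct.
  def explodeScores(df: DataFrame): DataFrame =
    df.select(col("pangaea_customer_id"), explode(col("persona_model")))
      .select(
        col("pangaea_customer_id"),
        col("key").as("persona"),
        col("value.score").as("score")
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("persona-scores-sketch")
      .getOrCreate()

    explodeScores(sampleData(spark)).show(false)
    spark.stop()
  }
}
```

In spark-shell the same two `select` calls can be applied directly to `oldDF`. Note that `explode` drops customers whose map is null or empty; if those rows should be kept with a null score, `explode_outer` (available from Spark 2.2) may be the better fit.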