Aggregate to a sequence

Time: 2018-10-15 18:38:01

Tags: scala apache-spark-sql

Suppose I have an org.apache.spark.sql.DataFrame with the following schema:

root
 |-- origin: string (nullable = true)
 |-- destination: string (nullable = true)

A typical instance of this DataFrame looks like this:

+------------------------------+--------------------------------------+
|origin                        |destination                           |
+------------------------------+--------------------------------------+
|JEBEL ALI                     |KUWAIT                                |
|CHITTAGONG                    |KEARNY POINT                          |
|FELIXSTOWE                    |KEARNY POINT                          |
|LOS ANGELES                   |EUROPOORT - E.C.T. DELTA TERMINAL     |
|LOS ANGELES                   |KAOHSIUNG                             |
|GREATER NEW YORK TERMINAL     |ANTWERP                               |
|SHANGHAI                      |LOS ANGELES                           |
|SAN PEDRO                     |BRANI TERMINAL - PULAU BRANI          |
|KAMPONG SAOM                  |HOWLAND HOOK CONTAINER TERMINAL       |
|SHANGHAI                      |LONG BEACH                            |
|BARCELONA                     |MONTREAL                              |
|HAIFA                         |GREATER NEW YORK TERMINAL             |
|BRANI TERMINAL - PULAU BRANI  |BUSAN                                 |
|MUMBAI                        |KEARNY POINT                          |
|LAEM CHABANG                  |CAT LAI OIL TERMINAL - HO CHI MIN CITY|
|BARCELONA                     |JAWAHARLAL NEHRU PORT                 |
|HUANG DAO - OIL TERMINAL NO. 2|VANCOUVER, B.C.                       |
|HAIFA                         |HALIFAX                               |
|BRANI TERMINAL - PULAU BRANI  |LOS ANGELES                           |
|MANILA                        |VANCOUVER, B.C.                       |
+------------------------------+--------------------------------------+  

I want to transform it into a DataFrame like this:

+------------------------------+----------------------------------------------+
|origin                        |destinations                                  |
+------------------------------+----------------------------------------------+
|JEBEL ALI                     |[KUWAIT]                                      |
|CHITTAGONG                    |[KEARNY POINT]                                |
|FELIXSTOWE                    |[KEARNY POINT]                                |
|LOS ANGELES                   |[EUROPOORT - E.C.T. DELTA TERMINAL, KAOHSIUNG]|
|GREATER NEW YORK TERMINAL     |[ANTWERP]                                     |
|SHANGHAI                      |[LOS ANGELES, LONG BEACH]                     |
|SAN PEDRO                     |[BRANI TERMINAL - PULAU BRANI]                |
|KAMPONG SAOM                  |[HOWLAND HOOK CONTAINER TERMINAL]             |
|BARCELONA                     |[MONTREAL, JAWAHARLAL NEHRU PORT]             |
|HAIFA                         |[GREATER NEW YORK TERMINAL, HALIFAX]          |
|BRANI TERMINAL - PULAU BRANI  |[BUSAN, LOS ANGELES]                          |
|MUMBAI                        |[KEARNY POINT]                                |
|LAEM CHABANG                  |[CAT LAI OIL TERMINAL - HO CHI MIN CITY]      |
|HUANG DAO - OIL TERMINAL NO. 2|[VANCOUVER, B.C.]                             |
|MANILA                        |[VANCOUVER, B.C.]                             |
+------------------------------+----------------------------------------------+

Note that each value of origin is unique, and each row lists all the destinations associated with that origin. The destinations column is of type Seq[String].

How can I do this?

1 answer:

Answer 0 (score: 1)

import org.apache.spark.sql.functions.collect_set

val originToDestinations = originDestinationDf.groupBy("origin").agg(collect_set("destination").as("destinations"))
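
For anyone who wants to try this end to end, here is a minimal sketch that applies the same groupBy and collect_set aggregation to a few rows copied from the question. The object name OriginToDestinations, the helper destinationsByOrigin, and the local[*] master are assumptions made for illustration and are not part of the original answer. The sort_array call is also an addition: collect_set does not guarantee element order, so sorting only makes the output reproducible.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{collect_set, sort_array}

object OriginToDestinations {

  // Hypothetical helper: one row per origin, with its distinct destinations
  // gathered into an array column named "destinations".
  def destinationsByOrigin(df: DataFrame): DataFrame =
    df.groupBy("origin")
      // collect_set drops duplicate destinations; sort_array is optional
      // and only makes the element order deterministic.
      .agg(sort_array(collect_set("destination")).as("destinations"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption); a real job would
    // normally reuse the session it already has.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("origin-to-destinations")
      .getOrCreate()
    import spark.implicits._

    // A few rows copied from the sample data in the question.
    val originDestinationDf = Seq(
      ("JEBEL ALI", "KUWAIT"),
      ("LOS ANGELES", "EUROPOORT - E.C.T. DELTA TERMINAL"),
      ("LOS ANGELES", "KAOHSIUNG"),
      ("SHANGHAI", "LOS ANGELES"),
      ("SHANGHAI", "LONG BEACH")
    ).toDF("origin", "destination")

    val originToDestinations = destinationsByOrigin(originDestinationDf)

    originToDestinations.printSchema()
    originToDestinations.show(false) // false = do not truncate long cell values

    spark.stop()
  }
}
```

If repeated origin/destination pairs should be kept rather than deduplicated, collect_list can be swapped in for collect_set; both produce an array<string> column, which maps to Seq[String] on the Scala side.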