I have written the following code to group and aggregate columns:
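import org.apache.spark.sql.functions.col
val gmList = List("gc1","gc2","gc3")
val aList = List("val1","val2","val3","val4","val5")
val cype = "first"
// map each column name to the aggregate function to apply to it
val exprs = aList.map(_ -> cype).toMap
df.groupBy(gmList.map(col): _*).agg(exprs).show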
But this wraps the aggregation name around every aggregated column name, as shown below:
+---+---+---+-----------+-----------+-----------+-----------+-----------+
|gc1|gc2|gc3|first(val1)|first(val2)|first(val3)|first(val4)|first(val5)|
+---+---+---+-----------+-----------+-----------+-----------+-----------+
So I want to alias names such as first(val1) -> val1, and I want to make this generic as part of exprs.
Answer 0 (score: 1)
You can slightly change the way you generate the expressions and use the function alias inside them:
import org.apache.spark.sql.functions.{col, first}
val aList = List("val1","val2","val3","val4","val5")
// alias each aggregate back to its source column name
val exprs = aList.map(c => first(col(c)).alias(c))
df.groupBy(gmList.map(col): _*).agg(exprs.head, exprs.tail: _*).show
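A minimal end-to-end sketch of this alias approach, assuming spark-sql is on the classpath and a local SparkSession is acceptable. The object name AliasInAgg, the helper firstByGroup and the toy DataFrame are hypothetical stand-ins for the question's df, not part of the answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, first}

object AliasInAgg {
  // Group by `groupCols` and take first() of every column in `aggCols`,
  // aliasing each aggregate back to its source column name.
  // Assumes `aggCols` is non-empty (exprs.head would throw otherwise).
  def firstByGroup(df: DataFrame, groupCols: Seq[String], aggCols: Seq[String]): DataFrame = {
    val exprs = aggCols.map(c => first(col(c)).alias(c))
    df.groupBy(groupCols.map(col): _*).agg(exprs.head, exprs.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("alias-in-agg").getOrCreate()
    import spark.implicits._

    // Hypothetical toy data with the question's column names.
    val df = Seq(
      ("a", "x", "p", 1, 10),
      ("a", "x", "p", 2, 20),
      ("b", "y", "q", 3, 30)
    ).toDF("gc1", "gc2", "gc3", "val1", "val2")

    val result = firstByGroup(df, Seq("gc1", "gc2", "gc3"), Seq("val1", "val2"))
    result.show()
    println(result.columns.mkString(","))
    spark.stop()
  }
}
```

Because the alias is attached to each Column expression before agg runs, no renaming pass is needed afterwards.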
Answer 1 (score: 1)
One approach is to alias the aggregated columns back to the original column names in a subsequent select. I'd also suggest generalizing the single aggregate function (i.e. first) into a list of functions.
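A minimal sketch of both suggestions, assuming spark-sql is on the classpath. It builds a (column, function) pair for every combination, aggregates with agg(pairs.head, pairs.tail: _*), and then renames the generated columns in a follow-up select. The helper aggregateAndRename, the col_fn naming used when there are several functions, and the toy data are illustrative assumptions, not the answerer's code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AliasInSelect {
  // Aggregate every column in `aggCols` with every function name in `aggFns`
  // (e.g. "first", "max"), then rename the results in a subsequent select.
  // Assumes both lists are non-empty.
  def aggregateAndRename(df: DataFrame, groupCols: Seq[String],
                         aggCols: Seq[String], aggFns: Seq[String]): DataFrame = {
    // One (column -> function) pair per combination, column-major order.
    val pairs = for (c <- aggCols; f <- aggFns) yield (c, f)
    val aggDF = df.groupBy(groupCols.map(col): _*).agg(pairs.head, pairs.tail: _*)

    // Hypothetical naming scheme: keep the original name when there is a single
    // function, otherwise append the function name so names stay unique.
    val aggNames = pairs.map { case (c, f) => if (aggFns.size == 1) c else s"${c}_$f" }
    val newNames = groupCols ++ aggNames

    // The aggregated frame lists grouping columns first, then one column per pair
    // in order, so old and new names can be matched by position.
    aggDF.select(aggDF.columns.zip(newNames).map { case (old, nw) => col(old).as(nw) }: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("alias-in-select").getOrCreate()
    import spark.implicits._
    val df = Seq(("a", 1, 10), ("a", 2, 20), ("b", 3, 30)).toDF("gc1", "val1", "val2")
    val result = aggregateAndRename(df, Seq("gc1"), Seq("val1", "val2"), Seq("first", "max"))
    result.show()
    spark.stop()
  }
}
```

Matching by position avoids depending on the exact text Spark generates for aggregate column names, which can differ between Spark versions.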
Answer 2 (score: 0)
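Here is a more generic version that works with any aggregate function and doesn't require naming the aggregated columns up front. Build the grouped df as usual, then use: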
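import org.apache.spark.sql.functions.col
// capture the text inside the parentheses of an aggregated column name
val colRegex = raw"^.+\((.*?)\)".r
val newCols = df.columns.map(c => col(c).as(colRegex.replaceAllIn(c, m => m.group(1))))
df.select(newCols: _*)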
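This extracts only what is inside the parentheses, regardless of which aggregate function was applied (e.g. first(val) -> val, sum(val) -> val, count(val) -> val, etc.). Columns without parentheses, such as the grouping columns, do not match the pattern and keep their names.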
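The renaming logic itself is plain Scala, so it can be checked without a Spark session. A small sketch, where StripAggName is a hypothetical name; it also wraps the captured group in Regex.quoteReplacement, an added precaution because replaceAllIn treats $ and \ in the replacement string as special.

```scala
import scala.util.matching.Regex

object StripAggName {
  // Same pattern as the answer: a greedy prefix, then capture the text
  // between the last "(" and the following ")".
  private val colRegex: Regex = raw"^.+\((.*?)\)".r

  // Replace "fn(name)" with "name"; names without parentheses are unchanged.
  def strip(name: String): String =
    colRegex.replaceAllIn(name, m => Regex.quoteReplacement(m.group(1)))

  def main(args: Array[String]): Unit = {
    val names = Seq("gc1", "first(val1)", "sum(val2)", "count(val3)")
    println(names.map(strip).mkString(",")) // prints gc1,val1,val2,val3
  }
}
```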