Databricks: DataFrame groupBy/agg, collecting a set that keeps duplicate values

Time: 2019-06-06 02:49:08

Tags: scala apache-spark dataframe databricks

Suppose I have a dataset df that looks like this:

col1   col2 
1      A
1      B
1      C
2      B
2      B
2      C

I want to group the dataset by col1 and collect col2 into an array, using the following code:

var df2=df.groupBy("col1").agg(collect_set("col2").alias("col2"))

Then df2 will be:

col1    col2
1       A,B,C
2       B,C

How can I change the code so that I get the following instead?

col1    col2
1       A,B,C
2       B,B,C

1 answer:

Answer 0: (score: 2)

You can use collect_list instead of collect_set, because collect_set returns

a set of objects with duplicate elements eliminated
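As a minimal sketch of that change, assuming a local SparkSession (on Databricks a `spark` session is already provided) and the sample data from the question. The object name `CollectListExample` and the helper `groupKeepingDuplicates` are made up for illustration; only `collect_set` has been swapped for `collect_list`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.collect_list

object CollectListExample {
  // Hypothetical helper: groups by col1 and gathers every col2 value,
  // duplicates included, into an array column named col2.
  def groupKeepingDuplicates(df: DataFrame): DataFrame =
    df.groupBy("col1").agg(collect_list("col2").alias("col2"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-list-example")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df = Seq((1, "A"), (1, "B"), (1, "C"), (2, "B"), (2, "B"), (2, "C"))
      .toDF("col1", "col2")

    // Group 2 now keeps both "B" values instead of collapsing them.
    groupKeepingDuplicates(df).orderBy("col1").show()

    spark.stop()
  }
}
```

Note that the order of elements inside the collected array is not guaranteed after a shuffle. If a stable order matters, wrapping the aggregate in `sort_array(...)` from `org.apache.spark.sql.functions` is one option.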

Thanks.