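After some transformations, I ended up with an RDD in the following format:
[(0, [('a', 1), ('b', 1), ('b', 1), ('b', 1)]),
 (1, [('c', 1), ('d', 1), ('h', 1), ('h', 1)])]
I can't figure out how to essentially apply "reduceByKey()" to the value part of this RDD.
This is what I want to achieve:
[(0, [('a', 1), ('b', 3)]),
 (1, [('c', 1), ('d', 1), ('h', 2)])]
I initially used .values() and then applied reduceByKey to the result, but I ended up losing the original key (0 or 1 in this case).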
Answer 0: (score: 1)
You lost the original key because .values() returns only the value part of each key-value pair in a row. Instead, you should sum the tuples within each row.
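One way to sum the tuples within each row can be sketched as follows. This is a minimal sketch, assuming the RDD is named `rdd` (a hypothetical name, not from the question) and that every value is a list of `(letter, count)` pairs. The helper name `sum_pairs` is also illustrative. The helper is plain Python, so it can be checked without a Spark session; in PySpark it would be passed to `mapValues`, which transforms only the value and leaves the key untouched.

```python
from collections import OrderedDict

def sum_pairs(pairs):
    """Sum the counts of equal letters within one row's list of pairs."""
    totals = OrderedDict()  # keeps letters in first-seen order
    for letter, count in pairs:
        totals[letter] = totals.get(letter, 0) + count
    return list(totals.items())

# Plain-Python check on the first row from the question:
row = [('a', 1), ('b', 1), ('b', 1), ('b', 1)]
print(sum_pairs(row))  # [('a', 1), ('b', 3)]

# With PySpark this would be applied as (assuming `rdd` holds the data):
# rdd.mapValues(sum_pairs).collect()
```

Because the aggregation happens inside each row, no shuffle is needed and the keys 0 and 1 are preserved.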
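Answer 1: (score: 0)
Although values returns an RDD, reduceByKey then operates on all the values across that RDD, not row by row.
You can also use groupby (which requires sorted input) to achieve the same result: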
from itertools import groupby
# Sort each row's pairs so equal letters are adjacent, then sum the counts per letter.
distdata.map(lambda x: (x[0], [(a, sum(c[1] for c in b)) for a, b in groupby(sorted(x[1]), key=lambda p: p[0])])).collect()
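To illustrate why the sort matters, here is a small plain-Python sketch of the per-row logic from the one-liner above. The function name `sum_by_letter` and the sample row are made up for illustration. `itertools.groupby` only merges adjacent items with the same key, so an unsorted row would produce duplicate letters.

```python
from itertools import groupby

def sum_by_letter(pairs):
    """Sort the pairs, group adjacent equal letters, and sum their counts."""
    ordered = sorted(pairs)
    return [(letter, sum(count for _, count in group))
            for letter, group in groupby(ordered, key=lambda p: p[0])]

unsorted_row = [('h', 1), ('c', 1), ('h', 1), ('d', 1)]
print(sum_by_letter(unsorted_row))  # [('c', 1), ('d', 1), ('h', 2)]

# Without sorting, groupby only merges adjacent items, so 'h' appears twice:
print([(k, sum(c for _, c in g))
       for k, g in groupby(unsorted_row, key=lambda p: p[0])])
# [('h', 1), ('c', 1), ('h', 1), ('d', 1)]
```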