Pandas: Concatenate column names based on values

Date: 2017-07-03 05:53:56

Tags: python-3.x pandas

I have a table like the following:

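Given the table below, I want to create a column f that is the concatenation of the names of the columns whose value in that row is 1.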

    a   b   c   d   e
r1  0   1   0   1   0
r2  1   1   0   0   0

Does anyone know how to do this with pandas? Any help is much appreciated!

1 answer:

Answer 0: (score: 2)

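You can use mul to multiply the values by the column names and then sum along each row. This relies on df holding only 0 and 1 values (or True and False):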

df['f'] = df.mul(df.columns.values).sum(axis=1)
print (df)
    a  b  c  d  e   f
r1  0  1  0  1  0  bd
r2  1  1  0  0  0  ab
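
The multiply-then-sum approach above appears to rest on ordinary Python string repetition: a name times 1 is the name, and a name times 0 is the empty string. The following is a minimal plain-Python sketch of that idea, without pandas; the helper name names_where_one and the rows dictionary are made up for illustration and mirror the table in the question.

```python
# Plain-Python sketch of the multiply-then-sum idea (no pandas involved).
# `rows` and `names_where_one` are illustrative names, not part of pandas.
columns = ["a", "b", "c", "d", "e"]
rows = {"r1": [0, 1, 0, 1, 0], "r2": [1, 1, 0, 0, 0]}


def names_where_one(values, names):
    # name * 1 keeps the name and name * 0 gives "", so joining the
    # products leaves only the names whose value is 1.
    return "".join(name * value for name, value in zip(names, values))


result = {label: names_where_one(values, columns) for label, values in rows.items()}
print(result)  # {'r1': 'bd', 'r2': 'ab'}
```

The same arithmetic also shows why values other than 0 and 1 cause trouble: a value of 2 would repeat the name twice.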

General solution (if values other than 0 and 1 are possible):

Add eq to compare with 1:

df['f'] = df.eq(1).mul(df.columns.values).sum(axis=1)
print (df)
    a  b  c  d  e   f
r1  0  1  0  1  0  bd
r2  1  1  0  0  0  ab
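
To see why the comparison with 1 matters, here is a small sketch on a hypothetical frame that also contains the values 2 and 3. To keep it independent of how a particular pandas version multiplies numbers by strings, it joins the names with apply instead of mul and sum; the data and the variable names mask and picked are assumptions made for this example.

```python
import pandas as pd

# Hypothetical data with values other than 0 and 1.
df = pd.DataFrame(
    {"a": [0, 2], "b": [1, 1], "c": [3, 1]},
    index=["r1", "r2"],
)

# Plain string repetition would duplicate a name for a value of 2.
print("a" * 2)  # aa

# eq(1) yields a boolean frame that is True only where the value is exactly 1.
mask = df.eq(1)

# For each row, keep the labels of the True entries and join them.
picked = mask.apply(lambda row: "".join(row[row].index), axis=1)
print(picked.tolist())  # ['b', 'bc']
```

Column a in row r2 and column c in row r1 are left out because their values are not equal to 1.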

Solution with dot and eq:

df['f'] = df.eq(1).dot(df.columns.values)
print (df)
    a  b  c  d  e   f
r1  0  1  0  1  0  bd
r2  1  1  0  0  0  ab

Another solution with apply, which is slower:

df['f'] = df.apply(lambda x: ''.join(x.index[x == 1]), axis=1)
print (df)
    a  b  c  d  e   f
r1  0  1  0  1  0  bd
r2  1  1  0  0  0  ab

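Edit:

To add spaces between the names, you can use the following. It splits f into single characters, so it assumes one-character column names: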

df['f'] = df['f'].apply(lambda x: ' '.join(list(x)))
print (df)
    a  b  c  d  e    f
r1  0  1  0  1  0  b d
r2  1  1  0  0  0  a b
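
Splitting f character by character would break up column names longer than one character. One possible alternative, sketched here with hypothetical column names (alpha, beta, gamma) and a made-up value_cols list, is to join the names with the separator directly in apply:

```python
import pandas as pd

# Hypothetical frame with multi-character column names.
df = pd.DataFrame(
    {"alpha": [0, 1], "beta": [1, 1], "gamma": [1, 0]},
    index=["r1", "r2"],
)

# Restrict to the 0/1 columns so that a previously added f column is ignored.
value_cols = ["alpha", "beta", "gamma"]

# Join the labels of the entries equal to 1, separated by a space.
df["f"] = df[value_cols].apply(
    lambda row: " ".join(row[row == 1].index), axis=1
)
print(df["f"].tolist())  # ['beta gamma', 'alpha beta']
```

Like the apply solution above, this is likely to be slower than the vectorized variants on large frames.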