Pandas: apply a function to subsets of a column to create a new column

Time: 2018-07-05 13:05:16

Tags: python pandas dataframe

I have defined a pandas DataFrame:

import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'B': [5, 4, 8, 6, 5, 6, 6, 7, 7]})
df
   A  B
0  1  5
1  1  4
2  1  8
3  1  6
4  2  5
5  2  6
6  2  6
7  3  7
8  3  7

I want to create a new column C that, for each value i of df.A, holds the result of applying scipy.signal.savgol_filter to the corresponding elements of B, i.e. the filter of df.loc[df.A == i].B for i = 1, 2, 3, ...

I use the following code:

for i in df.A.unique():
    df.loc[df.A == i]['C'] = scipy.signal.savgol_filter(df.loc[df.A == i].B, 3, 1)

This does not create column "C"; it only produces a message.

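I have read the documentation but could not find the correct way to define the new column. How should I do this?

Thank you for your help.

Clarification:

This question is not specific to scipy.signal.savgol_filter. It is the same for any other function (an FFT, for example) that takes the N elements of df.B in a group and produces N other elements to put in df.C, applied to df.loc[df.A == i].B for i = 1, 2, 3, ...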

2 answers:

Answer 0: (score: 3)

This is called chained indexing. A better approach is:

for i in df.A.unique() : 
    df.loc[df.A==i, 'C']=scipy.signal.savgol_filter(df.loc[df.A==i, 'B'], 3, 1)

But it is better still to use GroupBy.transform:

import scipy.signal

# added a last row to the sample to avoid an error:
# with the default mode, savgol_filter needs window_length <= group size
df = pd.DataFrame( {
   'A': [1,1,1,1,2,2,2,3,3,3],
   'B': [5,4,8,6,5,6,6,7,7,5]})
#print (df)

df['C'] = df.groupby('A')['B'].transform(lambda x: scipy.signal.savgol_filter(x, 3, 1))    
print (df)
   A  B         C
0  1  5  4.166667
1  1  4  5.666667
2  1  8  6.000000
3  1  6  7.000000
4  2  5  5.166667
5  2  6  5.666667
6  2  6  6.166667
7  3  7  7.333333
8  3  7  6.333333
9  3  5  5.333333  
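
The extra row was needed because the original group A == 3 has only two elements, fewer than the window length of 3. If the data cannot be changed, one possible workaround is to wrap the filter in a function that leaves too-short groups untouched. The sketch below assumes that passing short groups through unfiltered is acceptable; `smooth_group`, `WINDOW` and `POLYORDER` are hypothetical names introduced for illustration, not part of the answer.

```python
import pandas as pd
from scipy.signal import savgol_filter

WINDOW = 3      # window length used in the answer above
POLYORDER = 1   # polynomial order used in the answer above


def smooth_group(values: pd.Series) -> pd.Series:
    """Hypothetical helper: filter one group, passing too-short groups through."""
    if len(values) < WINDOW:
        # Assumption: an unfiltered copy is an acceptable fallback
        return values.astype(float)
    filtered = savgol_filter(values.to_numpy(dtype=float), WINDOW, POLYORDER)
    # Keep the group's index so transform can align the result with df
    return pd.Series(filtered, index=values.index)


# Original 9-row sample from the question (group 3 has only 2 rows)
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'B': [5, 4, 8, 6, 5, 6, 6, 7, 7]})

df['C'] = df.groupby('A')['B'].transform(smooth_group)
print(df)
```

Any other function that maps N values to N values (an FFT magnitude, a cumulative sum, and so on) could be substituted for the filter call in the same way.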

Answer 1: (score: 1)

Instead of

df.loc[df.A==i]['C']

use

df.loc[df.A==i, 'C']

With df.loc[df.A==i]['C'], you are actually modifying a copy of df, not the original as intended.
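
The difference can be seen on a small example. This is only an illustrative sketch: the three-row frame and the names `mask`, `temp` and `created_by_copy` are made up here, and the explicit `.copy()` stands in for the temporary object that chained indexing creates.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2], 'B': [5, 4, 6]})
mask = df.A == 1

# Equivalent of df.loc[mask]['C'] = ...: the selection is a separate object,
# so the new column is added to that copy and df itself is untouched.
temp = df.loc[mask].copy()
temp['C'] = 0.0
created_by_copy = 'C' in df.columns
print(created_by_copy)

# Single .loc call with row mask and column label: df itself is modified.
df.loc[mask, 'C'] = 0.0
print('C' in df.columns)
print(df)
```

Rows that are not selected by the mask receive NaN in the new column until their own group is assigned.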