dask groupby agg加权平均“未知聚合lambda”错误

时间:2019-08-27 08:39:26

标签: python dask

在Dask中,我需要根据第三列,根据两列的分组值来计算加权平均值。我正在这样做:

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Could not create connection to database server.
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:404)
        at com.mysql.jdbc.Util.getInstance(Util.java:387)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:917)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:896)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:885)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:860)
        at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2330)
        at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2083)
        at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:806)
        at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:404)
        at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:410)
        at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:328)
        at org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)
        at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
        at org.apache.commons.dbcp.BasicDataSource.validateConnectionFactory(BasicDataSource.java:1556)
        at org.apache.commons.dbcp.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:1545)
        ... 51 more
Caused by: java.lang.NullPointerException
        at com.mysql.jdbc.ConnectionImpl.getServerCharset(ConnectionImpl.java:2997)
        at com.mysql.jdbc.MysqlIO.sendConnectionAttributes(MysqlIO.java:1936)
        at com.mysql.jdbc.MysqlIO.proceedHandshakeWithPluggableAuthentication(MysqlIO.java:1865)
        at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1228)
        at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2253)
        at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2284)

在熊猫中,我的内存不足。 在达斯克,我得到了:

dask_df = dd.from_pandas(df, npartitions = 10)
wm = lambda x: np.average(x, weights=dask_df.loc[x.index,"C"])
dask_df = dask_df.groupby(['A', 'B']).agg({'C' : 
wm}).reset_index()
output_df = dask_df.compute()

1 个答案:

答案 0 :(得分:0)

您可能对以下定义的自定义聚合感兴趣:https://docs.dask.org/en/latest/dataframe-groupby.html#aggregate

很明显,错误消息可以得到改善。我建议提出一个问题https://github.com/dask/dask/issues/new