How to "multiply" Python pandas DataFrames (as if they were vectors)?

Time: 2015-07-22 12:04:01

Tags: python matrix pandas multiplication

I am learning pandas. I have two DataFrames:

>>> df1
  quality1  value
0        A      1
1        B      2
2        C      3
>>> df2
  quality2  value
0        D      1
1        E     10
2        F    100

I want to multiply them together (the way I might multiply two vectors to get a matrix). The answer should be:

  quality1 quality2  value
0        A        D      1
1        A        E     10
2        A        F    100
3        B        D      2
4        B        E     20
5        B        F    200
6        C        D      3
7        C        E     30
8        C        F    300

How can I do this?

2 answers:

Answer 0: (score: 2)

It's not the prettiest, but it will work:

>>> df1["dummy"] = 1
>>> df2["dummy"] = 1
>>> dfm = df1.merge(df2, on="dummy")
>>> dfm["value"] = dfm.pop("value_x") * dfm.pop("value_y")
>>> del dfm["dummy"]
>>> dfm
  quality1 quality2  value
0        A        D      1
1        A        E     10
2        A        F    100
3        B        D      2
4        B        E     20
5        B        F    200
6        C        D      3
7        C        E     30
8        C        F    300

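Until we get native support for a Cartesian join (whistles and looks away...), merging on a dummy column is an easy way to get the same effect. The intermediate frame, right after the merge, looks like: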

>>> dfm
  quality1  value_x  dummy quality2  value_y
0        A        1      1        D        1
1        A        1      1        E       10
2        A        1      1        F      100
3        B        2      1        D        1
4        B        2      1        E       10
5        B        2      1        F      100
6        C        3      1        D        1
7        C        3      1        E       10
8        C        3      1        F      100
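
For readers on a newer pandas: version 1.2 added `how="cross"` to `merge`, which performs the Cartesian join directly. The following is a minimal sketch of the same multiplication without the dummy column, assuming pandas 1.2 or later and the `df1`/`df2` frames from the question:

```python
import pandas as pd

df1 = pd.DataFrame({"quality1": list("ABC"), "value": [1, 2, 3]})
df2 = pd.DataFrame({"quality2": list("DEF"), "value": [1, 10, 100]})

# how="cross" pairs every row of df1 with every row of df2 (pandas >= 1.2).
# The shared column name "value" is disambiguated with the given suffixes.
dfm = df1.merge(df2, how="cross", suffixes=("_x", "_y"))

# Multiply the two value columns and drop them in the same step.
dfm["value"] = dfm.pop("value_x") * dfm.pop("value_y")

print(dfm)
```

On older pandas versions the dummy-column merge above remains the way to go.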

Answer 1: (score: 2)

You can also use the cartesian function from scikit-learn:

import numpy as np
import pandas as pd
from sklearn.utils.extmath import cartesian

# Your data:
df1 = pd.DataFrame({'quality1':list('ABC'), 'value':[1,2,3]})
df2 = pd.DataFrame({'quality2':list('DEF'), 'value':[1,10,100]})

# Make the matrix of labels:
dfm = pd.DataFrame(cartesian((df1.quality1.values, df2.quality2.values)),
                   columns=['quality1', 'quality2'])

# Multiply values: repeat each df1 value once per df2 row,
# and tile the df2 values once per df1 row, so the pairs line up with the labels.
dfm['value'] = df1.value.values.repeat(df2.value.size) * np.tile(df2.value.values, df1.value.size)

print(dfm.set_index(['quality1', 'quality2']))

Which yields:

                   value
quality1 quality2       
A        D             1
         E            10
         F           100
B        D             2
         E            20
         F           200
C        D             3
         E            30
         F           300
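
If what you want is literally the matrix you would get from multiplying two vectors, an outer product may be enough. This is only a sketch, not something either answer proposes: it assumes the `df1`/`df2` frames from the question and uses `numpy.outer`, with the quality labels reused as row and column labels (the variable name `matrix` is just illustrative).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"quality1": list("ABC"), "value": [1, 2, 3]})
df2 = pd.DataFrame({"quality2": list("DEF"), "value": [1, 10, 100]})

# Outer product of the two value columns:
# rows follow df1's labels, columns follow df2's labels.
matrix = pd.DataFrame(
    np.outer(df1["value"], df2["value"]),
    index=pd.Index(df1["quality1"], name="quality1"),
    columns=pd.Index(df2["quality2"], name="quality2"),
)

print(matrix)
```

Each cell `matrix.loc[q1, q2]` then holds the product of the corresponding values, which is the same information as the long-format result above, just laid out as a 3x3 table.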