How can I avoid a bug in numpy.place when computing the inverse cumulative distribution function of a large numpy array?

Time: 2016-02-07 23:29:17

Tags: python numpy scipy

I may have run into a bug in scipy or numpy. Has anyone seen the following issue, or does anyone have a good workaround?

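from scipy.stats import distributions
import numpy as np
# Every input is 0.5, so every entry of the result should be ppf(0.5) = 0.0
distributions.norm.ppf(np.ones((30000, 10000)) / 2.0)

results in

array([[  0.,   0.,   0., ...,   0.,   0.,   0.],
       [  0.,   0.,   0., ...,   0.,   0.,   0.],
       [  0.,   0.,   0., ...,   0.,   0.,   0.],
       ..., 
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan],
       [ nan,  nan,  nan, ...,  nan,  nan,  nan]])

Smaller runs (e.g. 20000 rows) work fine.

I am using numpy 1.10.4.

EDIT

The issue seems to go deeper, inside numpy itself. It can be reproduced with np.place alone:

na = np.zeros((30000, 10000)) * np.nan
# The mask is true everywhere, so every nan should be replaced by 1.0
np.place(na, np.ones((30000, 10000)), np.ravel(np.ones((30000, 10000))))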

I have filed a bug report: https://github.com/numpy/numpy/issues/7207
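Since smaller inputs are reported to work, one possible stopgap is to evaluate the function on blocks of rows, so that no single call sees the full array. The sketch below is only an illustration: the helper name apply_in_row_blocks and the block size are hypothetical, and it assumes the failure depends only on the size of the array passed to each call.

```python
import numpy as np

def apply_in_row_blocks(func, arr, block_rows):
    """Apply an elementwise function to arr one block of rows at a time.

    Hypothetical helper, not part of numpy or scipy.
    """
    out = np.empty(arr.shape, dtype=float)
    for start in range(0, arr.shape[0], block_rows):
        stop = start + block_rows
        # Each call only sees at most block_rows * arr.shape[1] elements.
        out[start:stop] = func(arr[start:stop])
    return out

# Small demonstration with a stand-in elementwise function (np.sqrt).
# For the case in the question one would pass distributions.norm.ppf
# and a block size that is known to work, e.g. 10000 rows.
p = np.full((7, 3), 4.0)
result = apply_in_row_blocks(np.sqrt, p, block_rows=2)
```

The output array is still allocated at full size; only the calls into the elementwise function are kept small.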

1 answer:

Answer 0 (score: 2)

The problem appears to be an integer overflow inside the arr_insert_loop function in numpy/core/src/multiarray/compiled_base.c. I have opened a pull request to fix it.
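A quick back-of-the-envelope check is consistent with this. The answer does not say which variable overflows, so the sketch below assumes, purely for illustration, that it is a byte-sized quantity held in a signed 32-bit C int and that the elements are 8-byte float64 values.

```python
INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit C int (assumed type)
ITEMSIZE = 8           # bytes per float64 element

def byte_size(rows, cols, itemsize=ITEMSIZE):
    """Total number of bytes in a rows x cols array."""
    return rows * cols * itemsize

failing = byte_size(30000, 10000)  # shape from the failing example
working = byte_size(20000, 10000)  # shape reported to work

print(failing, failing > INT32_MAX)  # 2400000000 True
print(working, working > INT32_MAX)  # 1600000000 False
```

Under that assumption the 30000-row array crosses the 32-bit limit while the 20000-row array does not, which matches the behaviour described in the question.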