Question

I'm trying to run my Python program. It looks like it should run smoothly, but I'm hitting an error I've never seen before:

free(): invalid pointer
Aborted (core dumped)

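However, I'm not sure how to go about fixing it, because the message doesn't give me much information about the problem itself.

At first I thought the sizes of the tensors in the network might be the problem, but they are fine. I googled the error and found that it seems to be a memory problem, with memory being allocated where it should not be, but I don't know how to solve it.

My code is split across two files, and I use two libraries: one for the Sinkhorn loss function and one for randomly sampling a mesh.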

import argparse
import point_cloud_utils as pcu
import time

import numpy as np
import torch
import torch.nn as nn
from fml.nn import SinkhornLoss

import common


def main():
    # x is a tensor of shape [n, 3] containing the positions of the vertices that
    x = torch.from_numpy(common.loadpointcloud("sphere.txt"))
    # t is a tensor of shape [n, 3] containing a set of nicely distributed samples in the unit cube
    v, f = common.unit_cube()
    t = torch.from_numpy(pcu.lloyd(v, f, x.shape[0]).astype(np.float32))  # sample randomly a point cloud (cube for now?)

    # The model is a simple fully connected network mapping a 3D parameter point to 3D
    phi = common.MLP(in_dim=3, out_dim=3)

    # Eps is 1/lambda and max_iters is the maximum number of Sinkhorn iterations to do
    emd_loss_fun = SinkhornLoss(eps=1e-3, max_iters=20,
                                stop_thresh=1e-3, return_transport_matrix=True)

    mse_loss_fun = torch.nn.MSELoss()

    # Adam optimizer at first
    optimizer = torch.optim.Adam(phi.parameters(), lr= 10e-3)

    fit_start_time = time.time()

    for epoch in range(100):
        optimizer.zero_grad()

        # Do the forward pass of the neural net, evaluating the function at the parametric points
        y = phi(t)

        # Compute the Sinkhorn divergence between the reconstruction*(using the francis library) and the target
        # NOTE: The Sinkhorn function expects a batch of b point sets (i.e. tensors of shape [b, n, 3])
        # since we only have 1, we unsqueeze so x and y have dimension [1, n, 3]
        with torch.no_grad():
            _, P = emd_loss_fun(phi(t).unsqueeze(0), x.unsqueeze(0))

        # Project the transport matrix onto the space of permutation matrices and compute the L-2 loss
        # between the permuted points
        loss = mse_loss_fun(y[P.squeeze().max(0)[1], :], x)
        # loss = mse_loss_fun(P.squeeze() @ y,  x)  # Use the transport matrix directly

        # Take an optimizer step
        loss.backward()
        optimizer.step()

        print("Epoch %d, loss = %f" % (epoch, loss.item()))

    fit_end_time = time.time()

    print("Total time = %f" % (fit_end_time - fit_start_time))
    # Plot the ground truth, reconstructed points, and a mesh representing the fitted function, phi
    common.visualitation(x,t,phi)



if __name__ == "__main__":
    main()

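The error message is "free(): invalid pointer" followed by "Aborted (core dumped)".

That doesn't help me much. If anyone knows what is going on or knows more about this error, I'd appreciate the help.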

Answer 1

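Note to future readers: this bug has been filed as issue #21018.

This is not a problem in your Python code. It is a bug in PyTorch (probably) or in Python itself (unlikely, but possible).

free(3) is a C function that releases dynamically allocated memory once it is no longer needed. You cannot (easily) call it from Python, because memory management is a low-level implementation detail that the Python interpreter normally handles for you. However, you are also using PyTorch, which is written in C++ and C and does allocate and free memory directly.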
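To make "dynamically allocated" concrete, here is a minimal sketch that reaches the C library's malloc and free from Python through ctypes. It assumes a POSIX system where the C library can be located this way, and it only shows the valid pairing: the address handed to free is exactly the one malloc returned. Handing free any other address is undefined behavior, and it is the kind of mistake glibc reports as "free(): invalid pointer".

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system where the C library can be loaded this way.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]

block = libc.malloc(64)        # address of a dynamically allocated block
if not block:
    raise MemoryError("malloc returned NULL")
ctypes.memset(block, 0, 64)    # the block is ours to use until it is freed
libc.free(block)               # valid: this exact address came from malloc
released = True
```

Normal Python code never needs to do this. The point is only that native extensions such as PyTorch do this kind of bookkeeping internally, and that is where the faulty free call must live.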

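In this case, some C code tried to free a block of memory, but the block it tried to free was never dynamically allocated in the first place, which is a bug. You should report this behavior to the PyTorch developers. Include as much detail as possible, including the shortest code you can find that reproduces the problem and the complete output of that program.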
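When collecting details for such a report, it may help to know which Python line was executing when the process aborted. One option is the standard library's faulthandler module, which prints the Python traceback on fatal signals such as SIGABRT. The sketch below is only an illustration: the crashing script is a stand-in that calls os.abort(), which raises the same signal glibc raises after printing "free(): invalid pointer", and the names CRASHING_SCRIPT, native_call_that_aborts and run_with_faulthandler are made up for this example.

```python
import subprocess
import sys
import textwrap

# Stand-in for the real program: os.abort() raises SIGABRT, the signal that
# also ends a process after glibc prints "free(): invalid pointer".
CRASHING_SCRIPT = textwrap.dedent("""
    import os

    def native_call_that_aborts():
        os.abort()

    native_call_that_aborts()
""")


def run_with_faulthandler(script):
    """Run `script` in a child interpreter with faulthandler enabled."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", script],
        capture_output=True,
        text=True,
    )


result = run_with_faulthandler(CRASHING_SCRIPT)
print("exit status:", result.returncode)
print(result.stderr)
```

For a real script, the equivalent would be running it as "python -X faulthandler your_script.py" or calling faulthandler.enable() at the top of the file. The traceback only shows the Python frame that entered native code, not the faulty C or C++ line, but it narrows down which call to include in a minimal reproduction.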

Answer 2

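Edit: The cause is actually known. The recommended solution is to build both packages from source.

There is a known issue with importing both open3d and PyTorch in the same program. When this answer was first written, the cause was unknown. https://github.com/pytorch/pytorch/issues/19739

There are a few possible workarounds:

(1) Some people have found that changing the order in which the two packages are imported resolves the problem, although in my own testing both orders crashed.

(2) Others have found that compiling both packages from source helps.

(3) Still others have found that calling open3d and PyTorch from separate scripts resolves the problem.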
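Workaround (3) can be sketched roughly as follows: each stage runs in its own interpreter and the data is handed over through a file, so the two libraries are never loaded into the same process. This is only an outline under assumptions. The two stage scripts are stand-ins that use plain Python lists and JSON so the example runs anywhere; in a real project the first stage would import open3d (or another geometry library) and the second would import torch. The names SAMPLING_STAGE, TRAINING_STAGE and run_stage are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Stage 1 stand-in: a real version would import the geometry library here
# and write the sampled points to the path given on the command line.
SAMPLING_STAGE = textwrap.dedent("""
    import json, sys
    points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    with open(sys.argv[1], "w") as fh:
        json.dump(points, fh)
""")

# Stage 2 stand-in: a real version would import torch here and train on
# the points it reads back from the same path.
TRAINING_STAGE = textwrap.dedent("""
    import json, sys
    with open(sys.argv[1]) as fh:
        points = json.load(fh)
    print(len(points))
""")


def run_stage(script, path):
    """Run one stage in a fresh interpreter and return its stdout."""
    done = subprocess.run(
        [sys.executable, "-c", script, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return done.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    handoff = os.path.join(tmp, "points.json")
    run_stage(SAMPLING_STAGE, handoff)
    loaded_count = run_stage(TRAINING_STAGE, handoff)

print("points handed to the training stage:", loaded_count)
```

For large point clouds a binary format such as NumPy's .npy files would likely be a better handoff than JSON. The cost of this approach is the extra file round trip, so it fits best when the sampling happens once before training.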
