I have a dataset of roughly 80,000 points (x, y, z). The points are distributed irregularly over the (x, y) plane [0, a] x [0, b], and at each point (x, y) the physical quantity z takes a certain value. For further evaluation I would like to interpolate these data onto a grid.

So far I have successfully interpolated onto a regular, rectangular 2D grid using scipy.interpolate.griddata. The drawback of such a regular grid, however, is that it cannot adequately model the regions where z changes sharply, while the regions where z varies only slightly are covered by many data points.

I would therefore like a non-uniform grid (preferably still rectangular, but with variable grid spacing) that places more grid points in the regions where z changes strongly and fewer in the regions where z varies only slightly.
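For reference, a regular-grid interpolation of the kind described above can be set up roughly as follows (a minimal sketch: the domain size, the 200 x 200 resolution, the synthetic data and the 'linear' method are placeholders, not taken from the question):

import numpy as np
from scipy.interpolate import griddata

# synthetic stand-in for the ~80,000 irregular samples; a, b and the
# grid resolution are arbitrary placeholders
a, b = 1.0, 2.0
rng = np.random.default_rng(0)
points = rng.random((80_000, 2)) * [a, b]           # irregular (x, y) locations
values = np.sin(20 * points[:, 0]) * points[:, 1]   # placeholder physical quantity z

# interpolate onto a regular rectangular grid covering [0, a] x [0, b]
grid_x, grid_y = np.meshgrid(np.linspace(0, a, 200), np.linspace(0, b, 200))
z_grid = griddata(points, values, (grid_x, grid_y), method='linear')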
Answer 0 (score: 0)
I think you have it backwards: your grid can be as regular as you like, but every grid point should be evaluated from the same number of sample points. That allows strong gradient changes in regions with a high sample density, while keeping the result smooth in regions where data are sparse.
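Concretely, with the scheme below a query point x is assigned the inverse-distance-weighted average of its k nearest samples (x_i, z_i), where a small regularisation term is added to the distances d_i to avoid division by zero:

z(\mathbf{x}) = \frac{\sum_{i=1}^{k} z_i / d_i}{\sum_{i=1}^{k} 1 / d_i}, \qquad d_i = \lVert \mathbf{x} - \mathbf{x}_i \rVert + \varepsilon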
I use inverse distance weighting backed by a k-d tree. Here is an implementation I had floating around in Python:

import numpy as np
from scipy.spatial import cKDTree
class invdisttree(object):
    """
    Compute the score of query points based on the scores of their k-nearest neighbours,
    weighted by the inverse of their distances.

    @reference:
    https://en.wikipedia.org/wiki/Inverse_distance_weighting

    Example:
    --------
    import numpy as np
    import matplotlib.pyplot as plt
    from invdisttree import invdisttree

    # create sample points with structured scores
    X1 = 10 * np.random.rand(1000, 2) - 5

    def func(x, y):
        return np.sin(x**2 + y**2) / (x**2 + y**2)

    z1 = func(X1[:, 0], X1[:, 1])

    # 'train'
    tree = invdisttree(X1, z1)

    # 'test'
    spacing = np.linspace(-5., 5., 100)
    X2 = np.meshgrid(spacing, spacing)
    grid_shape = X2[0].shape
    X2 = np.reshape(X2, (2, -1)).T
    z2 = tree(X2)

    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, sharex=True, sharey=True, figsize=(10, 3))
    ax1.contourf(spacing, spacing, func(*np.meshgrid(spacing, spacing)))
    ax1.set_title('Ground truth')
    ax2.scatter(X1[:, 0], X1[:, 1], c=z1, linewidths=0)
    ax2.set_title('Samples')
    ax3.contourf(spacing, spacing, z2.reshape(grid_shape))
    ax3.set_title('Reconstruction')
    plt.show()
    """
    def __init__(self, X=None, z=None, leafsize=10):
        # build the KD-tree on the sample coordinates and store the scores
        if X is not None:
            self.tree = cKDTree(X, leafsize=leafsize)
        if z is not None:
            self.z = z
    def fit(self, X=None, z=None, leafsize=10):
        """
        Arguments:
        ----------
        X: (N, d) ndarray
            Coordinates of N sample points in a d-dimensional space.
        z: (N,) ndarray
            Corresponding scores.
        leafsize: int (default 10)
            Leafsize of KD-tree data structure;
            should be less than 20.

        Returns:
        --------
        invdisttree instance: object
        """
        self.__init__(X, z, leafsize)
        return self
    def __call__(self, X, k=6, eps=1e-6, p=2, regularize_by=1e-9):
        # find the k nearest sample points for every query point
        self.distances, self.idx = self.tree.query(X, k, eps=eps, p=p)
        # regularise the distances so that a query point coinciding with a
        # sample point does not cause a division by zero
        self.distances += regularize_by
        # inverse-distance-weighted average of the neighbours' scores
        neighbour_scores = self.z[self.idx.ravel()].reshape(self.idx.shape)
        return np.sum(neighbour_scores / self.distances, axis=1) / np.sum(1. / self.distances, axis=1)
    def transform(self, X, k=6, p=2, eps=1e-6, regularize_by=1e-9):
        """
        Arguments:
        ----------
        X: (N, d) ndarray
            Coordinates of N query points in a d-dimensional space.
        k: int (default 6)
            Number of nearest neighbours to use.
        p: int or inf
            Which Minkowski p-norm to use.
            1 is the sum-of-absolute-values "Manhattan" distance
            2 is the usual Euclidean distance
            infinity is the maximum-coordinate-difference distance
        eps: float (default 1e-6)
            Return approximate nearest neighbours; the k-th returned value
            is guaranteed to be no further than (1 + eps) times the
            distance to the real k-th nearest neighbour.
        regularize_by: float (default 1e-9)
            Regularise distances to prevent division by zero
            for sample points with the same location as query points.

        Returns:
        --------
        z: (N,) ndarray
            Corresponding scores.
        """
        return self.__call__(X, k, eps, p, regularize_by)
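Applied to the setting in the question, usage would look roughly like this (a sketch rather than the answer's own code: the domain bounds a and b, the 200 x 200 grid, k = 6 and the synthetic z values are assumptions):

import numpy as np

# stand-in data; replace with the actual ~80,000 irregular samples
a, b = 1.0, 2.0
rng = np.random.default_rng(0)
X = rng.random((80_000, 2)) * [a, b]   # irregular (x, y) locations in [0, a] x [0, b]
z = np.sin(20 * X[:, 0]) * X[:, 1]     # physical quantity at each sample

tree = invdisttree(X, z)

# regular rectangular grid over [0, a] x [0, b]; every grid point is estimated
# from its k nearest samples, so sharp gradients appear where samples are dense
grid_x, grid_y = np.meshgrid(np.linspace(0, a, 200), np.linspace(0, b, 200))
queries = np.column_stack([grid_x.ravel(), grid_y.ravel()])
z_grid = tree(queries, k=6).reshape(grid_x.shape)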