Binning/grouping a dataset of geographic coordinates

Time: 2014-03-02 16:55:52

Tags: python pandas geometry leaflet geo

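I have a large dataset with columns for a timestamp and lat/lon. I'd like to group the coordinates somehow to determine how many distinct places were recorded, treating all locations within a certain distance of each other as the same place. Basically, I want to figure out how many distinct "places" are in this dataset. A good visual example is this; it is where I'd like to end up, but I don't know where the clusters are in my dataset.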

2 answers:

Answer 0: (score: 1)

Expanding on behzad.nouri's reference:

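from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# X = your geo array, shape (n_samples, 2), e.g. columns lat, lon

# Standardize features by removing the mean and scaling to unit variance
X = StandardScaler().fit_transform(X)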

# Compute DBSCAN
db = DBSCAN(eps=0.3, min_samples=3).fit(X)

# Parameters to tune:
# eps -- The maximum distance between two samples
#  for them to be considered as in the same neighborhood.
#  X was standardized above, so eps is in scaled units, not degrees or km.
# min_samples -- The number of samples in a neighborhood for a point
#  to be considered as a core point.

core_samples = db.core_sample_indices_
labels = db.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
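
If you want eps to mean a real distance on the ground, one option is to measure great-circle distance directly. The following is only a minimal sketch of that idea, not the code from the answer above: the names `haversine_km` and `count_places` and the sample coordinates are made up for illustration. It links any two points within `max_km` of each other and counts the resulting groups, which is what DBSCAN does when min_samples is 1.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def count_places(points, max_km):
    """Count groups of points chained together by hops of at most max_km."""
    parent = list(range(len(points)))  # union-find forest, one tree per group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= max_km:
                parent[find(i)] = find(j)  # merge the two groups
    return len({find(i) for i in range(len(points))})


# Hypothetical sample: two points a few dozen meters apart, one far away.
sample = [(40.7128, -74.0060), (40.7130, -74.0055), (34.0522, -118.2437)]
print(count_places(sample, 1.0))
```

This compares every pair of points, so it is only practical for small inputs. For a large dataset, scikit-learn's DBSCAN can be given metric="haversine" with coordinates converted to radians and eps expressed as kilometers divided by the Earth's radius.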

Answer 1: (score: 0)

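This pseudocode demonstrates how to reduce a set of points to a single point per grid partition while tallying the number of points in each partition. This is useful if you have a set of points that is sparse in some areas and dense in others, but you want the displayed points (e.g., on a map) to be evenly distributed.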

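To use the function, pass the set of points and the number of partitions along one axis (e.g., X). The same number of partitions is used along the other axis (e.g., Y). So if you specify 3, the function generates 9 (3 * 3) equally sized partitions. The function first iterates over the points to find the outermost X and Y coordinates (minimum and maximum) that bound the whole set. The distance between the outermost coordinates on each axis is then divided by the number of partitions to determine the dimensions of a grid cell.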
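
The bounding-box step described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `grid_dimensions` and the sample points are assumptions, and points are taken to be plain (x, y) tuples.

```python
def grid_dimensions(points, npartitions):
    """Return (min_x, min_y, cell_w, cell_h) for an npartitions x npartitions grid."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    # Each axis is split into the same number of equal-width partitions.
    cell_w = (max_x - min_x) / npartitions
    cell_h = (max_y - min_y) / npartitions
    return min_x, min_y, cell_w, cell_h


# Hypothetical sample: the box spans 3 units in X and 6 units in Y.
points = [(0.0, 0.0), (3.0, 6.0), (1.5, 2.0)]
print(grid_dimensions(points, 3))  # (0.0, 0.0, 1.0, 2.0)
```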

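The function then steps through each grid partition and checks whether each point in the set lies within it. If a point is inside the partition, the function checks whether it is the first point encountered there. If so, a flag is set to indicate that the first point has been found. Otherwise, the point is not the first one in the partition, and it is removed from the set.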

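For each point found in a partition, the function increments a tally. Finally, once the reduction/tally is complete for a grid partition, the tallied point can be visualized (e.g., shown as a single marker on a map with a count indicator):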

function TallyPoints( array points, int npartitions )
{
    array partition = new Array();

    float max_x = -MAX_FLOAT, max_y = -MAX_FLOAT;   // coordinates may be negative and fractional
    float min_x = MAX_FLOAT, min_y = MAX_FLOAT;

    // Find the bounding box of the points
    foreach point in points
    {
        if ( point.X > max_x )
            max_x = point.X;
        if ( point.X < min_x )
            min_x = point.X;
        if ( point.Y > max_y )
            max_y = point.Y;
        if ( point.Y < min_y )
            min_y = point.Y;
    }

    // Get the X and Y axis lengths of the partitions
    float partition_length_x =  ( ( float ) ( max_x - min_x ) ) / npartitions;
    float partition_length_y =  ( ( float ) ( max_y - min_y ) ) / npartitions;

    // Reduce the points to one point in each grid partition.
    // The grid has npartitions * npartitions cells; n indexes them row by row.
    for ( int n = 0; n < npartitions * npartitions; n++ )
    {
        // Get the boundary of this grid partition
        int col = n % npartitions;
        int row = n / npartitions;   // integer division
        float min_X = min_x + ( col * partition_length_x );
        float min_Y = min_y + ( row * partition_length_y );
        float max_X = min_x + ( ( col + 1 ) * partition_length_x );
        float max_Y = min_y + ( ( row + 1 ) * partition_length_y );

        // reduce and tally points
        int     tally  = 0;
        boolean reduce = false; // set to true after finding the first point in the partition
        foreach point in points
        {
            // the point is in the grid partition
            if ( point.X >= min_X && point.X < max_X &&
                 point.Y >= min_Y && point.Y < max_Y )
            {
                // first point found
                if ( false == reduce )
                {
                    reduce = true;
                    partition[ n ].point = point;   // keep this as the single point for the grid
                }
                else
                    points.Remove( point ); // remove the point from the list

                // increment the tally count
                tally++;
            }
        }

        // store the tally for the grid
        partition[ n ].tally = tally;

        // visualize the tallied point here (e.g., marker on Google Map)
    }
}
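
For reference, here is one way the same reduce-and-tally idea might look in runnable Python. It is a sketch under assumptions rather than a translation of the pseudocode: the name `tally_points` is made up, points are plain (x, y) tuples, and instead of scanning every point once per partition it computes each point's cell index directly in a single pass.

```python
def tally_points(points, npartitions):
    """Reduce points to one representative per grid cell, with a count per cell.

    Returns {(col, row): (first_point_seen, tally)} for non-empty cells only.
    """
    if not points:
        return {}
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Fall back to 1.0 when all points share an x or y value (zero extent).
    cell_w = (max(xs) - min_x) / npartitions or 1.0
    cell_h = (max(ys) - min_y) / npartitions or 1.0

    cells = {}
    for x, y in points:
        # Clamp so points on the maximum edge land in the last cell.
        col = min(int((x - min_x) / cell_w), npartitions - 1)
        row = min(int((y - min_y) / cell_h), npartitions - 1)
        first, tally = cells.get((col, row), ((x, y), 0))
        cells[(col, row)] = (first, tally + 1)
    return cells


# Hypothetical sample: two points near the origin, three in the far corner.
demo = [(0.0, 0.0), (0.5, 0.5), (9.0, 9.0), (10.0, 10.0), (5.0, 5.0)]
for cell, (point, tally) in tally_points(demo, 2).items():
    print(cell, point, tally)
```

Each entry gives the single point to draw as a marker and the count to show on it. The input list is left unmodified, which avoids removing items from a collection while iterating over it.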