Jaccard score/distance or percentage overlap

Time: 2014-01-14 16:44:55

Tags: c# sql performance distance overlap

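I want to be able to calculate the Jaccard score/distance (distance is 1 - score) of a rectangle against a grid of rectangles. My grid is 50x50 (1625625 rectangles in total).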
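For readers unfamiliar with the measure, here is the standard definition specialised to axis-aligned rectangles. The symbols A and B are introduced here for illustration only; the subscripts match the MinX/MaxX/MinY/MaxY fields used in the code further down.

```latex
J(A,B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}, \qquad d(A,B) = 1 - J(A,B)

|A \cap B| = \max\bigl(0,\ \min(A_{\max X}, B_{\max X}) - \max(A_{\min X}, B_{\min X})\bigr)
           \cdot \max\bigl(0,\ \min(A_{\max Y}, B_{\max Y}) - \max(A_{\min Y}, B_{\min Y})\bigr)
```

The clamp to 0 matters: if the rectangles do not overlap on an axis, the difference is negative and must not contribute to the area.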

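I can calculate the scores of one input rectangle against all of these in 0.34 seconds, but that isn't fast enough, because I need to be able to process 10k rectangles, or store the results in a DB (updating tens of thousands of rows per call). So I'd like the DB to do the calculation for me without having to pull anything out of the database, but I can't think of how to do that without a cursor...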

sourceRectangles contains my single rectangle (although in reality there will be 10k), rectangles contains my grid, and temporaryRectangleList contains the summed scores.

Dictionary<UInt32, Rectangle> temporaryRectangleList = new Dictionary<UInt32, Rectangle>();
foreach (var sourceRectangle in sourceRectangles)
{
    foreach (var rectangle in rectangles)
    {
        // Extent of the overlap along X; clamped to 0 when the rectangles do not overlap
        int max_MinX = Math.Max(sourceRectangle.MinX, rectangle.MinX);
        int min_MaxX = Math.Min(sourceRectangle.MaxX, rectangle.MaxX);
        int overlapX = Math.Max(0, min_MaxX - max_MinX);

        // Extent of the overlap along Y, clamped the same way
        int max_MinY = Math.Max(sourceRectangle.MinY, rectangle.MinY);
        int min_MaxY = Math.Min(sourceRectangle.MaxY, rectangle.MaxY);
        int overlapY = Math.Max(0, min_MaxY - max_MinY);

        // Calculate the area of the overlap (0 if there is none); without the clamp,
        // two negative extents would multiply into a bogus positive area
        int area = overlapX * overlapY;
        // Calculate the Jaccard score: intersection / union
        var score = (double) area/((sourceRectangle.Area + rectangle.Area) - area);

        // Single dictionary lookup instead of ContainsKey followed by the indexer
        Rectangle existing;
        if (temporaryRectangleList.TryGetValue(rectangle.ID, out existing))
        {
            existing.Weight += score;
        }
        else
        {
            temporaryRectangleList.Add(rectangle.ID, new Rectangle(rectangle, score));
        }
    }
}

I need to be able to look items up in the dictionary, because I need to pull data out of it by the rectangle's ID.

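If you think you can speed up the C# (10k rectangles processed in < 1s), go for it, but 0.34s per rectangle is the best I can do, so I'm looking for a SQL equivalent of this code (ideally a faster one... lol).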

Unfortunately the SQL table is too large to dump here, so I can only give you its structure:

USE [Rectangles]
GO

/****** Object:  Table [dbo].[PreProcessed]    Script Date: 14/01/2014 16:39:33 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[PreProcessed](
    [ID] [int] NOT NULL,
    [MinX] [int] NOT NULL,
    [MinY] [int] NOT NULL,
    [MaxX] [int] NOT NULL,
    [MaxY] [int] NOT NULL,
    [Area] [int] NOT NULL,
 CONSTRAINT [PK_PreProcessed] PRIMARY KEY CLUSTERED 
(
    [ID] ASC,
    [Area] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
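For reference, against this table structure the nested C# loop could in principle be written as one set-based query instead of a cursor. This is only a sketch under assumptions: dbo.SourceRectangles is a hypothetical table (it is not part of the schema above) assumed to hold the input rectangles with the same columns as dbo.PreProcessed, and the query has not been timed against the real data.

```sql
-- Assumption: dbo.SourceRectangles(ID, MinX, MinY, MaxX, MaxY, Area) is hypothetical
-- and mirrors dbo.PreProcessed.
SELECT  p.ID,
        -- CAST to float avoids integer division; the sum mirrors "Weight += score"
        SUM(CAST(o.OverlapArea AS float)
            / (s.Area + p.Area - o.OverlapArea)) AS Weight
FROM    dbo.SourceRectangles AS s
JOIN    dbo.PreProcessed     AS p
        ON  p.MinX < s.MaxX AND p.MaxX > s.MinX   -- X extents overlap
        AND p.MinY < s.MaxY AND p.MaxY > s.MinY   -- Y extents overlap
CROSS APPLY (
    -- Overlap area = (min of MaxX - max of MinX) * (min of MaxY - max of MinY);
    -- both factors are positive here because the join keeps only overlapping pairs
    SELECT ( (CASE WHEN s.MaxX < p.MaxX THEN s.MaxX ELSE p.MaxX END)
           - (CASE WHEN s.MinX > p.MinX THEN s.MinX ELSE p.MinX END) )
         * ( (CASE WHEN s.MaxY < p.MaxY THEN s.MaxY ELSE p.MaxY END)
           - (CASE WHEN s.MinY > p.MinY THEN s.MinY ELSE p.MinY END) ) AS OverlapArea
) AS o
GROUP BY p.ID;
```

One difference from the C# loop: grid rectangles that overlap no source rectangle do not appear in the result at all, rather than appearing with a weight of 0.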

The Rectangle class:

public class Rectangle
{
    public Rectangle(UInt32 id, int minX, int maxX, int minY, int maxY, double weight)
    {
        ID = id;
        MinX = minX;
        MaxX = maxX;
        MinY = minY;
        MaxY = maxY;
        Area = (maxX - minX)*(maxY - minY);
        Weight = weight;
    }

    public Rectangle(Rectangle input, double weight)
    {
        ID = input.ID;
        MinX = input.MinX;
        MaxX = input.MaxX;
        MinY = input.MinY;
        MaxY = input.MaxY;
        Area = input.Area;
        Weight = weight;
    }

    public int Area { get; set; }
    public int MinX { get; set; }
    public int MaxX { get; set; }
    public int MinY { get; set; }
    public int MaxY { get; set; }

    public UInt32 ID { get; set; }
    public double Weight { get; set; }
}

1 answer:

Answer 0 (score: 0)

SQL Server has a geometry data type. It has methods for computing the intersection and union of polygons.
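As a minimal illustration of those methods, the sketch below computes the Jaccard score of two rectangles with STIntersection, STUnion and STArea. The two polygons are made-up example values, not data from the question, and whether this is faster than plain integer arithmetic on the MinX/MaxX/MinY/MaxY columns is not something this answer establishes.

```sql
-- Two example rectangles as closed polygon rings (SRID 0); values are illustrative only
DECLARE @a geometry = geometry::STGeomFromText('POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))', 0);
DECLARE @b geometry = geometry::STGeomFromText('POLYGON((2 2, 6 2, 6 6, 2 6, 2 2))', 0);

-- Jaccard score = area of intersection / area of union
-- Here the intersection is 2x2 = 4 and the union is 16 + 16 - 4 = 28, so about 0.1429
SELECT @a.STIntersection(@b).STArea() / @a.STUnion(@b).STArea() AS JaccardScore;
```

To apply this to the PreProcessed table, a geometry value would first have to be built for each row from its four coordinates, for example as a persisted column.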