How do I distribute assumed data across a population?

Asked: 2013-11-22 18:11:56

Tags: sql sql-server linq entity-framework linq-to-entities

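My database contains properties. Some of them have been surveyed and some have not. From a survey we can calculate the cost of the surveyed property.

When a property has not been surveyed, we want to assume that its cost is the same as that of a similar property that has been surveyed.

So we look for a matching property to use as a "clone".

If the property is in a block, we look for surveyed properties in the same block; if we cannot find any, we look in the same postcode area, then in the same street, and so on.

If there is more than one matching property in the block, we do not want to use the same one as the clone for every unsurveyed property, so we rotate through the surveyed properties as clones.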

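For example, suppose a block has 5 properties and P1 and P2 have been surveyed. P3 should use P1 as its clone, P4 should use P2, and P5 should use P1. The total cost for the block is therefore 3 * P1.GetCost() + 2 * P2.GetCost().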
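The rotation in this example can also be written as a formula. This is only a sketch, assuming that within one group (for instance a block) the n surveyed properties are ranked r = 1..n by ID and the m unsurveyed properties are ranked j = 1..m by ID. The symbols n, m, r, j and c_r are introduced here for illustration and do not appear in the code. The bracket [ ... ] is 1 when the condition holds and 0 otherwise.

```latex
\[
\operatorname{clone}(j) = \bigl((j-1) \bmod n\bigr) + 1, \qquad
c_r = \left\lfloor \frac{m}{n} \right\rfloor + \bigl[\, r \le (m \bmod n) \,\bigr], \qquad
\operatorname{multiplier}_r = 1 + c_r
\]
```

With n = 2 and m = 3 this gives c_1 = 2 and c_2 = 1, so the multipliers are 3 and 2, which agrees with 3 * P1.GetCost() + 2 * P2.GetCost().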

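I have written code that identifies a clone for a single property on this basis. But I need to produce a report that summarises the costs of what could be many thousands of properties, so I think I need to create a view in the database to optimise it.

My problem is that I cannot work out how to calculate the number of times each surveyed property will be cloned across the whole population. Can anyone suggest a technique I could apply?

EDIT: Test SQL based on anon's answer. This gets me the number of matching surveyed properties for each unsurveyed property, but what I want is the number of unsurveyed properties allocated to each surveyed property, to get the cost multiplier: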

IF  EXISTS (SELECT
    *
FROM sys.objects
WHERE object_id = OBJECT_ID(N'dbo.PropertyTest') AND type IN (N'U'))
DROP TABLE dbo.propertytest
GO

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[D]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION [dbo].[D]
GO

CREATE TABLE dbo.PropertyTest
    (
    ID int NOT NULL,
    BlockID int NULL,
    PostCode nvarchar(50) NULL,
    StreetName nvarchar(50) NULL,
    IsSurveyed bit NOT NULL,
    Cost decimal(18, 0) NULL
    )
GO
ALTER TABLE dbo.PropertyTest ADD CONSTRAINT
    PK_PropertyTest PRIMARY KEY CLUSTERED 
    (
    ID
    )
GO

CREATE function D(@surveyedid int, @unsurveyeyed int)
returns table as
return
(

select case when 
(SELECT u.blockid FROM propertytest u WHERE id = @unsurveyeyed) = (SELECT s.blockid FROM propertytest s where id = @surveyedid)
then 1
when 
(SELECT u.postcode FROM propertytest u WHERE id = @unsurveyeyed) = (SELECT s.postcode FROM propertytest s where id = @surveyedid)
then 2
else
null
end as Distance

)
GO

INSERT INTO propertytest (id
, blockid
, postcode
, issurveyed
, cost
, StreetName)
    SELECT
        1, 1,'G20 6DJ', 1,20, 'Doune Gardens' 
    UNION 
    SELECT 2, 1, 'G20 6DJ', 1,30 , 'Doune Gardens'
    UNION 
    SELECT 3, 1, 'G20 6DJ', 0, NULL , 'Doune Gardens'
    UNION 
    SELECT 4, 1, 'G20 6DJ', 0, NULL , 'Doune Gardens'
    UNION 
    SELECT 5, 1, 'G20 6DJ', 0, NULL , 'Doune Gardens'
    UNION 
    SELECT 6, null, 'G20 6DJ',  0, NULL, 'Doune Gardens'
    UNION 
    SELECT 7, null, 'G20 6BS',  0, NULL, 'Wilton Street'
    UNION 
    SELECT 8, 1, 'G20 6BT', 0, NULL, 'Wilton Street'

SELECT
    * INTO #s
FROM propertytest
WHERE issurveyed = 1

SELECT
    * INTO #u
FROM propertytest
WHERE issurveyed = 0

--This is close to anon's suggestion
--with the current function it returns the count of surveyed properties that match an unsurveyed property
SELECT
    #u.id,
    COUNT(*)
FROM #s
    CROSS JOIN #u
CROSS APPLY D(#S.ID,#U.ID) AS D
GROUP BY #u.id, D.Distance
HAVING D.Distance = MIN(D.Distance)

--I think this is closer to what I want
--with the current function it returns the total number 
--of unsurveyed properties that match a surveyed property
--so P1 and P2 both match 3 in the same block
--Now I need P1 to act as proxy for 2 of them and P2 to act as proxy for 1 of them
SELECT
    #s.id, D.Distance, COUNT(*)
FROM #s
    CROSS JOIN #u
CROSS APPLY D(#S.ID,#U.ID) AS D
GROUP BY #s.id, D.Distance
HAVING D.Distance = MIN(D.Distance)

DROP TABLE #s
DROP TABLE #u

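Here is a simplified version of my Linq-to-Entities code that does the matching. The GetMatch method is where I use modulus to rotate through the matching properties. In the example above we have 2 matching properties and 3 unallocated ones. If the unsurveyed property is at index 3 in unallocated, its clone is at index 1 in matchingProperties. But I cannot see this working across the whole population, so I am looking for inspiration for a different approach.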

public class Property
{
   public int ID {get; set;}
   public int? BlockID {get; set;}
   public Block Block { get; set; }      // type name assumed; the original omitted it
   public string PostCode { get; set; }
   public bool IsSurveyed { get; set; }
   public decimal? GetCost()
   {
      //code to sum costs
   }
}

        private static Property GetMatch(Property property, 
           Func<Property, bool> matchFunction, 
           IQueryable<Property> surveyed, IQueryable<Property> unsurveyed)
        {

            var matchingProperties = surveyed.Where(matchFunction).OrderBy(p => p.ID);

            int count = matchingProperties.Count();
            Property match;
            if (count == 1)
            {
                match = matchingProperties.First();
            }
            else if (count > 1)
            {
                //there is more than one property to match

                //unallocated is the number of unsurveyed properties 
               //that match the criteria and they are ordered by id 
               //to ensure consistent allocation
                var unallocated = unsurveyed.Where(matchFunction)
                                            .OrderBy(p => p.ID)
                                            .ToList();

                //we want to match the first unallocated with the first matched, 
                //second with second but we must rotate through the matches, 
                //so use modulus
                int index = unallocated.IndexOf(property) % count;
                if (index < 0)
                    throw new InvalidOperationException
                 (@"The unsurveyed properties must include 
                    the property we want to clone");

                match = matchingProperties.ElementAt(index);
                //the property to index is a
            }
            else
                match = null;

            return match;
        }

private Property GetClone(Property property, out string cloneStatus)
{
   // an "out" parameter must be assigned on every path, so start with null
   cloneStatus = null;

   IQueryable<Property> surveyed = _Uow.PropertyRepository.All.Where(p => p.IsSurveyed);
   IQueryable<Property> unsurveyed = _Uow.PropertyRepository.All.Where(p => !p.IsSurveyed);

   if (property.Block != null)
   {
       Property match = GetMatch(property, 
           c => c.BlockID == property.Block.ID, surveyed, unsurveyed);
       if (match != null)
       {
           cloneStatus = "Cloned from same block: " 
                       + match.GetFullAddress(" ", false);
           return match;
       }
       // no surveyed property in the block: fall through to the postcode
   }

   if (!String.IsNullOrEmpty(property.PostCode))
   {
       Property match = GetMatch(property, 
           c => c.PostCode == property.PostCode, surveyed, unsurveyed);
       if (match != null)
       {
           cloneStatus = "Cloned from same postcode: " 
                       + match.GetFullAddress(" ", false);
           return match;
       }
   }

   // no match at any level
   return null;
}

2 Answers:

Answer 0: (score: 1)

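Two sets: S (surveyed properties) and U (unsurveyed).

A function D computes the distance from each member of U to each member of S. This tells you how well a given S would serve as a proxy for a given U. Shorter distances are better.

For each U, how many members of S are at the minimum distance?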

SELECT U, COUNT(S)
FROM S
CROSS JOIN U
CROSS APPLY D(S, U) AS D
GROUP BY U
HAVING D = MIN(D)

--Example distance function
CREATE FUNCTION dbo.D(@s int, @u int)
RETURNS TABLE AS
RETURN
SELECT CASE
  WHEN COUNT(DISTINCT block_id ) = 1 THEN 1
  WHEN COUNT(DISTINCT postcode ) = 1 THEN 2
  WHEN COUNT(DISTINCT street_id) = 1 THEN 3
END AS d
FROM propertytest
WHERE id IN (@s, @u)
GO
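The query in this answer is pseudocode, so here is one way it might be written as runnable T-SQL. It is a minimal sketch, assuming the dbo.PropertyTest table from the question. The distance is inlined as a CASE expression instead of calling D, and the CTE names (Pairs, Ranked) and output column names are made up for illustration. The minimum distance per unsurveyed property is found with a window function, then the surveyed candidates at that distance are counted.

```sql
;WITH Pairs AS (
    -- every unsurveyed property paired with every surveyed property
    SELECT u.ID AS UnsurveyedID,
           s.ID AS SurveyedID,
           CASE WHEN u.BlockID    = s.BlockID    THEN 1
                WHEN u.PostCode   = s.PostCode   THEN 2
                WHEN u.StreetName = s.StreetName THEN 3
           END AS Distance          -- NULL when nothing matches
    FROM dbo.PropertyTest AS u
    JOIN dbo.PropertyTest AS s
      ON u.IsSurveyed = 0 AND s.IsSurveyed = 1
),
Ranked AS (
    -- keep only pairs that match at some level and note the best level
    SELECT UnsurveyedID, SurveyedID, Distance,
           MIN(Distance) OVER (PARTITION BY UnsurveyedID) AS MinDistance
    FROM Pairs
    WHERE Distance IS NOT NULL
)
SELECT UnsurveyedID, MinDistance, COUNT(*) AS CandidateCount
FROM Ranked
WHERE Distance = MinDistance
GROUP BY UnsurveyedID, MinDistance;
```

Unsurveyed properties with no match at any level do not appear in the result. The cross pairing grows with the product of the two set sizes, so on a large table it would probably need a pre-filter.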

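Answer 1: (score: 1)

My approach is to use row numbers to match surveyed properties to unsurveyed ones, e.g. I match the first unsurveyed row with the first surveyed row. I take the unsurveyed row number modulo the number of surveyed rows so that, for example, if there are only 3 surveyed rows, the 4th unsurveyed row matches the 1st surveyed row.

The advantage of my query is that it can be modified slightly to return the number of times each surveyed property is matched.

Edited for street as well:

Here is the main query: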

;with SurveyedByBlock
as
(
    select Id, BlockID, Cost, 
             ROW_NUMBER() OVER (PARTITION BY BlockId ORDER BY ID) AS RN, 
           (SELECT COUNT(*) 
              FROM PropertyTest P2 
              WHERE P1.BlockID = P2.BlockID AND P2.IsSurveyed = 1
             ) AS MaxNumberOfRows
    from PropertyTest P1
    where issurveyed = 1 AND BlockID IS NOT NULL
),
SurveyedByPostCode
as
(
    select Id, PostCode, Cost, 
             ROW_NUMBER() OVER (PARTITION BY PostCode ORDER BY ID) AS RN,
           (SELECT COUNT(*) 
              FROM PropertyTest P2 
              WHERE P1.PostCode = P2.PostCode AND P2.IsSurveyed = 1
             ) AS MaxNumberOfRows
    from PropertyTest P1
    where issurveyed = 1 AND PostCode IS NOT NULL
),
SurveyedByStreet
AS
(
     select Id, StreetName, Cost, 
            ROW_NUMBER() OVER (PARTITION BY StreetName ORDER BY ID) AS RN,
      (SELECT COUNT(*) 
             FROM PropertyTest P2 
             WHERE P1.StreetName = P2.StreetName AND P2.IsSurveyed = 1
            ) AS MaxNumberOfRows
     from PropertyTest P1
     where issurveyed = 1 AND StreetName IS NOT NULL
),
UnSurveyed
AS
(
    SELECT ID, BlockID, PostCode, StreetName, Cost, 
             ROW_NUMBER() OVER (PARTITION BY BlockId ORDER BY ID) AS BlockRN,
       ROW_NUMBER() OVER (PARTITION BY PostCode ORDER BY ID) AS PostCodeRN,
       ROW_NUMBER() OVER (PARTITION BY StreetName ORDER BY ID) AS StreetNameRN
      FROM PropertyTest
    WHERE IsSurveyed = 0
)
SELECT UnSurveyed.Id, UnSurveyed.BlockID, UnSurveyed.PostCode, UnSurveyed.StreetName,
       COALESCE(SurveyedByBlock.Cost, SurveyedByPostCode.Cost, SurveyedByStreet.Cost) AS Cost, 
       COALESCE(SurveyedByBlock.ID, SurveyedByPostCode.ID, SurveyedByStreet.Id) AS SurveyedId
FROM UnSurveyed
LEFT JOIN SurveyedByBlock
    ON SurveyedByBlock.BlockID = UnSurveyed.BlockID 
AND 
      ((UnSurveyed.BlockRN % SurveyedByBlock.MaxNumberOfRows = SurveyedByBlock.RN )
       OR -- unsurveyed row number matches left over row number
    -- e.g. if we have 3 surveyed properties that match and this is the 4th row 
          -- in the unsurveyed properties it will match with the 1st surveyed row
          -- 4 mod 3 = 1
       (UnSurveyed.BlockRN % SurveyedByBlock.MaxNumberOfRows = 0 
           AND SurveyedByBlock.RN = SurveyedByBlock.MaxNumberOfRows)
)
LEFT JOIN SurveyedByPostCode
    ON SurveyedByPostCode.PostCode = UnSurveyed.PostCode
    AND ((UnSurveyed.PostCodeRN % SurveyedByPostCode.MaxNumberOfRows = SurveyedByPostCode.RN ) 
           OR
         (UnSurveyed.PostCodeRN % SurveyedByPostCode.MaxNumberOfRows = 0 
                 AND SurveyedByPostCode.RN = SurveyedByPostCode.MaxNumberOfRows)
        )
LEFT JOIN SurveyedByStreet
ON SurveyedByStreet.StreetName = UnSurveyed.StreetName
AND ((UnSurveyed.StreetNameRN % SurveyedByStreet.MaxNumberOfRows = SurveyedByStreet.RN ) 
           OR
          (UnSurveyed.StreetNameRN % SurveyedByStreet.MaxNumberOfRows = 0 
                  AND SurveyedByStreet.RN = SurveyedByStreet.MaxNumberOfRows)
    )

If you want to get the number of times each surveyed property is matched, change the last select statement to:

...
SELECT COALESCE(SurveyedByBlock.ID, SurveyedByPostCode.ID, SurveyedByStreet.Id) AS SurveyedId, COUNT(*) AS TimesMatched
...
GROUP BY COALESCE(SurveyedByBlock.ID, SurveyedByPostCode.ID, SurveyedByStreet.Id)
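If only the multiplier is needed, it may be possible to skip the row-by-row join and compute it from the group sizes. The following is a rough sketch, assuming the dbo.PropertyTest table from the question. It handles the block level only, and the names Surveyed, UnsurveyedPerBlock, SurveyedCount, UnsurveyedCount and Multiplier are invented for illustration. Each surveyed property counts once for itself, plus an even share of the unsurveyed properties in its block, plus one more if its row number falls within the remainder.

```sql
;WITH Surveyed AS (
    SELECT ID, BlockID, Cost,
           ROW_NUMBER() OVER (PARTITION BY BlockID ORDER BY ID) AS RN,
           COUNT(*)     OVER (PARTITION BY BlockID)             AS SurveyedCount
    FROM dbo.PropertyTest
    WHERE IsSurveyed = 1 AND BlockID IS NOT NULL
),
UnsurveyedPerBlock AS (
    SELECT BlockID, COUNT(*) AS UnsurveyedCount
    FROM dbo.PropertyTest
    WHERE IsSurveyed = 0 AND BlockID IS NOT NULL
    GROUP BY BlockID
)
SELECT s.ID AS SurveyedId,
       s.Cost,
       1                                                    -- the property itself
       + COALESCE(u.UnsurveyedCount, 0) / s.SurveyedCount   -- integer division: even share
       + CASE WHEN s.RN <= COALESCE(u.UnsurveyedCount, 0) % s.SurveyedCount
              THEN 1 ELSE 0 END                             -- remainder goes to the lowest IDs
         AS Multiplier
FROM Surveyed AS s
LEFT JOIN UnsurveyedPerBlock AS u
    ON u.BlockID = s.BlockID;
```

For a block with two surveyed and three unsurveyed properties this yields multipliers 3 and 2, the same rotation as the modulus join above. Unsurveyed properties that fall through to postcode or street would need further tiers that exclude anything already allocated at block level, which this sketch does not attempt.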