U-SQL build error: equijoin columns have different types

Date: 2016-12-01 13:51:47

Tags: sql analytics azure-data-lake u-sql

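I am trying to create a U-SQL job and have defined my columns from the CSV files they will be read from. However, I keep running into a problem in the JOIN section because the columns I am matching are reported as different types. This is odd, because I have defined them as the same type. See the screenshot of where the problem occurs:

[Image: screenshot of the build error]

Here is the full U-SQL: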

@guestCheck = 
    EXTRACT GuestCheckID int,
            POSCheckGUID Guid,
            POSCheckNumber int?,
            OwnerEmployeeID int,
            CreatedDateTime DateTime?,
            ClosedDateTime DateTime?,
            TicketReference string,
            CheckAmount decimal?,
            POSTerminalID int,
            CheckState string,
            LocationID int?,
            TableID int?,
            Covers int?,
            PostedDateTime DateTime?,
            OrderChannelID int?,
            MealPeriodID int?,
            RVCLocationID int?,
            ReopenedTerminalID int?,
            ReopenedEmployeeID int?,
            ReopenedDateTime DateTime?,
            ClosedBusDate int?,
            PostedBusDate int?,
            BusHour byte?,
            TaxExempt bool?,
            TaxExemptReference string
    FROM "/GuestCheck/GuestCheck-incomplete.csv"
    USING Extractors.Csv();

@guestCheckAncillaryAmount =
    EXTRACT CheckAncillaryAmountID int,
            GuestCheckID int,
            GuestCheckItemID int?,
            AncillaryAmountTypeID int,
            Amount decimal,
            FirstDetail int?,
            LastDetail int?,
            IsReturn bool?,
            ReturnReasonID int?,
            AncillaryReasonID int?,
            AncillaryNote string,
            ClosedBusDate int?,
            PostedBusDate int?,
            BusHour byte?,
            LocationID int?,
            RVCLocationID int?,
            IsDelisted bool?,
            Exempted bool?
    FROM "/GuestCheck/GuestCheckAncillaryAmount.csv"
    USING Extractors.Csv();

@ancillaryAmountType = 
    EXTRACT AncillaryAmountTypeID int,
            AncillaryAmountCategoryID int,
            CustomerID int,
            CheckTitle string,
            ReportTitle string,
            Percentage decimal,
            FixedAmount decimal,
            IncludeOnCheck bool,
            AutoCalculate bool,
            StoreAtCheckLevel bool?,
            DateTimeModified DateTime?,
            CheckTitleToken Guid?,
            ReportTitleToken Guid?,
            DeletedFlag bool,
            MaxUsageQty int?,
            ApplyToBasePriceOnly bool?,
            Exclusive bool,
            IsItem bool,
            MinValue decimal,
            MaxValue decimal,
            ItemGroupID int?,
            LocationID int,
            ApplicationOrder int?,
            RequiresReason bool,
            Exemptable bool?
    FROM "/GuestCheck/AncillaryAmountType.csv"
    USING Extractors.Csv();

@read =
    SELECT t.POSCheckGUID,
           t.POSCheckNumber,
           t.CheckAmount,
           aat.AncillaryAmountTypeID,
           aat.CheckTitle,
           gcd.Amount
    FROM @guestCheck AS t         
         LEFT JOIN
             @guestCheckAncillaryAmount AS gcd
         ON t.GuestCheckID == gcd.GuestCheckID
         LEFT JOIN
             @ancillaryAmountType AS aat
         ON gcd.AncillaryAmountTypeID == aat.AncillaryAmountTypeID
    WHERE aat.AncillaryAmountCategoryID IN(2, 4, 8);

OUTPUT @read
TO "/GuestCheckOutput/output.csv"
USING Outputters.Csv();

2 answers:

Answer 0: (score: 3)

Indeed, U-SQL is strongly typed, and int and int? are different types. You will need to cast in an intermediate rowset:

@ancillaryAmountType2 =
SELECT (int?) aat.AncillaryAmountTypeID AS AncillaryAmountTypeID,
       aat.AncillaryAmountCategoryID,
       aat.CheckTitle
FROM @ancillaryAmountType AS aat;

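Or, better yet, follow dimensional modelling best practice and avoid nullable "dimensions", for the reasons described at http://blog.chrisadamson.com/2013/01/avoid-null-in-dimensions.html.

Answer 1: (score: 3)

This has nothing to do with the nullability of the columns specified in the EXTRACT definitions: as the OP's code shows, neither join column is declared nullable (i.e. with a ? suffix) in its EXTRACT definition. It has to do with multiple outer joins and so-called null-supplying tables.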

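If you think about it logically, imagine you have three tables: TableA with three records, TableB with two records and TableC with one record, like this:

[Image: Tables]

If you start from tableA and left outer join to tableB, you instinctively know you will get three records, but for the tableA row that has no match in tableB, the tableB column x will be null. tableB is your null-supplying table, and that is where the nullability comes from.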
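The effect can be sketched with inline rowsets. The tables image did not survive, so the row values below are assumptions chosen only to match the stated row counts, and the output path is hypothetical:

```usql
// Hypothetical stand-ins for TableA (three rows) and TableB (two rows).
// The literal values are assumed; the original tables image is not available.
@tableA = SELECT * FROM (VALUES (1), (2), (3)) AS T(x);
@tableB = SELECT * FROM (VALUES (1), (2)) AS T(x);

// x is int in both rowsets, so this first equijoin compiles.
@ab =
    SELECT a.x AS ax,
           b.x AS bx
    FROM @tableA AS a
         LEFT OUTER JOIN
             @tableB AS b
         ON a.x == b.x;

// @tableB is the null-supplying side, so bx is typed int? in @ab
// even though x was declared as int. Joining @ab.bx to another
// int column would then be an int? == int comparison.
OUTPUT @ab
TO "/output/ab.csv"
USING Outputters.Csv();
```

This mirrors the question: gcd.AncillaryAmountTypeID comes out of the first left join as int?, while aat.AncillaryAmountTypeID is still int.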

Thankfully, the fix is the same: change the nullability of the column slightly earlier, or specify a substitute value, e.g. -1.

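@t3 = SELECT (int?) x AS x, 2 AS a FROM dbo.tmpC;

// OR

// Use conditional operator to supply substitute values
@t3 = SELECT x == null ? -1 : x AS x, 2 AS a FROM dbo.tmpC;

However, there is another problem with your particular query. In most relational databases, adding a WHERE clause on the table on the right-hand side of a left outer join converts the join to an inner join, and the same is true in U-SQL. You may want to think about the actual result you are trying to get and consider rewriting your query.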
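If the intent is to keep every guest check and only attach ancillary types from categories 2, 4 and 8, one possible rewrite is to filter and cast in an intermediate rowset before joining. This is a sketch under that assumption, not necessarily what the OP wants. The rowset name @aatFiltered is made up for illustration, and the three EXTRACT statements from the question are assumed to be defined above it:

```usql
// Assumes @guestCheck, @guestCheckAncillaryAmount and @ancillaryAmountType
// are the rowsets extracted in the question.

// Filter the category and cast the key to int? before joining.
// @aatFiltered is a hypothetical name used only in this sketch.
@aatFiltered =
    SELECT (int?) aat.AncillaryAmountTypeID AS AncillaryAmountTypeID,
           aat.CheckTitle
    FROM @ancillaryAmountType AS aat
    WHERE aat.AncillaryAmountCategoryID IN (2, 4, 8);

@read =
    SELECT t.POSCheckGUID,
           t.POSCheckNumber,
           t.CheckAmount,
           aat.AncillaryAmountTypeID,
           aat.CheckTitle,
           gcd.Amount
    FROM @guestCheck AS t
         LEFT OUTER JOIN
             @guestCheckAncillaryAmount AS gcd
         ON t.GuestCheckID == gcd.GuestCheckID
         LEFT OUTER JOIN
             @aatFiltered AS aat
         ON gcd.AncillaryAmountTypeID == aat.AncillaryAmountTypeID;

OUTPUT @read
TO "/GuestCheckOutput/output.csv"
USING Outputters.Csv();
```

Because the filter no longer sits in a WHERE clause after the joins, rows without a matching ancillary type are kept with null values in the aat columns. If only matching rows are wanted, writing INNER JOIN explicitly states that intent more clearly than a left join followed by a filter.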

HTH