Data Lake Analytics join

Time: 2017-09-22 10:07:25

Tags: analytics azure-data-lake

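I have two tables. I want to categorize the URLs in the table [Activite_Site]. I tried the query below, but it does not work. Does anyone have an idea? Thanks in advance.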


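1 answer:

Answer 0 (score: 1)

Regarding the error E_CSC_USER_JOINCOLUMNSEXPECTEDONEACHSIDEOFCONDITION: U-SQL does not currently support derived columns in join conditions.

One way to achieve this is to find the matched URLs first, then find the unmatched URLs, and combine the two sets with a UNION.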

@category = SELECT *
     FROM (
        VALUES
            ( "http//www.site.com/business", "B2B" ),
            ( "http//www.site.com/office", "B2B" ),
            ( "http//www.site.com/job", "B2B" ),
            ( "http//www.site.com/home", "B2C" )
        ) AS x(url, cat);


@siteActivity = SELECT *
     FROM (
        VALUES
            ( "http//www.site.com/business/page2/test.html" ),
            ( "http//www.site.com/business/page3/pagetest/tot.html" ),
            ( "http//www.site.com/office/all/tot.html" ),
            ( "http//www.site.com/home/holiday/paris.html" ),
            ( "http//www.site.com/home/private/moncompte.html" ),
            ( "http//www.site.com/test/pte.html" )
        ) AS x(url);


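// Find matched URLs: an activity URL matches a category when it starts
// with the category URL. StartsWith avoids the exception that Substring
// throws when the activity URL is shorter than the category URL.
@working =
    SELECT sa.url,
           c.cat
    FROM @siteActivity AS sa
         CROSS JOIN
             @category AS c
         WHERE sa.url.StartsWith(c.url);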


// Combine the matched and unmatched URLs
@output =
    SELECT url,
           cat
    FROM @working

    UNION ALL

    SELECT url,
           (string) null AS cat
    FROM @siteActivity AS sa
         ANTISEMIJOIN
             @working AS w
         ON sa.url == w.url;



OUTPUT @output TO "/output/output.csv"
USING Outputters.Csv(quoting:false);
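
The matched/unmatched split above could also be collapsed into a single statement. The following is only a sketch and has not been run against the service: it assumes that MAX accepts string expressions and ignores nulls, the rowset name @singlePass and the output path are hypothetical, and the sample data is a shortened copy of the rowsets above. Unlike the UNION ALL version, it returns one row per URL even if a URL matches several categories.

```usql
// Sketch only: assumes MAX works on strings and skips null values.
@category = SELECT *
     FROM (
        VALUES
            ( "http//www.site.com/business", "B2B" ),
            ( "http//www.site.com/home", "B2C" )
        ) AS x(url, cat);

@siteActivity = SELECT *
     FROM (
        VALUES
            ( "http//www.site.com/business/page2/test.html" ),
            ( "http//www.site.com/home/holiday/paris.html" ),
            ( "http//www.site.com/test/pte.html" )
        ) AS x(url);

// Pair every activity URL with every category, keep the category only
// where the prefix matches, then collapse back to one row per URL.
// URLs with no matching category end up with a null cat.
@singlePass =
    SELECT sa.url,
           MAX(sa.url.StartsWith(c.url) ? c.cat : (string) null) AS cat
    FROM @siteActivity AS sa
         CROSS JOIN
             @category AS c
    GROUP BY sa.url;

// Hypothetical output path, chosen for illustration.
OUTPUT @singlePass TO "/output/output_singlepass.csv"
USING Outputters.Csv(quoting:false);
```

This still performs the full CROSS JOIN, so it is not necessarily faster; it mainly removes the second pass over @siteActivity.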

I would be interested to know whether there is a more efficient way.