Question

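I have a table into which I will do an initial load of customer data and then append data daily. The daily appends will mostly be existing customers, but there will be some new ones. I need to create a company_unique_id field/key in this table.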

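Given the advantages of Databricks' DATASKIPPING for unique numeric IDs, I would like to use something akin to MSSQL's IDENTITY(1,1). Spark SQL's uuid() would be what I want, but it returns a string. monotonically_increasing_id() is also close to what I need, but it is not unique across separate loads.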

A solution in Spark SQL (if one exists) is strongly preferred over other APIs (e.g. Scala, Python). Maybe it does not exist and I will have to wait for Databricks Delta to be released.

Example (if uuid worked):

create table customer.customer_id 
    using parquet 
    select 
        client_id, 
        uuid() as my_company_id, 
        action as customer_action 
    from client_table

The next day:

insert into customer.customer_id 
    select 
        client_id, 
        uuid() as my_company_id, 
        action as customer_action 
    from client_table
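
For illustration only, the numbering behaviour being asked for can be sketched by offsetting row_number() by the current maximum ID. The sketch below runs the statement against an in-memory SQLite database through Python's sqlite3 module purely so that it can be executed locally (it needs SQLite 3.25 or later for window functions). It is not Spark SQL, and whether Spark accepts an insert that also reads from its own target table is an assumption that would need checking. The unqualified table name customer_id and the load() helper are hypothetical stand-ins.

```python
import sqlite3

# Hypothetical stand-in tables; SQLite is used only so the sketch runs locally.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table client_table (client_id text, action text);
    create table customer_id (
        client_id text,
        my_company_id integer,
        customer_action text
    );
""")

# Append every row of client_table, numbering rows from max(existing ID) + 1.
APPEND_SQL = """
    insert into customer_id
    select
        client_id,
        (select coalesce(max(my_company_id), 0) from customer_id)
            + row_number() over (order by client_id) as my_company_id,
        action as customer_action
    from client_table
"""


def load(rows):
    """Replace the contents of client_table with rows, then append them."""
    conn.execute("delete from client_table")
    conn.executemany("insert into client_table values (?, ?)", rows)
    conn.execute(APPEND_SQL)
    conn.commit()


load([("a", "signup"), ("b", "signup")])    # initial load
load([("a", "purchase"), ("c", "signup")])  # next day's append

# Collect the assigned IDs in ascending order.
ids = [row[0] for row in conn.execute(
    "select my_company_id from customer_id order by my_company_id")]
print(ids)
```

Each load continues numbering where the previous one stopped, so the four rows appended here receive the IDs 1 to 4. Like IDENTITY(1,1), this numbers rows rather than customers, and it assumes that only one load runs at a time.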

TIA

Spark SQL table with unique numeric ID

0 answers: