BigQuery中的随机字符串函数

时间:2019-02-28 07:32:13

标签: google-bigquery

遇到这个answer时,我试图在BigQuery中生成随机字符串。

SELECT
    word
FROM
    `publicdata.samples.shakespeare`
WHERE
    RAND() < 10/(
    SELECT
        COUNT(*)
    FROM
        `publicdata.samples.shakespeare`)

它有效,但是我需要根据该答案创建一个函数。这是我尝试进行的转换,但是没有运气。

CREATE TEMP FUNCTION
    random_word() AS ( (
        SELECT
            STRING_AGG(word, "_") AS aggd_word
        FROM (
            SELECT
                LOWER(REPLACE(word, "'", "")) AS word
            FROM
                `publicdata.samples.shakespeare`
            WHERE
                RAND() < 10/(
                SELECT
                    COUNT(*)
                FROM
                    `publicdata.samples.shakespeare`)
            LIMIT
                3)) );
SELECT
    random_word();

我收到此错误

Table not found: `publicdata.samples.shakespeare`;
failed to parse CREATE [TEMP] FUNCTION statement at [25:9]

1 个答案:

答案 0 :(得分:2)

一种方法是将哈希转换为所需范围内的字符:

CREATE TEMP FUNCTION MapChar(c INT64) AS (
  CASE
    WHEN c BETWEEN 0 AND 9 THEN 48 + c -- 0 to 9
    WHEN c BETWEEN 10 AND 35 THEN 55 + c -- A to Z
    ELSE 61 + c -- a to z
  END
);

CREATE TEMP FUNCTION RandString() AS ((
  SELECT CODE_POINTS_TO_STRING(ARRAY_AGG(MapChar(MOD(c, 62))))
  FROM UNNEST(TO_CODE_POINTS(SHA256(CAST(RAND() AS STRING)))) AS c
));

SELECT RandString();

如果需要更长的字符串,可以使用SHA512代替SHA256