我们在Bigquery中收到调查Web挂钩数据。本地语言的注释被捕获为unicode,我们在该注释中确实具有特殊性。
示例
我们找到了解码个人评论的解决方案: -
CREATE TEMPORARY FUNCTION utf8convert(s STRING)
RETURNS STRING
LANGUAGE js AS """
return unescape( ( s ) );
""";
with sample AS (SELECT '\u522b\u8001\u662f\u665a' AS S)
SELECT utf8convert(s) from sample
在评论字段中实现此代码时,有数千条评论和不同的语言,它无效。
CREATE TEMPORARY FUNCTION utf8convert(s STRING)
RETURNS STRING
LANGUAGE js AS """
return unescape( ( s ) );
""";
SELECT Comment, utf8convert(Comment) as Convert
FROM `airasia-nps.nps_production.NPSDashboard_Webhook_Data1`
where Comment is not null
运行时没有错误,但结果是在Unicode中不会更改为本地语言。 Result: local language in Unicode
我试过这段代码
CREATE TEMP FUNCTION DecodeUnicode(s STRING) AS (
IF(s NOT LIKE '%\\u%', s,
(SELECT CODE_POINTS_TO_STRING(ARRAY_AGG(CAST(CONCAT('0x', x) AS INT64)))
FROM UNNEST(SPLIT(s, '\\u')) AS x
WHERE x != ''))
);
SELECT
original,
DecodeUnicode(original) AS decoded
FROM (
SELECT trim(r'$-\u6599\u91d1\u304c\u9ad8\u3059\u304e\uff01\uff01\uff01') AS original UNION ALL
SELECT trim(r'abcd')
);
显示error我认为它是因为评论以特殊字符开头?
答案 0 :(得分:1)
看看这是否有效。它通过转换为Unicode代码点然后转换为字符串,对其中包含\ u的字符串执行“手动”解码。它应该比使用JavaScript更快。
CREATE TEMP FUNCTION DecodeUnicode(s STRING) AS (
IF(s NOT LIKE '%\\u%', s,
(SELECT CODE_POINTS_TO_STRING(ARRAY_AGG(CAST(CONCAT('0x', x) AS INT64)))
FROM UNNEST(SPLIT(s, '\\u')) AS x
WHERE x != ''))
);
SELECT
original,
DecodeUnicode(original) AS decoded
FROM (
SELECT r'\u522b\u8001\u662f\u665a\u70b9\uff0c\u73b0\u573a\u8865\u884c\u674e\u8d39\u592a\u8d35' AS original UNION ALL
SELECT r'abcd'
);
作为输出,返回别老是晚点,现场补行李费太贵
和abcd
。