Question

I have a Hive table emp_test with the following columns:

'name' is a string
'testing' is array<struct<code:string,tests:array<struct<testtype:string,errorline:string>>>>

The column values are: 'name' is "JOHN" and 'testing' is

[{"code":"cod1234","tests":[{"testtype":"java","errorline":"100"},{"testtype":"C++","errorline":"10000"}]},
{"code":"cod6790","tests":[{"testtype":"hive","errorline":"10"},{"testtype":"pig","errorline":"978"},{"testtype":"spark","errorline":"35"}]}]

How can I select these values and store them in another table, emp_test_detail (name, code, testtype, errorline), so that it contains:

JOHN cod1234 java 100
JOHN cod1234 C++ 10000
JOHN cod6790 hive 10
JOHN cod6790 pig 978
JOHN cod6790 spark 35

I tried the following query but got an error:

insert into emp_test_detail select
        emp_tasting.code,
        emp_tasting.emp_tests.testtype,
        emp_tasting.emp_tests.errorline from emp_test lateral view explode(testing) mytest as emp_tasting
lateral view explode(testing[0].tests) mytest as emp_tasting;

I don't know the exact length of the testing array in advance. So how do I reference the array fields?

Any help would be appreciated.

Answer 1

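In your example query, the error is probably caused by using emp_tasting as the column alias in both lateral view explode lines. They need different aliases.

To drill down through two levels of arrays, you explode the first array, then reference the alias of that exploded array when exploding the nested array.

For example, you want name, code, testtype, errorline

name is available directly on the table, code is available after the first explode, and testtype and errorline are available after the nested explode.

Note that I'm working from your schema rather than the data you listed, since it's easier for me to reason about.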
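To try the queries in this answer, the schema from the question could be set up roughly like this. This is only a sketch: the DDL follows the types given in the question, and the sample row is built with Hive's array() and named_struct() functions. It assumes a Hive version that accepts SELECT without a FROM clause (0.13 or later, as far as I know); on older versions you would need to select from a one-row helper table instead.

```sql
-- Schema as described in the question.
CREATE TABLE IF NOT EXISTS emp_test (
  name    STRING,
  testing ARRAY<STRUCT<code: STRING,
                       tests: ARRAY<STRUCT<testtype: STRING, errorline: STRING>>>>
);

-- Sample row from the question. INSERT ... VALUES does not accept
-- complex types, so the row is built with a SELECT instead.
INSERT INTO TABLE emp_test
SELECT
  'JOHN',
  array(
    named_struct('code', 'cod1234', 'tests', array(
      named_struct('testtype', 'java', 'errorline', '100'),
      named_struct('testtype', 'C++',  'errorline', '10000'))),
    named_struct('code', 'cod6790', 'tests', array(
      named_struct('testtype', 'hive',  'errorline', '10'),
      named_struct('testtype', 'pig',   'errorline', '978'),
      named_struct('testtype', 'spark', 'errorline', '35')))
  );
```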

This query should do what you want:

SELECT
  name,
  testingelement.code,
  test.testtype, 
  test.errorline 
FROM emp_test 
LATERAL VIEW explode(testing) testingarray as testingelement
LATERAL VIEW explode(testingelement.tests) testsarray as test;
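Since the question asks to store the result in emp_test_detail, the same SELECT can be wrapped in an INSERT. The sketch below assumes emp_test_detail has four STRING columns in the order (name, code, testtype, errorline); the question does not give its DDL, so the CREATE TABLE statement is a guess and should be adjusted to the real table.

```sql
-- Assumption: target table layout is not shown in the question;
-- four STRING columns in this order are assumed here.
CREATE TABLE IF NOT EXISTS emp_test_detail (
  name      STRING,
  code      STRING,
  testtype  STRING,
  errorline STRING
);

-- One output row per element of the nested tests array,
-- regardless of how long either array is.
INSERT INTO TABLE emp_test_detail
SELECT
  name,
  testingelement.code,
  test.testtype,
  test.errorline
FROM emp_test
LATERAL VIEW explode(testing) testingarray AS testingelement
LATERAL VIEW explode(testingelement.tests) testsarray AS test;
```

No array index appears anywhere, so the query does not depend on the array lengths.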

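Table and column aliases

Note that explode is followed by two aliases: the first names the table expression it generates, the second names the column.

So in this example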

LATERAL VIEW explode(testing) testingarray as testingelement

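testingarray is the table alias, and testingelement is the column alias for the exploded array element, which you reference to extract fields from the struct.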
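To make the two kinds of alias more visible, the columns can also be qualified with their table aliases. This is an illustrative variant only, and I believe Hive resolves table_alias.column_alias.field this way, but the unqualified form is the one I would normally write.

```sql
-- Same result, with every column qualified by the alias of the
-- table expression that produces it.
SELECT
  emp_test.name,
  testingarray.testingelement.code,  -- table alias . column alias . struct field
  testsarray.test.testtype,
  testsarray.test.errorline
FROM emp_test
LATERAL VIEW explode(testing) testingarray AS testingelement
LATERAL VIEW explode(testingelement.tests) testsarray AS test;
```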

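Skipping the first explode

If you only need fields that come directly from the table and from the nested array, you can shortcut the query with a single LATERAL VIEW explode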

LATERAL VIEW explode(testing.tests) testsarray as test

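The catch is that it also explodes empty arrays, and you can't use * star expansion; you have to reference the field names explicitly. Which is not a bad thing.

What is a bad thing is having to use array indexes in a query. As soon as you start writing field[0], something smells funky. That only ever gives you the first element of the array, which has very limited application.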

How to explode a nested array structure with unknown array length in Hive?
