What is the difference between an "outer bag" and an "inner bag" in Pig Latin?

Asked: 2013-10-08 01:27:22

Tags: apache-pig

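The manual/documentation makes extensive use of the terms "inner bag" and "outer bag" (e.g. http://pig.apache.org/docs/r0.11.1/basic.html), but I can't find a clear, precise definition that distinguishes them.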

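For example (these questions are essentially interrelated):

  • If I give you a bag 'foo', what do you need to know in order to label foo an 'inner bag' rather than an 'outer bag'?
  • Is any bag that is not the outermost bag an 'inner bag'?
  • Are the labels inner and outer always mutually exclusive?
  • In Pig Latin, are all bags relations, or only the outermost bags (so that inner bags are not relations)?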

Here is an example to discuss:

grunt> dump A;      
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)


grunt> W1 = GROUP A   ALL;         
grunt> W2 = GROUP W1  ALL;
grunt> W3 = GROUP W2  ALL;
grunt> W4 = GROUP W3  ALL;

grunt> describe W4;
W4: {group: chararray,W3: {(group: chararray,W2: {(group: chararray,W1: {(group: chararray,A: {(f1: int,f2: int,f3: int)})})})}}


grunt> illustrate W4;
(1,2,3)
---------------------------------------------------
| A     | f1:int      | f2:int      | f3:int      | 
---------------------------------------------------
|       | 1           | 2           | 3           | 
|       | 8           | 3           | 4           | 
---------------------------------------------------
------------------------------------------------------------------------------------------------
| W1     | group:chararray      | A:bag{:tuple(f1:int,f2:int,f3:int)}                          | 
------------------------------------------------------------------------------------------------
|        | all                  | {(1, 2, 3), (8, 3, 4)}                                       | 
------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------
| W2     | group:chararray      | W1:bag{:tuple(group:chararray,A:bag{:tuple(f1:int,f2:int,f3:int)})}                                         | 
-----------------------------------------------------------------------------------------------------------------------------------------------
|        | all                  | {(all, {(1, 2, 3), (8, 3, 4)})}                                                                             | 
-----------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| W3     | group:chararray      | W2:bag{:tuple(group:chararray,W1:bag{:tuple(group:chararray,A:bag{:tuple(f1:int,f2:int,f3:int)})})}                                                        | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|        | all                  | {(all, {(all, {(1, 2, 3), (8, 3, 4)})})}                                                                                                                   | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| W4     | group:chararray      | W3:bag{:tuple(group:chararray,W2:bag{:tuple(group:chararray,W1:bag{:tuple(group:chararray,A:bag{:tuple(f1:int,f2:int,f3:int)})})})}                                                                       | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|        | all                  | {(all, {(all, {(all, {(1, 2, 3), (8, 3, 4)})})})}                                                                                                                                                         | 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

grunt> dump W4;
(all,{(all,{(all,{(all,{(1,2,3),(4,2,1),(8,3,4),(4,3,3)})})})})

Of the bags W1, W2, W3 and W4, which are inner and which are outer?

1 answer:

Answer 0 (score: 4)

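An outer bag is really just a relation, such as A. That sounds a bit strange, but it becomes clear once you know what an inner bag is. To keep things readable, we'll look only at W1, because the extra levels of nesting don't change the answer.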
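To check the claim that the extra nesting does not change the answer, here is a minimal Python model of the question's GROUP ... ALL chain. It is a hypothetical sketch, not Pig's API: a bag is modeled as a Python list of tuples, the relation is the top-level list, and any list stored in a tuple field is counted as an inner bag. The helper names group_all, inner_bag_depths and fmt are made up for illustration.

```python
from typing import Any, Iterator, List, Tuple

# Hypothetical model, not Pig's API: a bag is a Python list of tuples and a
# relation is the top-level list. Any list stored in a tuple field is treated
# as an inner bag.
Bag = List[Tuple[Any, ...]]

def group_all(relation: Bag) -> Bag:
    """Model of `GROUP relation ALL`: one tuple ('all', <bag of all input tuples>)."""
    return [("all", list(relation))]

def inner_bag_depths(bag: Bag, depth: int = 1) -> Iterator[int]:
    """Yield the nesting depth of each bag stored in a field, outermost first."""
    for tup in bag:
        for field in tup:
            if isinstance(field, list):
                yield depth
                yield from inner_bag_depths(field, depth + 1)

def fmt(value: Any) -> str:
    """Render a value the way `dump` prints it: (...) for tuples, {...} for bags."""
    if isinstance(value, list):
        return "{" + ",".join(fmt(v) for v in value) + "}"
    if isinstance(value, tuple):
        return "(" + ",".join(fmt(v) for v in value) + ")"
    return str(value)

A: Bag = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3)]
W1 = group_all(A)
W2 = group_all(W1)
W3 = group_all(W2)
W4 = group_all(W3)

for tup in W4:  # like `dump W4`: one line per tuple of the relation
    print(fmt(tup))
print(list(inner_bag_depths(W4)))  # [1, 2, 3, 4]: the W3, W2, W1 and A fields
```

Printing W4 reproduces the dump line from the question. Under this model, W4 as a whole is the relation (the outer bag), while the bags stored in its fields (W3, and within it W2, W1 and A) are inner bags at depths 1 to 4.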

Schema and output of W1:

W1: {group:chararray, A:bag{:tuple(f1:int,f2:int,f3:int)}}
(all,{(1, 2, 3), (8, 3, 4)})

We can see that W1 has a field named A whose type is a bag. This is an inner bag, because it is a bag stored as a field of a relation.
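The same idea for W1 alone, as a hypothetical Python sketch (group_all and bag_field_indexes are illustrative names, not Pig functions). GROUP ... ALL is modeled as producing a single tuple whose second field holds every input tuple as a bag, and a field counts as an inner bag when its value is a bag (here, a Python list). It uses all four tuples of A, as dump would, rather than the two that illustrate happened to sample.

```python
from typing import Any, List, Tuple

# Hypothetical model, not Pig's API: a bag is a Python list of tuples.
Bag = List[Tuple[Any, ...]]

def group_all(relation: Bag) -> Bag:
    """Model of `GROUP relation ALL`: one tuple ('all', <bag of all input tuples>)."""
    return [("all", list(relation))]

def bag_field_indexes(tup: Tuple[Any, ...]) -> List[int]:
    """Positions of the fields in a tuple whose value is a bag (inner bags)."""
    return [i for i, field in enumerate(tup) if isinstance(field, list)]

A: Bag = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3)]
W1 = group_all(A)

print(W1)                        # the relation W1: a single tuple ('all', [...])
print(bag_field_indexes(W1[0]))  # [1]: field $1 (named A in the schema) is an inner bag
print(bag_field_indexes(A[0]))   # []: A's own tuples hold only ints
```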

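Remember that a bag is just an unordered collection of tuples, which is exactly what that field holds in W1's output. Now look at the output of relation A: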

(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)

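Pig does not guarantee the order of these tuples (unless you ORDER them or do something similar). So, if you think about it, relation A is really just an unordered collection of tuples. That is an outer bag.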
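Because a relation is an unordered collection in which duplicate tuples are allowed, two dumps that list the same tuples in different orders describe the same relation. A minimal Python sketch of that idea, assuming a flat relation is modeled as a list of tuples and compared as a multiset with collections.Counter (a modeling choice, not something Pig exposes; same_relation is a hypothetical helper):

```python
from collections import Counter
from typing import Any, Sequence, Tuple

# Hypothetical helper, not part of Pig: compares two dumps of a relation as
# multisets of tuples, so order is ignored but duplicate tuples still count.
# Only works for flat tuples; tuples holding inner bags (lists) are unhashable.
def same_relation(a: Sequence[Tuple[Any, ...]], b: Sequence[Tuple[Any, ...]]) -> bool:
    return Counter(a) == Counter(b)

dump_1 = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3)]
dump_2 = [(4, 3, 3), (1, 2, 3), (8, 3, 4), (4, 2, 1)]  # same tuples, another order

print(same_relation(dump_1, dump_2))              # True
print(same_relation(dump_1, dump_1[:3]))          # False: one tuple is missing
print(same_relation([(1, 2)], [(1, 2), (1, 2)]))  # False: bags keep duplicates
```

A plain Python set would be the wrong model here, since it would silently merge duplicate tuples that a bag keeps.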

You can find some examples of this here.