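I'm new to Apache Pig. I created two files with tab-separated fields: employees.txt and employees2.txt. [The files themselves have no blank lines between rows; any extra spacing here is only to keep this editor happy.]
employees.txt contains: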
joe 21 94085 50000.0
Tom 21 94085 50000.0
John 21 94085 50000.0
employees2.txt contains:
joe 4085559898
joe 4085559899
tom 4085559897
tom 4085559896
john 4085559896
Then I tried a simple join:
e1 = LOAD 'employees.txt' AS (name, age, zip, salary);
e2 = LOAD 'employees2.txt' AS (name, phone);
e3 = JOIN e1 BY name, e2 BY name;
DUMP e3;
Result:
(joe,21,94085,50000.0,joe,4085559899)
(joe,21,94085,50000.0,joe,4085559898)
I expected:
(joe,21,94085,50000.0,joe,4085559899)
(joe,21,94085,50000.0,joe,4085559898)
(Tom,21,94085,50000.0,Tom,4085559897)
(Tom,21,94085,50000.0,Tom,4085559896)
(joe,21,94085,50000.0,Tom,4085559896)
What am I doing wrong?
Thanks,
Chris
Answer 0 (score: 1)
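As in almost every programming language, string comparison in Pig is case-sensitive. So "Tom" != "tom" and "John" != "john", and only the "joe" rows, which are lowercase in both files, match in the join.
You should change the names in the employees.txt file to lowercase. You should then get the result you expect.
Alternatively, you can use Pig's built-in string function LOWER to convert the name field to all lowercase inside the script.
Something like: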
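-- Declare name as chararray so LOWER receives a string
e1 = LOAD 'employees.txt' AS (name:chararray, age, zip, salary);
e2 = LOAD 'employees2.txt' AS (name:chararray, phone);
-- Lowercase the join key and keep the alias "name" so the JOIN can refer to it
e1_lower = FOREACH e1 GENERATE LOWER(name) AS name, age, zip, salary;
e3 = JOIN e1_lower BY name, e2 BY name;
DUMP e3;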
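If you want the join to ignore case but still show the names as they were originally written in employees.txt, one option is to join on a separate lowercase key and drop it afterwards. The following is only a sketch under some assumptions: the field types are guesses based on the sample data, and the aliases e1_keyed, e2_keyed, name_key, and e4 are made up for illustration.

```pig
-- Assumed types based on the sample rows; adjust to your real data.
e1 = LOAD 'employees.txt' AS (name:chararray, age:int, zip:chararray, salary:double);
e2 = LOAD 'employees2.txt' AS (name:chararray, phone:chararray);

-- Add a lowercase join key to each relation (name_key is a hypothetical helper field).
e1_keyed = FOREACH e1 GENERATE LOWER(name) AS name_key, name, age, zip, salary;
e2_keyed = FOREACH e2 GENERATE LOWER(name) AS name_key, phone;

-- Join on the normalized key, so "Tom" and "tom" now match.
e3 = JOIN e1_keyed BY name_key, e2_keyed BY name_key;

-- Drop the helper key; relation::field disambiguates the joined fields.
e4 = FOREACH e3 GENERATE e1_keyed::name AS name, e1_keyed::age AS age,
                         e1_keyed::zip AS zip, e1_keyed::salary AS salary,
                         e2_keyed::phone AS phone;
DUMP e4;
```

Lowercasing both sides also protects against mixed case showing up in employees2.txt later, at the cost of one extra FOREACH per relation.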