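I have two documents, and I need to filter the words in the second document using the words in the first document.
I tried the following, but it did not work: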
Action action = async () =>
{
    try
    {
        Console.WriteLine("Action start...");
        await Task.Delay(1000);
        throw new Exception("Exception from an async action");
    }
    catch (Exception ex)
    {
        // do something
    }
};
Answer 0 (score: 0)
Instead of a filter, I used a join.
a. Inner join:
A = load '/user/balanagaraju.maliset/Dump/abc.txt' AS (line:chararray);
B = load '/user/balanagaraju.maliset/Dump/abc.txt' AS (line:chararray);
words1 = FOREACH A GENERATE FLATTEN(TOKENIZE(line)) as word;
words2 = FOREACH B GENERATE FLATTEN(TOKENIZE(line)) as wordz;
x = JOIN words1 by word , words2 by wordz;
grouped = group x BY word;
D = foreach grouped generate COUNT(x), group;
Dump D;
b. Cross join:
A = load '/user/balanagaraju.maliset/Dump/abc.txt' AS (line:chararray);
B = load '/user/balanagaraju.maliset/Dump/abc.txt' AS (line:chararray);
words1 = FOREACH A GENERATE FLATTEN(TOKENIZE(line)) as word;
words2 = FOREACH B GENERATE FLATTEN(TOKENIZE(line)) as word;
C= CROSS words1,words2;
CC = foreach C generate $0 as first ,$1 as second;
R = FILTER CC by first==second;
grouped = group R BY first;
D = foreach grouped generate group, COUNT(R);
Dump D;
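
The two scripts above should produce the same per-word counts, because a join emits one row for every matching pair of words. As a rough illustration only, here is a small Python sketch of that logic. The helper names (tokenize, inner_join_counts, cross_join_counts) are hypothetical, and splitting on whitespace is a simplification of what Pig's TOKENIZE does.

```python
from collections import Counter
from itertools import product


def tokenize(lines):
    """Split each line on whitespace (a simplification of Pig's TOKENIZE)."""
    return [word for line in lines for word in line.split()]


def inner_join_counts(lines_a, lines_b):
    """Count joined rows per word, mirroring the inner-join script."""
    counts_a = Counter(tokenize(lines_a))
    counts_b = Counter(tokenize(lines_b))
    # A join emits one row per matching pair, so the occurrence counts multiply.
    return {w: counts_a[w] * counts_b[w] for w in counts_a if w in counts_b}


def cross_join_counts(lines_a, lines_b):
    """Count matching pairs per word via a full cross product plus a filter."""
    pairs = product(tokenize(lines_a), tokenize(lines_b))
    return dict(Counter(first for first, second in pairs if first == second))


if __name__ == "__main__":
    doc_a = ["apple banana apple"]
    doc_b = ["apple cherry banana"]
    print(inner_join_counts(doc_a, doc_b))
    print(cross_join_counts(doc_a, doc_b))
```

The cross-product version compares every word with every other word, so on large inputs the inner join is likely the cheaper of the two.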
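Answer 1 (score: 0)
Your requirement seems to be:
You have two files, A and B, and you want to exclude from file B all the words that are present in file A. You can use a left outer join for this.
The script would look like this: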
file1 = load 'A' using PigStorage() as (word1:chararray);
file2 = load 'B' using PigStorage() as (word2:chararray);
joined = join file2 by word2 left outer, file1 by word1;
filtered = filter joined by word1 is null;
dump filtered;
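Explanation: the left outer join ensures that every word from file2 is included in the result. Words that appear in both file1 and file2 will have a non-null word1. If you keep only the rows where word1 is NULL, what remains are the words that are present in file2 but not in file1.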
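
To make the idea concrete, here is a minimal Python sketch of a left outer join followed by a null filter. It is only an illustration of the logic, not Pig code, and the function names (left_outer_join, words_only_in_second) are hypothetical.

```python
def left_outer_join(words2, words1):
    """Pair each word in words2 with every match in words1, or None if unmatched."""
    rows = []
    for w2 in words2:
        matches = [w1 for w1 in words1 if w1 == w2]
        if matches:
            rows.extend((w2, w1) for w1 in matches)
        else:
            # No match on the right side: keep the left word, pad with None.
            rows.append((w2, None))
    return rows


def words_only_in_second(words1, words2):
    """Keep the words of words2 whose joined word1 column is None."""
    joined = left_outer_join(words2, words1)
    return [w2 for w2, w1 in joined if w1 is None]


if __name__ == "__main__":
    file1_words = ["the", "a"]
    file2_words = ["the", "cat", "sat", "a", "mat"]
    print(words_only_in_second(file1_words, file2_words))
```

Here None plays the role of Pig's null: only the rows that found no partner in the first list survive the filter.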