Question

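Is it possible to filter strings based on a phrase? For example, I want to count how many times ps3 (or ps 3) appears in the queries. I'm not sure how to avoid an exact match in the filter condition for "ps 3", because I don't know how to include the space in the pattern. My code so far: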

data = LOAD '/user/cloudera/' using PigStorage(',') as (text:chararray);
filtered_data = FILTER data BY (text matches '.*ps3.*') OR (text == 'ps 3');
Res = FOREACH (GROUP filtered_data ALL) GENERATE COUNT(filtered_data);
DUMP Res;

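Obviously this code cannot count a query like "ps 3 today", because the == comparison only matches rows that are exactly "ps 3". Is there a way to handle this?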

Answer 1

Try this:

A = LOAD 'input.csv' USING PigStorage(',')  AS  (text:chararray);
B = FILTER A BY (LOWER(text) MATCHES '.*ps 3.*' OR LOWER(text) MATCHES '.*ps3.*');

DUMP B;

Output:

(ps 3 today)
(ps 3)
(ps3)
(PS3TODAY)
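
The two patterns in the answer can also be folded into one, and the count from the question added back on top. The following is a minimal sketch, not a tested script: it assumes the same single-column input.csv used above, and the alias C is only an illustrative name. Because MATCHES has to match the whole string, the leading and trailing .* are still required.

```pig
-- Assumes the same single-column input as in the answer above.
A = LOAD 'input.csv' USING PigStorage(',') AS (text:chararray);

-- ' ?' makes the space optional, so both "ps3" and "ps 3" match;
-- LOWER makes the comparison case-insensitive.
B = FILTER A BY LOWER(text) MATCHES '.*ps ?3.*';

-- Group all matching rows into a single bag and count them.
C = FOREACH (GROUP B ALL) GENERATE COUNT(B);
DUMP C;
```

Note that this counts matching rows, not occurrences: a row containing the phrase twice is still counted once.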
