Question

I can't figure out how to handle this problem:

Here is my data:

Table1:         Table2:
BRAND           PRODUCT           SOLD
Sony            Sony ABCD         1233
Apple           Sony adv          1233
Google          Sony aaaa         1233
IBM             Apple 123         1233
etc.            Apple 345         1233
                IBM 13123         1233

Is it possible to write a query that gives me a table containing each brand and its total units sold? My idea was:

Select table1.brand, sum(table2.sold) from table1
join table2
on (table1.brand LIKE '%table2.product%')
group by table.1.brand

That was my idea, but I always get an error.

Is the LIKE operator the main problem, or is there another solution?

Answer 1

I see two problems. First, JOIN in Hive only works with equality conditions, so LIKE cannot be used there.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins

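Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support join conditions that are not equality conditions, as it is very difficult to express such conditions as a map/reduce job.

Instead, the condition has to go into the WHERE clause.

Second, I also see a problem with the LIKE expression itself: '%table2.product%' is interpreted literally as the string '%table2.product%'. Moreover, even if it did what was intended, it would look for table2.product inside brand, when you seem to want it the other way around. To get the evaluation you want, you need to add the wildcards to the contents of table1.brand; to do that, concatenate the wildcards into the expression.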

table2.product LIKE concat('%', table1.brand, '%')

By doing this, your LIKE will evaluate against the strings '%Sony%', '%Apple%', and so on, instead of '%table2.product%'.

What you want is Brandon Bell's query, which I have merged into this answer:
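
The difference between the quoted pattern and the concatenated one can be checked without a Hive cluster. The following is only a minimal sketch: it assumes Python's built-in sqlite3 module as a stand-in for Hive, and SQLite writes concat(a, b, c) as a || b || c. The helper name `matches` is made up for this illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Hive so the LIKE patterns can be tried locally.
# SQLite has no guaranteed concat(); '%' || x || '%' plays the role of concat('%', x, '%').
conn = sqlite3.connect(":memory:")

def matches(expression, params):
    """Evaluate one boolean SQL expression and return the result as a bool."""
    return bool(conn.execute("SELECT " + expression, params).fetchone()[0])

product, brand = "Sony ABCD", "Sony"

# Quoted pattern from the question: the column name is just text inside the literal.
literal = matches("? LIKE '%table2.product%'", (brand,))

# Brand wrapped in wildcards and tested against the product.
concatenated = matches("? LIKE '%' || ? || '%'", (product, brand))

# Reversed direction: looking for the whole product name inside the brand.
reversed_direction = matches("? LIKE '%' || ? || '%'", (brand, product))

print(literal, concatenated, reversed_direction)  # False True False
```

Only the second form matches, which is why the pattern has to be built from table1.brand and applied to table2.product.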

SELECT table1.brand, SUM(table2.sold) 
FROM table1, table2
WHERE table2.product LIKE concat('%', table1.brand, '%') 
GROUP BY table1.brand;

Answer 2

You should be able to accomplish this without a JOIN. See the following query:

SELECT table1.brand, sum(table2.sold) 
FROM table1, table2 
WHERE table2.product LIKE concat('%', table1.brand, '%') 
GROUP BY table1.brand;

This returns:

Apple   2466
IBM     1233
Sony    3699

My input files are as follows:

Sony
Apple
Google
IBM

and

Sony ABCD       1233
Sony adv        1233
Sony aaaa       1233
Apple 123       1233
Apple 345       1233
IBM 13123       1233
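
The query and input files above can be reproduced locally as a rough check. This sketch again assumes SQLite (through Python's sqlite3 module) rather than Hive, so concat(...) is written with ||, and an ORDER BY is added only to make the row order deterministic. The function name `brand_totals` is hypothetical.

```python
import sqlite3

def brand_totals(brands, products):
    """Load the two inputs into in-memory tables and sum sold units per brand."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (brand TEXT)")
    conn.execute("CREATE TABLE table2 (product TEXT, sold INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [(b,) for b in brands])
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", products)
    rows = conn.execute(
        """
        SELECT table1.brand, SUM(table2.sold)
        FROM table1, table2
        WHERE table2.product LIKE '%' || table1.brand || '%'
        GROUP BY table1.brand
        ORDER BY table1.brand
        """
    ).fetchall()
    conn.close()
    return rows

brands = ["Sony", "Apple", "Google", "IBM"]
products = [
    ("Sony ABCD", 1233), ("Sony adv", 1233), ("Sony aaaa", 1233),
    ("Apple 123", 1233), ("Apple 345", 1233), ("IBM 13123", 1233),
]

print(brand_totals(brands, products))
# [('Apple', 2466), ('IBM', 1233), ('Sony', 3699)]
```

Google does not appear in the result because no product contains that brand, so the WHERE clause filters out every row that the cross product pairs with it.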

Hive - LIKE operator
