An r3.8xlarge RDS database takes days to run a query

Posted: 2014-08-23 16:24:22

Tags: mysql sql amazon-web-services rds

I currently have a table of about 12 million reviews with the following columns:

id
productid
title
price
userid
profilename
helpfulness
score
review_time
summary
text

My query is as follows:

SELECT title, productid as p, count(text) as positive,
       (SELECT count(*) FROM `reviews` WHERE productid = p) as total
FROM `reviews`
WHERE text like '%my favorite book%'
GROUP BY productid
ORDER BY positive DESC;

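It finds every product that has at least one review whose text contains "my favorite book", counts how many of that product's reviews match, and then counts the total number of reviews for that product.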
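To make the query's semantics concrete, here is a minimal sketch that runs the same query shape against a tiny in-memory SQLite table. SQLite stands in for MySQL here, and the sample rows and the names `SAMPLE_ROWS`, `make_db` and `run_original` are hypothetical, chosen only for illustration. The correlated subquery uses an explicit outer alias (`r.productid`) instead of the select alias `p`, to avoid depending on how a particular engine resolves `p` inside the subquery.

```python
import sqlite3

# Hypothetical toy rows: (id, productid, title, text). Only the columns the
# query touches are modelled; the real table has more.
SAMPLE_ROWS = [
    (1, "A", "Book A", "This is my favorite book ever"),
    (2, "A", "Book A", "Not great"),
    (3, "A", "Book A", "Still my favorite book"),
    (4, "B", "Book B", "my favorite book, hands down"),
    (5, "B", "Book B", "Boring"),
    (6, "C", "Book C", "Meh"),
]


def make_db() -> sqlite3.Connection:
    """Build an in-memory SQLite stand-in for the `reviews` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reviews (id INTEGER PRIMARY KEY, productid TEXT, "
        "title TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def run_original(conn: sqlite3.Connection) -> list:
    # Same shape as the question's query: filter rows by the phrase, group
    # the matches by product, then run a correlated subquery per group to
    # count all of that product's reviews.
    sql = """
        SELECT r.title, r.productid AS p, count(r.text) AS positive,
               (SELECT count(*) FROM reviews r2
                 WHERE r2.productid = r.productid) AS total
        FROM reviews r
        WHERE r.text LIKE '%my favorite book%'
        GROUP BY r.productid
        ORDER BY positive DESC
    """
    return conn.execute(sql).fetchall()


print(run_original(make_db()))
```

Product C has no matching review, so it does not appear in the result at all; only products that pass the `WHERE` filter get a row and a `total`.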

The table lives in a database on AWS RDS, and I set the instance class to r3.8xlarge, the fastest one I could find, but the query still takes days to run.

What's even stranger, at least to me, is that if I change the search text to the following:

SELECT title, productid as p, count(text) as positive,
       (SELECT count(*) FROM `reviews` WHERE productid = p) as total
FROM `reviews`
WHERE text like '%tim ferriss%' or
      text like '%timothy ferriss%'  or
      text like '%four hour workweek%'  or
      text like '%4-hour workweek%'  or
      text like '%four hour body%'  or
      text like '%4-hour body%'  or
      text like '%4 hour workweek%'  or
      text like '%4 hour body%'  or
      text like '%four hour chef%'  or
      text like '%4-hour chef%'  or
      text like '%4 hour chef%'
GROUP BY productid
ORDER BY positive DESC

the query takes less than 20 minutes, even after downgrading the database class to m3.2xlarge.
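One plausible reason for the gap, offered as an assumption rather than a diagnosis: the correlated subquery runs once per output group, so its cost grows with the number of matching products. A common phrase like "my favorite book" probably matches far more products than the Ferriss phrases do. The back-of-envelope model below assumes that the leading-wildcard `LIKE` forces a full scan, that subquery results are not cached, and that `productid` is unindexed unless `reviews_per_product` is given. The function name and the sample inputs are hypothetical; the question does not say which indexes exist or how many products match.

```python
from typing import Optional


def estimated_rows_examined(
    total_rows: int,
    matching_products: int,
    reviews_per_product: Optional[float] = None,
) -> float:
    """Rough row count for the question's query shape (a model, not a measurement).

    Assumptions:
      * WHERE text LIKE '%...%' starts with a wildcard, so a B-tree index
        cannot help and the outer query reads every row once.
      * The correlated subquery runs once per output group, with no caching.
      * reviews_per_product=None means productid is unindexed, so each
        subquery run scans the whole table; otherwise each run reads about
        reviews_per_product index entries.
    """
    outer_scan = total_rows
    if reviews_per_product is None:
        per_group = total_rows
    else:
        per_group = reviews_per_product
    return outer_scan + matching_products * per_group


# Hypothetical inputs: 12 million rows, a phrase matching 100,000 products
# versus one matching 50 products, with no index on productid.
common = estimated_rows_examined(12_000_000, 100_000)
rare = estimated_rows_examined(12_000_000, 50)
print(f"{common:.3g} vs {rare:.3g} rows")
```

Under these assumptions the cost is roughly linear in the number of matching products times the table size. This would explain why the instance class matters less than the query shape, but only `EXPLAIN` on the real table can confirm it.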

Am I missing something here? Any suggestions would help, thanks.

1 answer:

Answer 0 (score: 2)

I think your query is easier to write with conditional aggregation:

SELECT title, productid as p,
       sum(text like '%my favorite book%') as positive,
       count(*) as total
FROM `reviews`
GROUP BY productid
ORDER BY positive DESC;
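A minimal sketch of the conditional-aggregation query on the same kind of hypothetical toy data, again using SQLite as a stand-in for MySQL. Both engines evaluate a `LIKE` comparison on non-NULL text to 1 or 0, so `sum()` counts the matching reviews while `count(*)` counts every review in the group, all in one pass over the table. `SAMPLE_ROWS`, `make_db` and `run_conditional` are illustrative names only.

```python
import sqlite3

# Hypothetical toy rows: (id, productid, title, text).
SAMPLE_ROWS = [
    (1, "A", "Book A", "This is my favorite book ever"),
    (2, "A", "Book A", "Not great"),
    (3, "A", "Book A", "Still my favorite book"),
    (4, "B", "Book B", "my favorite book, hands down"),
    (5, "B", "Book B", "Boring"),
    (6, "C", "Book C", "Meh"),
]


def make_db() -> sqlite3.Connection:
    """Build an in-memory SQLite stand-in for the `reviews` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reviews (id INTEGER PRIMARY KEY, productid TEXT, "
        "title TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def run_conditional(conn: sqlite3.Connection) -> list:
    # Single grouped scan: LIKE yields 1 or 0 per row, so sum() counts the
    # matching reviews and count(*) counts all reviews of each product.
    sql = """
        SELECT title, productid AS p,
               sum(text LIKE '%my favorite book%') AS positive,
               count(*) AS total
        FROM reviews
        GROUP BY productid
        ORDER BY positive DESC
    """
    return conn.execute(sql).fetchall()


print(run_conditional(make_db()))
```

Unlike the original query, every product gets a row here, including product C with `positive` equal to 0. The design choice is to replace one subquery per group with a single grouped scan. How much faster that is on the real table depends on MySQL's plan and has not been measured here.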

Your original query filters out products that have no positive reviews. If you really want that behavior, you can add:

HAVING positive > 0

after the GROUP BY.
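To check the claim that `HAVING positive > 0` restores the original behavior, this sketch runs both queries against the same hypothetical toy table and compares the results. SQLite again stands in for MySQL (both accept the `positive` alias in `HAVING`), and the helper names are illustrative only.

```python
import sqlite3

# Hypothetical toy rows: (id, productid, title, text).
SAMPLE_ROWS = [
    (1, "A", "Book A", "This is my favorite book ever"),
    (2, "A", "Book A", "Not great"),
    (3, "A", "Book A", "Still my favorite book"),
    (4, "B", "Book B", "my favorite book, hands down"),
    (5, "B", "Book B", "Boring"),
    (6, "C", "Book C", "Meh"),
]


def make_db() -> sqlite3.Connection:
    """Build an in-memory SQLite stand-in for the `reviews` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reviews (id INTEGER PRIMARY KEY, productid TEXT, "
        "title TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO reviews VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def run_original(conn: sqlite3.Connection) -> list:
    # Question's shape: filter first, then a correlated subquery per group.
    sql = """
        SELECT r.title, r.productid AS p, count(r.text) AS positive,
               (SELECT count(*) FROM reviews r2
                 WHERE r2.productid = r.productid) AS total
        FROM reviews r
        WHERE r.text LIKE '%my favorite book%'
        GROUP BY r.productid
        ORDER BY positive DESC
    """
    return conn.execute(sql).fetchall()


def run_conditional_having(conn: sqlite3.Connection) -> list:
    # Answer's shape plus HAVING: group everything, then drop products
    # whose matching-review count is zero.
    sql = """
        SELECT title, productid AS p,
               sum(text LIKE '%my favorite book%') AS positive,
               count(*) AS total
        FROM reviews
        GROUP BY productid
        HAVING positive > 0
        ORDER BY positive DESC
    """
    return conn.execute(sql).fetchall()


conn = make_db()
print(run_conditional_having(conn) == run_original(conn))
```

On this toy data the two queries return the same rows. On real data, ties in `positive` may come back in a different order, since neither query breaks ties in its `ORDER BY`.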