I'm working on an analytics project that requires me to pull some data from a very large table in Teradata. This is the query I'm using:
select TransactionNumber
from my_table
where TransactionDate between date '2017-01-01' and date '2017-12-31'
and ItemNumber in (99276);
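Even though I'm filtering my_table down to the whole of 2017, the query still produces nearly 900 million rows and takes a little over 30 seconds to run. Given the nature of the project, I'd like it to run in 5 seconds or less, but with a table this size I'm not even sure that's possible. In case it helps, this is what EXPLAIN shows: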
1) First, we lock DBTables.my_table in view
DB.my_table for access.
2) Next, we do an all-AMPs RETRIEVE step from 365 partitions of
DBTables.my_table in view DB.my_table with a
condition of ("(DBTables.my_table in view
DB.my_table.TransactionDate <= DATE '2017-12-31') AND
((DBTables.my_table in view
DB.my_table.TransactionDate >= DATE '2017-01-01') AND
(DBTables.my_table in view
DB.my_table.ItemNumber = 99276 ))") into Spool
1 (group_amps), which is built locally on the AMPs. The size of
Spool 1 is estimated with no confidence to be 617,535,066 rows (
14,203,306,518 bytes). The estimated time for this step is 2
minutes and 48 seconds.
3) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 2 minutes and 48 seconds.
Admittedly, I'm not very familiar with optimizing queries, and since I'm not a DBA, I only have read access to the database.