I have a table with more than 10 million rows and roughly 50+ columns. The table stores sensor data/parameters. Say I need to query a full day of data, i.e. 86,400 seconds. The query takes about 20 seconds or more to complete.
I added individual indexes on a few columns, e.g. recordTimestamp (when the data was captured), deviceId (the sensor's identifier) and positionValid (whether the GPS geolocation is valid). I then added a composite index containing all three columns.
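For reference, a minimal sketch of how such a composite index can be declared with SQLAlchemy, assuming the declarative Datatable model from the code below (the index name and column order here are assumptions, not the exact definition used):

from sqlalchemy import Index

# Sketch of a composite index over the three filter columns; putting the
# equality column (deviceId) first lets the range condition on
# recordTimestamp still use the index. Name and order are illustrative.
Index('ix_datatable_device_ts_valid',
      Datatable.deviceId,
      Datatable.recordTimestamp,
      Datatable.positionValid)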
Here is my query:
import time

from sqlalchemy import and_, select

t1 = time.time()
conn = engine.connect()
# Fetch one device's rows in the time window, keeping only records with a
# valid GPS fix and down-sampling to every query_interval seconds.
select_statement = select([Datatable]).where(and_(
    Datatable.recordTimestamp >= start_date,
    Datatable.recordTimestamp <= end_date,
    Datatable.deviceId == device_id,
    Datatable.positionValid != None,
    Datatable.recordTimestamp % query_interval == 0))
lol_data = conn.execute(select_statement).fetchall()
conn.close()
t2 = time.time()
time_taken = t2 - t1
print('Select: ' + str(time_taken))  # str() is needed; concatenating a float to a str raises TypeError
Here is my EXPLAIN ANALYZE statement:
EXPLAIN ANALYZE SELECT datatable.id, datatable."createdAt", datatable."analogInput01", datatable."analogInput02", datatable."analogInput03", datatable."analogInput04", datatable."analogInput05", datatable."analogInput06", datatable."analogInput07", datatable."canEngineRpm", datatable."canEngineTemperature", datatable."canFuelConsumedLiters", datatable."canFuelLevel", datatable."canVehicleMileage", datatable."deviceId", datatable."deviceTemperature", datatable."deviceInternalVoltage", datatable."deviceExternalVoltage", datatable."deviceAntennaCut", datatable."deviceEnum", datatable."deviceVehicleMileage", datatable."deviceSimSignal", datatable."deviceSimStatus", datatable."iButton01", datatable."iButton02", datatable."recordSequence", datatable."recordTimestamp", datatable."accelerationAbsolute", datatable."accelerationBrake", datatable."accelerationBump", datatable."accelerationTurn", datatable."accelerationX", datatable."accelerationY", datatable."accelerationZ", datatable."positionAltitude", datatable."positionDirection", datatable."positionSatellites", datatable."positionSpeed", datatable."positionLatitude", datatable."positionLongitude", datatable."positionHdop", datatable."positionMovement", datatable."positionValid", datatable."positionEngine" FROM datatable WHERE datatable."recordTimestamp" >= 1519744521 AND datatable."recordTimestamp" <= 1519745181 AND datatable."deviceId" = '864495033990901' AND datatable."positionValid" IS NOT NULL AND datatable."recordTimestamp" % 1 = 0;
Here is the result of EXPLAIN ANALYZE for the SELECT:
Index Scan using "ix_dataTable_recordTimestamp" on dataTable (cost=0.44..599.35 rows=5 width=301) (actual time=0.070..10.487 rows=661 loops=1)
Index Cond: (("recordTimestamp" >= 1519744521) AND ("recordTimestamp" <= 1519745181))
Filter: (("positionValid" IS NOT NULL) AND (("deviceId")::text = '864495033990901'::text) AND (("recordTimestamp" % 1) = 0))
Rows Removed by Filter: 6970
Planning time: 0.347 ms
Execution time: 10.658 ms
Here are the timings measured in Python:
Select: 47.98712515830994
JSON: 0.19731807708740234
Here is the profile of my code:
10302 function calls (10235 primitive calls) in 12.612 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 12.595 12.595 /Users/afeezaziz/Projects/Bursa/backend/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py:882(execute)
1 0.000 0.000 12.595 12.595 /Users/afeezaziz/Projects/Bursa/backend/env/lib/python3.6/site-packages/sqlalchemy/sql/elements.py:267(_execute_on_connection)
1 0.000 0.000 12.595 12.595 /Users/afeezaziz/Projects/Bursa/backend/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py:1016(_execute_clauseelement)
1 0.000 0.000 12.592 12.592 /Users/afeezaziz/Projects/Bursa/backend/env/lib/python3.6/site-packages/sqlalchemy/engine/base.py:1111(_execute_context)
1 0.000 0.000 12.590 12.590 /Users/afeezaziz/Projects/Bursa/backend/env/lib/python3.6/site-packages/sqlalchemy/engine/default.py:506(do_execute)
1 12.590 12.590 12.590 12.590 {method 'execute' of 'psycopg2.extensions.cursor' objects}
1 0.000 0.000 0.017 0.017 /Users/afeezaziz/Projects/Bursa/backend/env/lib/python3.6/site-packages/sqlalchemy/engine/result.py:1113(fetchall)
1 0.000 0.000 0.017 0.017 /Users/afeezaziz/Projects/Bursa/backend/env/lib/python3.6/site-packages/sqlalchemy/engine/result.py:1080(_fetchall_impl)
1 0.008 0.008 0.017 0.017 {method 'fetchall' of 'psycopg2.extensions.cursor' objects}
Answer 0 (score 0)
SQLAlchemy is only a connector to the database; the whole query runs on the database side.
You can achieve this by optimizing the query with the help of procedures and SQLAlchemy. Here is a good read on optimizing the way you use it: SQLAlchemy collection docs.
If you are using a MySQL database, you should also try the MySQLdb API, which is a bit faster than SQLAlchemy, since MySQLdb is object-oriented specifically towards MySQL operations and iteration.
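As a minimal sketch of pushing the work to the database side with SQLAlchemy, assuming a hypothetical Postgres function get_device_day(device_id, start_ts, end_ts) that encapsulates the filtering (no such function appears in the question; it only illustrates the approach):

from sqlalchemy import text

# Hypothetical server-side function that does the filtering so only the
# final rows are sent back over the wire.
stmt = text("SELECT * FROM get_device_day(:device_id, :start_ts, :end_ts)")
with engine.connect() as conn:
    rows = conn.execute(stmt, {"device_id": device_id,
                               "start_ts": start_date,
                               "end_ts": end_date}).fetchall()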
Answer 1 (score 0)
Try using Postgres's built-in COPY, or, if you really do need to retrieve the results in Python (for example you cannot write straight to disk via COPY), you can use the COPY functionality through psycopg2's copy_expert:
cur = conn.cursor()  # conn here is a raw psycopg2 connection
outputquery = "COPY ({0}) TO STDOUT WITH CSV HEADER".format(query)  # query is your SELECT as a string
with open('resultsfile', 'w') as f:
    cur.copy_expert(outputquery, f)
conn.close()
This should avoid the serialization step altogether.
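A slightly fuller sketch of the same idea, assuming a raw psycopg2 connection and the question's filter variables; copy_expert does not accept bind parameters itself, so the SELECT is assembled beforehand with cursor.mogrify:

import psycopg2

# Assumed DSN, for illustration only.
conn = psycopg2.connect("dbname=sensors user=postgres")
cur = conn.cursor()

# Interpolate the parameters safely, then wrap the SELECT in COPY.
inner_query = cur.mogrify(
    'SELECT * FROM datatable '
    'WHERE "deviceId" = %s AND "recordTimestamp" BETWEEN %s AND %s '
    'AND "positionValid" IS NOT NULL',
    (device_id, start_date, end_date)).decode()

with open('resultsfile.csv', 'w') as f:
    cur.copy_expert("COPY ({0}) TO STDOUT WITH CSV HEADER".format(inner_query), f)
conn.close()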
Answer 2 (score -1)
"recordTimestamp"
,"deviceId"
和"positionValid"
,因此,请确保你已经从3列创建了索引。select([Datatable])
”,我猜您选择了所有列,因此,作为您的描述,有50多列,需要时间来解析数据并将数据发送到客户端。更清楚的是,添加索引只会帮助您“执行时间”(查找结果的时间),但不会帮助您“获取时间”(当您运行“lol_data = conn.execute(select_statement).fetchall()”时) "deviceId"
,"recordTimestamp"
,值。您可以使用索引更改"deviceId"
(比较和发送字符串比使用整数花费更多时间)。
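A minimal sketch of that suggestion in the same SQLAlchemy style as the question (the column list is just an example; pick whichever values you actually need):

from sqlalchemy import and_, select

# Fetch only a handful of columns instead of all 50+, which reduces the
# amount of data psycopg2 has to receive and parse.
narrow_select = select([Datatable.deviceId,
                        Datatable.recordTimestamp,
                        Datatable.positionLatitude,
                        Datatable.positionLongitude]).where(and_(
    Datatable.recordTimestamp >= start_date,
    Datatable.recordTimestamp <= end_date,
    Datatable.deviceId == device_id,
    Datatable.positionValid != None))

with engine.connect() as conn:
    rows = conn.execute(narrow_select).fetchall()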