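I'm running an InfluxDB instance in a Docker container on a MacBook with 8 GB of RAM, so I'm somewhat resource-constrained. I'm writing a Python program that queries the database. Fetching a large amount of data in one request fails with a timeout, so instead I make sequential calls, each fetching the data for a one-hour acquisition window. This is the code that calls the database sequentially: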
import time

import pandas as pd

# influxDataReader is an instance of the class that defines getRawDataByAcqTimeFrame
resultRawData = pd.DataFrame()
for day in range(15, 25):
    for hour in range(24):
        lowerDate = f'2019-03-{day:02}'
        lowerHour = f'{hour:02}:00:00'
        upperDate = lowerDate
        upperHour = f'{hour:02}:59:59'
        rawDataSet: pd.DataFrame = influxDataReader.getRawDataByAcqTimeFrame(lowerDate=lowerDate,
                                                                             lowerTime=lowerHour,
                                                                             upperDate=upperDate,
                                                                             upperTime=upperHour)
        if rawDataSet is not None and rawDataSet.size > 0:
            # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
            resultRawData = pd.concat([resultRawData, rawDataSet])
            print(f'Got data for {lowerDate}T{lowerHour}. resultRawData.size = {resultRawData.size}')
        else:
            print(f'No data: {lowerDate}T{lowerHour}.')
        time.sleep(1.0)
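Building the window bounds by concatenating string fragments is easy to get wrong. A minimal sketch, assuming the same 2019-03-15 to 2019-03-24 range as the loop, that derives each one-hour window from `datetime` objects instead; `hourly_windows` is a hypothetical helper, not part of the original code:

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def hourly_windows(start: datetime, end: datetime) -> Iterator[Tuple[str, str]]:
    """Yield (lower, upper) RFC3339 bounds for each one-hour window in [start, end).

    Hypothetical helper: the upper bound is the start of the next hour, so it is
    meant for a half-open condition (time >= lower AND time < upper).
    """
    fmt = '%Y-%m-%dT%H:%M:%SZ'
    current = start
    while current < end:
        nxt = current + timedelta(hours=1)
        yield current.strftime(fmt), nxt.strftime(fmt)
        current = nxt


if __name__ == '__main__':
    # Same range as the original loop: days 15..24 of March 2019, every hour
    windows = list(hourly_windows(datetime(2019, 3, 15), datetime(2019, 3, 25)))
    print(len(windows))  # 240
    print(windows[0])    # ('2019-03-15T00:00:00Z', '2019-03-15T01:00:00Z')
```

Letting `datetime` handle the hour and day rollover avoids hand-formatting dates and makes it trivial to change the range.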
The getRawDataByAcqTimeFrame method:
def getRawDataByAcqTimeFrame(self, lowerDate: str, lowerTime: str, upperDate: str, upperTime: str,
                             dropDuplicateRows: bool = False) -> pd.DataFrame:
    # Bounds are RFC3339 timestamps, e.g. '2019-03-15T00:00:00.0Z'
    queryStatement = f"""SELECT rawdata,
                                sequenceStartStamp,
                                timestampCycle
                         FROM YRT1DT1F_rawdata
                         WHERE time >= '{lowerDate}T{lowerTime}.0Z'
                           AND time <= '{upperDate}T{upperTime}.0Z'"""
    result: pd.DataFrame = pd.DataFrame(self._influxConnector.executeQuery(queryStatement).get_points())
    # Optionally remove duplicate rows from the result
    if dropDuplicateRows:
        result = result.drop_duplicates()
    return result
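One detail in this query: `time <= '...T..:59:59.0Z'` excludes any point whose timestamp falls after hh:59:59.0 (for example at hh:59:59.5), because InfluxDB stores timestamps with nanosecond precision. A minimal sketch of a half-open window (`time >= start AND time < start + 1h`) that covers every point exactly once; `build_hour_query` is a hypothetical helper, and the measurement and field names are copied from the query above:

```python
from datetime import datetime, timedelta


def build_hour_query(start: datetime, measurement: str = 'YRT1DT1F_rawdata') -> str:
    """Return an InfluxQL SELECT covering [start, start + 1 h) (hypothetical helper)."""
    end = start + timedelta(hours=1)
    fmt = '%Y-%m-%dT%H:%M:%SZ'
    # Half-open interval: adjacent windows neither overlap nor leave gaps
    return (
        'SELECT rawdata, sequenceStartStamp, timestampCycle '
        f'FROM {measurement} '
        f"WHERE time >= '{start.strftime(fmt)}' AND time < '{end.strftime(fmt)}'"
    )


if __name__ == '__main__':
    print(build_hour_query(datetime(2019, 3, 18, 0)))
```

Because the upper bound of one window is exactly the lower bound of the next, a point stamped at hh:59:59.5 lands in exactly one window.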
Finally, the executeQuery() method:
import influxdb

def executeQuery(self, selectStatement: str, chunked: bool = False, chunksize: int = 10000) -> influxdb.resultset.ResultSet:
    # Lazily create the client on first use
    if self._influxClient is None:
        print('Initializing DB...')
        self.initDbClient()
    try:
        queryResult: influxdb.resultset.ResultSet = self._influxClient.query(selectStatement,
                                                                             chunked=chunked,
                                                                             chunk_size=chunksize)
    except Exception as err:
        print('Error while executing DB statement: ' + selectStatement + '. Error message: ' + str(err))
        # Re-raise the original error instead of failing later on an unassigned variable
        raise
    finally:
        self._influxClient.close()
    return queryResult
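As expected, this code returns data for the first three calls (2019-03-15T00:00:00.0Z to 2019-03-15T02:59:59.0Z). The database contains no data between 2019-03-15T03:00:00 and 2019-03-22T23:59:59, so for those windows the code returns an empty rawDataSet and prints the "No data: ..." message, also as expected.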
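The problem is that the code runs at the expected speed only up to 2019-03-17T23:00:00 (the calls that return an empty rawDataSet execute about once per second). Starting at 2019-03-18T00:00:00, execution slows down dramatically: each "empty" call suddenly takes 30 seconds or even a full minute.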
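To pin down exactly which call slows down and by how much, it can help to measure the wall-clock time of each call instead of inferring it from the printed messages. A minimal sketch using `time.perf_counter`; `timed_call` is a hypothetical helper, and in the loop it would wrap `influxDataReader.getRawDataByAcqTimeFrame(...)`:

```python
import time
from typing import Any, Callable, Tuple


def timed_call(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == '__main__':
    # Stand-in for the real query method: sleeps briefly and returns no rows
    def fake_query(window: str) -> list:
        time.sleep(0.05)
        return []

    rows, seconds = timed_call(fake_query, '2019-03-18T00:00:00Z')
    print(f'{len(rows)} rows in {seconds:.3f} s')
```

Printing `seconds` next to each window would show whether the extra 30 to 60 seconds are spent inside the query call itself or elsewhere in the loop.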
What could be causing this?
Answer (score: 0)
How long does this query take when you run it in the CLI?