Question

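In a Django application, I am trying to parse a QuerySet representing individual time-series values x from n sensors into tuples (t, x1, x2, ..., xn), and then into a JSON object in the format specified by Google Charts here: https://developers.google.com/chart/interactive/docs/gallery/linechart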

None values are used as placeholders if no value was logged for a given timestamp from a particular sensor.

The page load time is significant for a QuerySet with ~6500 rows (~3 seconds, running locally).

It is significantly longer on the server:

http://54.162.202.222/pulogger/simpleview/?device=test

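Profiling shows that 99.9% of the time is spent in _winapi.WaitForSingleObject (which I can't interpret), and manual profiling with a timer shows that the server-side culprit is the while loop that iterates over the QuerySet and groups the values into tuples (line 23 in my code sample).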

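The results are as follows:

basic gets (took 5ms)

queried data (took 0ms)

split data by sensor (took 981ms)

prepared column labels/types (took 0ms)

prepared json (took 27ms)

created context (took 0ms)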

For completeness, the timing function is as follows:

def print_elapsed_time(ref_datetime, description):
    print('{} (took {}ms)'.format(description, floor((datetime.now()-ref_datetime).microseconds/1000)))
    return datetime.now()
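One caveat with this helper, offered as an aside: `timedelta.microseconds` holds only the sub-second part of an interval, so whole seconds are silently dropped and a step that took 2.981 s would also print as 981 ms. The variant below is a minimal sketch that divides the whole `timedelta` by one millisecond instead. The name `elapsed_ms` is hypothetical and not part of the original code.

```python
from datetime import datetime, timedelta
from math import floor


def elapsed_ms(delta):
    """Return the full length of a timedelta in whole milliseconds."""
    return delta // timedelta(milliseconds=1)


def print_elapsed_time(ref_datetime, description):
    """Print the time since ref_datetime and return a fresh reference point."""
    now = datetime.now()
    print('{} (took {}ms)'.format(description, elapsed_ms(now - ref_datetime)))
    return now


# The difference only shows up once an interval exceeds one second.
delta = timedelta(seconds=2, milliseconds=981)
truncated = floor(delta.microseconds / 1000)  # what the original formula reports
full = elapsed_ms(delta)                      # includes the two whole seconds
```

If the reported timings look too small compared with the observed page load, this is one thing that may be worth ruling out.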

The code that performs the processing and generates the view is as follows:

def simpleview(request):
    time_marker = datetime.now()
    device_name = request.GET['device']
    device = Datalogger.objects.get(device_name=device_name)

    sensors = Sensor.objects.filter(datalogger=device).order_by('pk')
    sensor_count = len(sensors)  # should be no worse than count() since already-evaluated and cached.  todo: confirm

    #assign each sensor an index for the tuples (zero is used for time/x-axis)
    sensor_indices = {}
    for idx, sensor in enumerate(sensors, start=1):
        sensor_indices.update({sensor.sensor_name:idx})

    time_marker = print_elapsed_time(time_marker, 'basic gets')

    # process data into timestamp-grouped tuples accessible by sensor-index ([0] is timestamp)
    raw_data = SensorDatum.objects.filter(sensor__datalogger__device_name=device_name).order_by('timestamp', 'sensor')
    data = []
    data_idx = 0

    time_marker = print_elapsed_time(time_marker, 'queried data')

    while data_idx < len(raw_data):
        row_list = [raw_data[data_idx].timestamp]
        row_list.extend([None]*sensor_count)
        row_idx = 1

        while data_idx < len(raw_data) and raw_data[data_idx].timestamp == row_list[0]:
            row_idx = sensor_indices.get(raw_data[data_idx].sensor.sensor_name)
            row_list[row_idx] = raw_data[data_idx].value
            data_idx += 1
        data.append(tuple(row_list))

    time_marker = print_elapsed_time(time_marker, 'split data by sensor')

    column_labels = ['Time']
    column_types = ["datetime"]
    for sensor in sensors:
        column_labels.append(sensor.sensor_name)
        column_types.append("number")

    time_marker = print_elapsed_time(time_marker, 'prepared column labels/types')

    gchart_json = prepare_data_for_gchart(column_labels, column_types, data)


    time_marker = print_elapsed_time(time_marker, 'prepared json')


    context = {
        'device': device_name,
        'sensor_count': sensor_count,
        'sensor_indices': sensor_indices,
        'gchart_json': gchart_json,
    }

    time_marker = print_elapsed_time(time_marker, 'created context')

    return render(request, 'pulogger/simpleTimeSeriesView.html', context)

I'm new to Python, so I expect I've made a poor choice of operation or collection somewhere. Unless I'm blind, it should run in O(n).

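Clearly this isn't the whole problem, since it only accounts for a fraction of the apparent load time, but I figure it is a good place to start.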

Answer 1

The "queried data" section takes 0 ms because that section only constructs the query; it does not execute it against the database.

The query is executed when it reaches this line: while data_idx < len(raw_data):, because computing the length of the QuerySet requires evaluating it.

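So the loop itself may not be what takes most of the time; it is probably the query execution and evaluation. You can evaluate the query before the main loop by wrapping the queryset in list(), which will make your time_marker show how long the query actually takes to execute.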
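The effect can be reproduced without Django. In this minimal sketch a plain generator, `slow_rows` (a hypothetical stand-in, not a real QuerySet), plays the role of the lazy query: creating it costs almost nothing, and the work is only charged when `list()` consumes it.

```python
from time import perf_counter


def slow_rows(n):
    """Stand-in for a lazy query: each row costs some work when it is produced."""
    for i in range(n):
        sum(range(2000))  # simulated per-row cost (fetching, object creation)
        yield (i, i * 2)


start = perf_counter()
lazy = slow_rows(5000)          # nothing is computed yet
after_construct = perf_counter()
rows = list(lazy)               # forces evaluation, like list(queryset)
after_evaluate = perf_counter()

construct_ms = (after_construct - start) * 1000
evaluate_ms = (after_evaluate - after_construct) * 1000
print('constructed (took {:.3f}ms)'.format(construct_ms))
print('evaluated (took {:.3f}ms)'.format(evaluate_ms))
```

With the view from the question, the analogous change would be to call list() on raw_data before the 'queried data' marker, so that the query cost is no longer attributed to the loop.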

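Do you need the queryset evaluated into model instances? Alternatively, you could use .values() or .values_list() to return the actual values, which skips building a Model object for every row of the query result. Doing so also avoids returning every column from the database; you fetch only the columns you need.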
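As a rough sketch of that suggestion: assuming the rows are fetched with something like `raw_data.values_list('timestamp', 'sensor__sensor_name', 'value')` (the field names come from the question, the exact call is an assumption), each row is a plain tuple and the grouping can be done in one pass with `itertools.groupby`. The helper `group_rows` and the sample data are hypothetical, and the rows must already be ordered by timestamp, as in the original query.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def group_rows(rows, sensor_indices):
    """Turn (timestamp, sensor_name, value) rows, sorted by timestamp,
    into (timestamp, x1, ..., xn) tuples with None for missing readings."""
    width = len(sensor_indices) + 1  # slot 0 holds the timestamp
    data = []
    for timestamp, group in groupby(rows, key=itemgetter(0)):
        row = [None] * width
        row[0] = timestamp
        for _, sensor_name, value in group:
            row[sensor_indices[sensor_name]] = value
        data.append(tuple(row))
    return data


# Sample rows shaped like the output of values_list(); the values are made up.
t1 = datetime(2018, 1, 1, 12, 0, 0)
t2 = datetime(2018, 1, 1, 12, 0, 5)
rows = [(t1, 'a', 1.0), (t1, 'b', 2.0), (t2, 'b', 3.5)]
result = group_rows(rows, {'a': 1, 'b': 2})
```

Because the sensor name arrives in the row itself, nothing in the loop has to follow the foreign key to the sensor.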

You could potentially remove the table join in this query, SensorDatum.objects.filter(sensor__datalogger__device_name=device_name).order_by('timestamp', 'sensor'), by denormalizing the schema (if possible) so that the sensor has a device_name field.

Answer 2

You have queries running inside the loop. You can use select_related to fetch and cache the related objects in advance.

Example:

raw_data = SensorDatum.objects.filter(
    sensor__datalogger__device_name=device_name
).order_by(
    'timestamp',
    'sensor'
).select_related('sensor') # this will fetch and cache sensor objects and will prevent further db queries in the loop

Reference: select_related, Django 2.1 docs

What is causing the inefficiency when parsing this QuerySet into tuples?
