Question

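I am a researcher studying vehicle parking behaviour and parking occupancy patterns. I am trying to write an efficient and robust algorithm to calculate parking occupancy over time.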

Here is my story:

I am working with parking data recorded by in-ground parking sensors in the city of Melbourne, Australia. You can view the data here: https://data.melbourne.vic.gov.au/Transport/Parking-Events-From-Parking-Bays-With-Sensors/8nfq-mtcn

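My initial task is to plot this data as a time-series graph so that I can visually analyse parking occupancy trends over different periods (day, week, month, etc.).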

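There are 7,112 sensors installed on streets across the city. Each sensor records data when a car arrives at and departs from a parking bay (call this an "event"). From 2011 to 2012, the sensors recorded 12,208,417 events. Each event is a row in the database and contains the following columns that I am interested in: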

- Sensor ID
- Arrival time
- Departure time
- Duration of parking
- Street on which this sensor is located

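Now, rather than plotting the data from each sensor separately, I want to plot groups of sensors belonging to a street at a fixed time interval (second, minute, hour, etc.). So street "A" might have 10 parking bays (= 10 sensors), street "B" 12 sensors, and so on.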

For a 24-hour time-series graph for a given street "A" between (10/10/2011 12:00:00 AM) and (11/10/2011 12:00:00 AM), here is what I did:

Using SQL

Retrieve all events recorded by the sensors located on street "A" by running a SQL query.

Using PHP

Iterate in a loop over the time from (10/10/2011 12:00:00 AM) to (11/10/2011 12:00:00 AM), with an offset of 1 minute per iteration.
Start parsing the data:

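- For each minute (time sample):

-- For each sensor in street "A":

--- For each event:

---- If the current sensor recorded this event and the time sample lies between the car's arrival and departure times, increment the occupancy for this minute by 1.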

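Statistics on the run time:

SQL query time: ~260 ms
PHP execution time: 33.1 s
Number of time samples: 1,440 (i.e. the number of minutes in the 24-hour loop)
Number of sensors the algorithm had to process: 49
Number of events the algorithm had to process: 508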
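
For scale, here is a rough count of the inner-loop checks implied by these figures. It assumes the triple loop visits every (time sample, sensor, event) combination, and the variable names are illustrative only:

```python
# Figures taken from the run-time statistics above.
samples = 1440   # one-minute time samples in 24 hours
sensors = 49     # sensors on the street
events = 508     # events returned by the SQL query

# Every sample is compared against every sensor/event pair.
checks = samples * sensors * events
print(checks)  # 35844480
```

That is roughly 35.8 million condition evaluations to process only 508 events.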

I am able to obtain the number of cars parked on the street at any given moment, so I can easily plot it as a line chart.

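I feel that my algorithm is not very efficient or smart. I know that for larger time periods I would need to reduce the number of time samples. However, I wonder whether there is any way to speed this up without compromising on the time samples?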

SQL QUERY REFERENCE

SELECT sensors.device_id, events.arrival_time, events.departure_time, events.duration
FROM events, sensors
WHERE
    STR_TO_DATE(arrival_time, '%d/%m/%Y %r') >= STR_TO_DATE(:start_time,'%d/%m/%Y %r') &&
    STR_TO_DATE(arrival_time, '%d/%m/%Y %r') <= STR_TO_DATE(:end_time,'%d/%m/%Y %r') &&
    events.device_id = sensors.device_id &&
    sensors.street_name= :street_name &&
    sensors.street_1 = :street_1 &&
    sensors.street_2 = :street_2

PHP CODE REFERENCE

//TIME RANGE
$start_time = "10/10/2011 12:00:00 AM";
$end_time = "11/10/2011 12:00:00 AM";

//SETUP ARRAYS FOR PLOTTING
$x_time = array();
$y_occupancy = array();

//ITERATE THROUGH TIME
for($i=strtotime($start_time); $i<=strtotime($end_time);$i+=60) {

    $current_time =  date("d/m/Y h:i:s A",$i); echo "<br>";

    $current_occupancy = 0;

    //ITERATE THROUGH SENSORS
    foreach($sensors as $sensor) {

        //ITERATE THROUGH EVENTS
        foreach($events as $event) {

            //CHECK IF THIS SENSOR IS ACTIVE AT THIS EVENT
            if (($sensor->device_id == $event->device_id) && (strtotime($current_time) >= strtotime($event->arrival_time) && strtotime($current_time) <= strtotime($event->departure_time))) {
                $current_occupancy++;
            }

        }//end event iterations

    }// end sensor iterations

    $x_time[] = $current_time;
    $y_occupancy[] = $current_occupancy;

}// end time iterations



//SHOW TIME VS OCCUPANCY
for($i=0; $i<count($x_time);$i++) {
    echo $x_time[$i]; echo " "; echo $y_occupancy[$i]; echo "<br>";
}

Answer 1

struct MyIterator
{
    Iterator i;
    DateTime t;
}

struct DataPoint
{
    DateTime t;
    int count;
}

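List<DataPoint> CalculateIntervalCount(List<List<Interval>> series)
{
    // Pseudocode. H is a min-heap of MyIterator objects ordered by t.
    // Assumes every interval has DepartureTime > ArrivalTime.
    initialise min-heap H
    foreach (List<Interval> S in series)
    {
        if (S is empty) continue;
        Iterator i = S.begin();
        H.push(new MyIterator(i, i.ArrivalTime));
    }
    int count = 0;
    List<DataPoint> result = new List<DataPoint>();
    while (!H.empty())
    {
        MyIterator min = H.pop();
        if (min.t == min.i.ArrivalTime)
        {
            // Arrival: one more interval is open; queue its departure.
            ++count;
            result.Add(new DataPoint(min.t, count));
            H.push(new MyIterator(min.i, min.i.DepartureTime));
        }
        else
        {
            // Departure: close the interval, then move on to the next
            // interval of the same series, if there is one.
            --count;
            result.Add(new DataPoint(min.t, count));
            if (can advance min.i)
            {
                advance min.i
                H.push(new MyIterator(min.i, min.i.ArrivalTime));
            }
        }
    }
    return result;
}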

Explanation:

CalculateIntervalCount is a function that takes a list of interval series and returns a series giving the interval count at time t.
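
Because the returned series only changes at event times, fixed time samples can be read off it afterwards with a binary search. The sketch below assumes the result is a list of (time, count) pairs sorted by time; `sample_counts` and the integer minute values are illustrative, not part of the pseudocode above:

```python
from bisect import bisect_right

def sample_counts(steps, start, end, step):
    """Read a step series of (time, count) pairs at fixed sample times.

    The count in effect at time t is the count of the last step whose
    time is <= t, or 0 if t lies before the first step.
    """
    times = [t for t, _ in steps]
    samples = []
    t = start
    while t <= end:
        i = bisect_right(times, t)  # number of steps at or before t
        samples.append((t, steps[i - 1][1] if i else 0))
        t += step
    return samples

# Hypothetical step series, with times in minutes since the start.
steps = [(0, 1), (5, 2), (10, 1), (20, 2), (25, 1), (30, 0)]
print(sample_counts(steps, 0, 30, 10))
# [(0, 1), (10, 1), (20, 2), (30, 0)]
```

Note that a sample taken exactly at a departure time reflects the count after the car has left, which may differ at the boundary from an inclusive comparison.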

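Assuming you have a "standard" iterator construct over a list, one that knows how to advance itself and how to check whether it has reached the end, MyIterator is a wrapper around that "standard" iterator with an extra DateTime field, which takes the value of either the ArrivalTime or the DepartureTime of the interval.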

The min-heap H compares two MyIterator objects using MyIterator.t.

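The algorithm simply implements what I described in my comment. It should run in O(n lg k) time, where k is the number of interval series and n is the total number of intervals across all series.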
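
A minimal Python sketch of the same heap-based merge follows. It assumes each sensor's intervals are given as a list of (arrival, departure) tuples, sorted by arrival and non-overlapping. The name `occupancy_steps` and the explicit arrival/departure flag (used instead of comparing against ArrivalTime) are illustrative choices, and plain integers stand in for DateTime values:

```python
import heapq

ARRIVAL, DEPARTURE = 1, 0  # departures sort first when times are equal

def occupancy_steps(series):
    """Merge per-sensor interval lists into a list of (time, count) steps."""
    heap = []
    for s, intervals in enumerate(series):
        if intervals:
            # Heap entries: (time, kind, sensor index, interval index).
            heapq.heappush(heap, (intervals[0][0], ARRIVAL, s, 0))

    count = 0
    result = []
    while heap:
        t, kind, s, pos = heapq.heappop(heap)
        if kind == ARRIVAL:
            count += 1
            result.append((t, count))
            # Queue the departure of the interval that just opened.
            heapq.heappush(heap, (series[s][pos][1], DEPARTURE, s, pos))
        else:
            count -= 1
            result.append((t, count))
            # Advance this sensor to its next interval, if any.
            if pos + 1 < len(series[s]):
                heapq.heappush(heap, (series[s][pos + 1][0], ARRIVAL, s, pos + 1))
    return result

# Two hypothetical sensors, with times in minutes since the start.
series = [[(0, 10), (20, 30)], [(5, 25)]]
print(occupancy_steps(series))
# [(0, 1), (5, 2), (10, 1), (20, 2), (25, 1), (30, 0)]
```

The heap never holds more than one entry per sensor, which is where the lg k factor comes from.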

Is it possible to improve the efficiency of this iterative algorithm?
