具有局限性的调度算法

时间:2013-12-30 04:31:25

标签: algorithm

感谢用户3125280,D.W。和Evgeny Kluev问题已更新。

我有一个网页列表,我必须经常下载它们,每个网页都有不同的下载频率。基于此频率,我们将网页分为5组:

Items in group 1 are downloaded once per 1 hour
items in group 2 once per 2 hours
items in group 3 once per 4 hours
items in group 4 once per 12 hours
items in group 5 once per 24 hours

这意味着,我们必须在1小时内下载所有第1组网页,在2小时内完成所有第2组网页等。

我正在尝试制作算法。作为输入,我有:

a)DATA_ARR =一个包含5个数字的数组。每个数字代表该组中的项目数。

b)TIME_ARR =一个包含5个数字(1,2,4,12,24)的数组,表示项目下载的频率。

b)X =每小时下载的网页总数。这是使用items_in_group / download_frequently并向上舍入计算的。 If we have 15 items in group 5, and 3 items in group 4, this will be 15/24 + 3/12 = 0.875 and rounded is 1.

我的程序每小时必须在最多X个网站下载。我希望算法能输出如下内容:

Hour 1: A1 B0 C4 D5
Hour 2: A2 B1 C2 D2
...

A1 =第1组的第2项
C0 =第3组的第1项

我的算法必须尽可能高效。这意味着:

a)模式必须可扩展至少200+小时
b)无需创建可重复的模式
c)尽可能需要以便使用绝对最小带宽
d)永远不会比更新频率更频繁地下载项目,没有例外


示例:

group 1: 0 items | once per 1 hour
group 2: 3 items | once per 2 hours
group 3: 4 items | once per 4 hours
group 4: 0 items | once per 12 hours
group 5: 0 items | once per 24 hours

我们计算每小时可以拍摄的物品数量:3/2+4/4 = 2.5. We round this upwards and it's 3.

使用铅笔和纸,我们可以找到以下解决方案:

Hour 1: B0 C0 B1
Hour 2: B2 C1 c2
Hour 3: B0 C3 B1
Hour 4: B2
Hour 5: B0 C0 B1
Hour 6: B2 C1 c2
Hour 7: B0 C3 B1
Hour 8: B2
Hour 9: B0 C0 B1
Hour 10: B2 C1 c2
Hour 11: B0 C3 B1
Hour 12: B2
Hour 13: B0 C0 B1
Hour 14: B2 C1 c2
and continue the above.

我们每4小时一次C0C1 C2C3。我们还每2小时一次B0B1B2


问题:请向我解释,如何设计一个能够下载项目的算法,同时使用绝对最小下载次数?蛮力 a解决方案和算法必须是高效的CPU,因为元素的数量可能很大。

您可以阅读此处发布的答案:https://cs.stackexchange.com/a/19422/12497以及user3125280发布的答案。

2 个答案:

答案 0 :(得分:11)

您的问题是典型的调度问题。这些问题在计算机科学中得到了很好的研究,因此需要咨询大量文献。

代码有点像Deficit round robin,但有一些简化。首先,我们通过添加data_to_process变量来自己提供队列。其次,队列只是迭代一系列值。

一个区别是,除了数学错误外,此解决方案将获得您想要的最佳值。

粗略草图:未编译(c ++ 11)基于unix的规范代码

#include <iostream>
#include <vector>
#include <numeric>
#include <unistd.h>
//#include <cmath> //for ceil

#define TIME_SCALE ((double)60.0) //1 for realtime speed

//Assuming you are not refreshing ints in the real case
template<typename T>
struct queue
{
    const std::vector<T> data; //this will be filled with numbers
    int position;

    double refresh_rate; //must be refreshed ever ~ hours
    double data_rate; //this many refreshes per hour
    double credit; //amount of refreshes owed

    queue(std::initializer_list<T> v, int r ) :
        data(v), position(0), refresh_rate(r), credit(0) {
        data_rate = data.size() / (double) refresh_rate;
    }

    int getNext() {
        return data[position++ % data.size()];
    }
};

double time_passed(){
static double total;
//if(total < 20){ //stop early
    usleep(60000000 / TIME_SCALE); //sleep for a minute
    total += 1.0 / 60.0; //add a minute
    std::cout << "Time: " << total << std::endl;
    return 1.0; //change to 1.0 / 60.0 for real time speed
//} else return 0;
}

int main()
{
    //keep a list of the queues
    std::vector<queue<int> > queues{
    {{1, 2, 3}, 2},
    {{1, 2, 3, 4}, 3}};

    double total_data_rate = 0;
    for(auto q : queues) total_data_rate += q.data_rate;

    double data_to_process = 0; //how many refreshes we have to do
    int queue_number = 0; //which queue we are processing

    auto current_queue = &queues[0];

    while(1) {
        data_to_process += time_passed() * total_data_rate;
        //data_to_process = ceil(data_to_process) //optional

        while(data_to_process >= 1){
            //data_to_process >= 0 will make the the scheduler more
            //eager in the first time period (ie. everything will updated correctly
            //in the first period and and following periods
            if(current_queue->credit >= 1){
            //don't change here though, since credit determines the weighting only,
            //not how many refreshes are made
                //refresh(current_queue.getNext();
                std::cout << "From queue " << queue_number << " refreshed " <<
                current_queue->getNext() << std::endl;
                current_queue->credit -= 1;
                data_to_process -= 1;
            } else {
                queue_number = (queue_number + 1) % queues.size();
                current_queue = &queues[queue_number];
                current_queue->credit += current_queue->data_rate;
            }
        }
    }
   return 0;
}

该示例现在应该在gcc上使用--std = c ++ 11进行编译,并为您提供所需的内容。

这里是测试用例输出:(对于非时间缩放的早期代码)

Time: 0
From queue 1 refreshed 1
From queue 0 refreshed 1
From queue 1 refreshed 2
Time: 1
From queue 0 refreshed 2
From queue 0 refreshed 3
From queue 1 refreshed 3
Time: 2
From queue 0 refreshed 1
From queue 1 refreshed 4
From queue 1 refreshed 1
Time: 3
From queue 0 refreshed 2
From queue 0 refreshed 3
From queue 1 refreshed 2
Time: 4
From queue 0 refreshed 1
From queue 1 refreshed 3
From queue 0 refreshed 2
Time: 5
From queue 0 refreshed 3
From queue 1 refreshed 4
From queue 1 refreshed 1

作为扩展,通过允许此调度程序仅完成第一个lcm(update_rate * lcm(...刷新率...),ceil(update_rate))步骤,然后重复模式来回答重复模式问题

另外:由于小时边界的要求,这确实无法解决。当我使用您无法解析的示例,并修改time_passed以返回0.1时,计划将每1.1小时更新一次(仅限于小时边界!)。

答案 1 :(得分:1)

看起来你的约束已到处都是。快速总结我的另一个答案:

  • 仅满足平均刷新率
  • 以完成上述
  • 所需的小时间隔执行的下载次数最少

它基于这些(有时是无法补偿的)约束

  1. 离散更新,间隔1小时
  2. 每次更新最少的项目
  3. 以固定间隔更新每个项目
  4. 并打破了3。

    由于每小时间隔和每次最少时间限制都不是必需的,我会在这里给出一个更简单,更好的答案,它会打破2。

    #include <iostream>
    #include <vector>
    #include <numeric>
    #include <unistd.h>
    
    #define TIME_SCALE ((double)60.0)
    
    //Assuming you are not refreshing ints in the real case
    template<typename T>
    struct queue
    {
        const std::vector<T> data; //this is the data to refresh
        int position; //this is the data we are up to
        double refresh_rate; //must be refreshed every this many hours
        double data_rate; //this many refreshes per hour
        double credit; //is owed this many refreshes
        const char* name;//a name for each queue
    
        queue(std::initializer_list<T> v, int r, const char* n ) :
            data(v), position(0), refresh_rate(r), credit(0), name(n) {
            data_rate = data.size() / (double) refresh_rate;
        }
    
        void refresh() {
            std::cout << "From queue " << name << " refreshed " << data[position++ % data.size()] << "\n";
        }
    };
    
    double time_passed(){
    static double total;
        usleep(60000000 / TIME_SCALE); //sleep for a minute
        total += 1.0; //add a minute
        std::cout << "Time: " << total << std::endl;
        return 1.0; //change to 1.0 / 60.0 for real time speed
    }
    
    int main()
    {
        //keep a list of the queues
        std::vector<queue<int> > queues{
            {{1}, 1, "A"},
            {{1}, 2, "B"}};
    
        while(1) {
            auto t = time_passed();
            for(queue<int>& q : queues) {
                q.credit += q.data_rate * t;
                while(q.credit >= 1){
                    q.refresh();
                    q.credit -= 1.0;
                }
            }
        }
       return 0;
    }
    

    但是,它有可能在同一时间安排许多刷新。还有第三个选项,它打破了小时间隔规则并且一次只更新一个。

    我认为这是最简单的,需要最少的更新次数(如上一个答案),但不会违反规则3.