Round-robin scheduling on a pandas DataFrame

Time: 2016-10-06 03:50:06

Tags: python python-3.x csv pandas

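I have been working on some code that reads a tab-delimited CSV file representing a series of processes with their start times and durations, and uses pandas to build a DataFrame from it. I then need to apply a simplified form of round-robin scheduling to find each process's turnaround time, with the timeslice taken from user input.

So far I can read the CSV file, label the columns, and sort it correctly. However, I get stuck when I try to construct a loop that iterates over the rows to find each process's completion time.

The code so far is as follows: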

import re
import sys

import numpy as np
import pandas as pd


# round robin
def rr():
    docname = (sys.argv[1])
    method = (sys.argv[2])
    # creates a variable from the user input to define timeslice
    timeslice = int(re.search(r'\d+', method).group())
    # use pandas to create a 2-d data frame from tab delimited file, set column 0 (process names) to string, set column
    # 1 & 2 (start time and duration, respectively) to integers
    d = pd.read_csv(docname, delimiter="\t", header=None, dtype={0: str, 1: np.int32, 2: np.int32})
    # sort d into d1 by values of start times[1], ascending
    d1 = d.sort_values(by=1)
    # Create a 4th column, set to 0, for the Completion time
    d1[3] = 0
    # change column names
    d1.columns = ['Process', 'Start', 'Duration', 'Completion']
    # initialize counter
    counter = 0
    # if any values in column 'Duration' are above 0, continue the loop
    while (d1['Duration']).any() > 0:
        for index, row in d1.iterrows():
            # if value in column 'Duration' > the timeslice, add the value of the timeslice to the current counter,
            # subtract it from the current value in column 'Duration'
            if row.Duration > timeslice:
                counter += timeslice
                row.Duration -= timeslice
                print(index, row.Duration)
            # if value in column "Duration" <= the timeslice, add the current value of the row:Duration to the counter
            # subtract the Duration from itself, to make it 0
            # set row:Completion to the current counter, which is the completion time for the process
            elif row.Duration <= timeslice and row.Duration != 0:
                counter += row.Duration
                row.Duration -= row.Duration
                row.Completion = counter
                print(index, row.Duration)
            # otherwise, if the value in Duration is already 0, print that index, with the "Done" indicator
            else:
                print(index, "Done")

Given the sample CSV file, d1 looks like this:

  Process  Start  Duration  Completion
3      p4      0       280           0
0      p1      5       140           0
1      p2     14        75           0
2      p3     36       320           0
5      p6     40         0           0
4      p5     67       125           0
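
For anyone who wants to reproduce this frame without the original file, here is a minimal sketch. The file contents are an assumption reconstructed from the index values shown above (the real CSV was not posted), and io.StringIO stands in for the file on disk. With header=None pandas labels the columns with the integers 0, 1, 2, so passing names= up front is one way to get readable labels and matching dtype keys in a single step.

```python
import io

import pandas as pd

# Assumed file contents, reconstructed from the index order in the table above.
sample = "p1\t5\t140\np2\t14\t75\np3\t36\t320\np4\t0\t280\np5\t67\t125\np6\t40\t0\n"

d = pd.read_csv(
    io.StringIO(sample),   # stand-in for the real file path
    sep="\t",
    header=None,
    names=["Process", "Start", "Duration"],
    dtype={"Process": str, "Start": "int32", "Duration": "int32"},
)

# Sort by start time and add an empty Completion column, as in the question.
d1 = d.sort_values(by="Start")
d1["Completion"] = 0
print(d1)
```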

When I run my code with timeslice = 70, I get an infinite loop:

3 210
0 70
1 5
2 250
5 Done
4 55
3 210
0 70
1 5
2 250
5 Done
4 55

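It seems to iterate through the loop correctly once and then repeat that same pass forever. Also, print(d1['Completion']) shows all zeros, which means the correct counter value is never assigned to d1['Completion'].

Ideally, the Completion column would be filled in with the corresponding times; given timeslice = 70, for example: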

  Process  Start  Duration  Completion
3      p4      0       280         830
0      p1      5       140         490
1      p2     14        75         495
2      p3     36       320         940
5      p6     40         0         280  
4      p5     67       125         620

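I could then use that to find the average turnaround time. For some reason, though, my loop seems to iterate once and then repeat endlessly. When I tried swapping the order of the while and for statements, it iterated over each row repeatedly until it reached 0, which also gave incorrect completion times.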
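
For that last step, turnaround time is normally taken as completion time minus arrival (Start) time. A small sketch of the calculation, using the hoped-for Completion values from the table above purely as sample numbers (the names table and average_turnaround are just for illustration):

```python
import pandas as pd

# Sample numbers copied from the hoped-for table in the question.
table = pd.DataFrame({
    "Process": ["p4", "p1", "p2", "p3", "p6", "p5"],
    "Start": [0, 5, 14, 36, 40, 67],
    "Completion": [830, 490, 495, 940, 280, 620],
})

# Turnaround = completion time - arrival time, computed column-wise.
table["Turnaround"] = table["Completion"] - table["Start"]
average_turnaround = table["Turnaround"].mean()
print(table)
print(average_turnaround)
```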

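Thanks in advance.

1 answer:

Answer 0: (score: 0)

I modified your code and it works. The row that iterrows() yields is a copy, so changing it never overwrites the original values in d1; Duration never reaches 0 and the loop never ends.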
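
To see the effect in isolation, here is a minimal sketch on a two-row frame with mixed column types like yours (the frame and the names unchanged and updated are made up for the demonstration). Edits made through the row from iterrows() stay in the copy, while edits made through .loc land in the frame itself.

```python
import pandas as pd

df = pd.DataFrame({"Process": ["p4", "p1"], "Duration": [280, 140]}, index=[3, 0])

# Attempt 1: modify the row handed out by iterrows().
for idx, row in df.iterrows():
    row["Duration"] -= 70          # changes only the temporary Series

unchanged = df["Duration"].tolist()  # the frame still holds the old values

# Attempt 2: write to the frame itself by label.
for idx in df.index:
    df.loc[idx, "Duration"] -= 70

updated = df["Duration"].tolist()
print(unchanged, updated)
```

The modified loop for your code follows.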

while (d1['Duration'] > 0).any():
    for index, row in d1.iterrows():
        # if value in column 'Duration' > the timeslice, add the value of the timeslice to the current counter,
        # subtract it from the current value in column 'Duration'
        if row.Duration > timeslice:
            counter += timeslice
            #row.Duration -= timeslice
            # !!!LOOK HERE!!!
            # write to d1 itself with .loc; row is only a copy
            d1.loc[index, 'Duration'] -= timeslice
            print(index, d1.loc[index, 'Duration'])
        # if value in column "Duration" <= the timeslice, add the current value of the row:Duration to the counter
        # subtract the Duration from itself, to make it 0
        # set row:Completion to the current counter, which is the completion time for the process
        elif row.Duration <= timeslice and row.Duration != 0:
            counter += row.Duration
            #row.Duration -= row.Duration
            #row.Completion = counter
            # !!!LOOK HERE!!!
            d1.loc[index, 'Duration'] = 0
            d1.loc[index, 'Completion'] = counter
            print(index, d1.loc[index, 'Duration'])
        # otherwise, if the value in Duration is already 0, print that index, with the "Done" indicator
        else:
            print(index, "Done")
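
If you want to check this against the sample data, here is a self-contained sketch of the same idea. The function name simple_round_robin is made up for this example, the frame is typed in by hand from the table in the question, and, like the loop above, it ignores Start and just cycles through the rows in order.

```python
import pandas as pd


def simple_round_robin(frame, timeslice):
    """Cycle through the rows, charging at most one timeslice per visit."""
    d1 = frame.copy()          # leave the caller's frame untouched
    counter = 0                # total time consumed so far
    while (d1["Duration"] > 0).any():
        for index in d1.index:
            remaining = d1.loc[index, "Duration"]
            if remaining > timeslice:
                counter += timeslice
                d1.loc[index, "Duration"] = remaining - timeslice
            elif remaining > 0:
                # Last slice for this process: record when it finished.
                counter += remaining
                d1.loc[index, "Duration"] = 0
                d1.loc[index, "Completion"] = counter
    return d1


sample = pd.DataFrame(
    {
        "Process": ["p4", "p1", "p2", "p3", "p6", "p5"],
        "Start": [0, 5, 14, 36, 40, 67],
        "Duration": [280, 140, 75, 320, 0, 125],
        "Completion": 0,
    },
    index=[3, 0, 1, 2, 5, 4],
)
result = simple_round_robin(sample, 70)
print(result)
```

With the sample data this gives Completion values of 830, 490, 495, 940, 0 and 620 for p4, p1, p2, p3, p6 and p5. Duration ends at 0 everywhere because the loop consumes it, so keep a copy if you still need the original durations.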

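By the way, I think you may want to simulate a process scheduling algorithm. In that case you have to take 'Start' into account, because not every process starts at the same time.

(Your ideal table is slightly off.)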
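
A rough sketch of what taking Start into account could look like, using a ready queue instead of looping over the frame. This is one possible set of conventions, not the only one: the function name round_robin is made up, processes that arrive during a slice are queued ahead of the process that was just preempted, and a zero-duration process completes at the moment it reaches the front of the queue.

```python
from collections import deque


def round_robin(processes, timeslice):
    """processes: iterable of (name, start, duration). Returns {name: completion time}."""
    pending = deque(sorted(processes, key=lambda p: p[1]))  # not yet arrived
    ready = deque()            # (name, remaining time) waiting for the CPU
    completion = {}
    clock = 0

    while pending or ready:
        if not ready:
            # CPU is idle: jump ahead to the next arrival.
            clock = max(clock, pending[0][1])
        # Admit everything that has arrived by now.
        while pending and pending[0][1] <= clock:
            name, _, duration = pending.popleft()
            ready.append((name, duration))

        name, remaining = ready.popleft()
        run = min(timeslice, remaining)
        clock += run
        remaining -= run

        # Arrivals during this slice go in before the preempted process.
        while pending and pending[0][1] <= clock:
            new_name, _, duration = pending.popleft()
            ready.append((new_name, duration))

        if remaining > 0:
            ready.append((name, remaining))
        else:
            completion[name] = clock
    return completion


procs = [("p1", 5, 140), ("p2", 14, 75), ("p3", 36, 320),
         ("p4", 0, 280), ("p5", 67, 125), ("p6", 40, 0)]
result = round_robin(procs, 70)
print(result)
```

Under those conventions, the sample data with a timeslice of 70 finishes p4 at 830, p1 at 490, p2 at 495, p3 at 940, p6 at 280 and p5 at 620. A different rule for ties or for zero-length processes would shift some of these numbers.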