Question

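I have been working on some code that reads a tab-delimited CSV file, which represents a series of processes along with their start times and durations, and uses pandas to create a data frame from it. I then need to apply a simplified form of round-robin scheduling to find each process's turnaround time, taking the time slice from user input.

So far I can read the CSV file, label the columns, and sort it correctly. However, I am stuck trying to construct a loop that iterates over the rows to find each process's completion time.

The code so far looks like this: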

# round robin
def rr():
    docname = (sys.argv[1])
    method = (sys.argv[2])
    # creates a variable from the user input to define timeslice
    timeslice = int(re.search(r'\d+', method).group())
    # use pandas to create a 2-d data frame from tab delimited file, set column 0 (process names) to string, set column
    # 1 & 2 (start time and duration, respectively) to integers
    d = pd.read_csv(docname, delimiter="\t", header=None, dtype={'0': str, '1': np.int32, '2': np.int32})
    # sort d into d1 by values of start times[1], ascending
    d1 = d.sort_values(by=1)
    # Create a 4th column, set to 0, for the Completion time
    d1[3] = 0
    # change column names
    d1.columns = ['Process', 'Start', 'Duration', 'Completion']
    # initialize counter
    counter = 0
    # if any values in column 'Duration' are above 0, continue the loop
    while (d1['Duration']).any() > 0:
        for index, row in d1.iterrows():
            # if value in column 'Duration' > the timeslice, add the value of the timeslice to the current counter,
            # subtract it from the current value in column 'Duration'
            if row.Duration > timeslice:
                counter += timeslice
                row.Duration -= timeslice
                print(index, row.Duration)
            # if value in column "Duration" <= the timeslice, add the current value of the row:Duration to the counter
            # subtract the Duration from itself, to make it 0
            # set row:Completion to the current counter, which is the completion time for the process
            elif row.Duration <= timeslice and row.Duration != 0:
                counter += row.Duration
                row.Duration -= row.Duration
                row.Completion = counter
                print(index, row.Duration)
            # otherwise, if the value in Duration is already 0, print that index, with the "Done" indicator
            else:
                print(index, "Done")

Given the sample CSV file, d1 looks like this:

  Process  Start  Duration  Completion
3      p4      0       280           0
0      p1      5       140           0
1      p2     14        75           0
2      p3     36       320           0
5      p6     40         0           0
4      p5     67       125           0

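When I run my code with timeslice = 70, I get an infinite loop.

It appears to iterate through the loop correctly once and then repeat forever. However, print(d1['Completion']) gives all 0 values, which means the correct counter value is never being assigned to d1['Completion'].

Ideally, the Completion column would be filled in with the corresponding times, given timeslice = 70, like so: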

  Process  Start  Duration  Completion
3      p4      0       280         830
0      p1      5       140         490
1      p2     14        75         495
2      p3     36       320         940
5      p6     40         0         280  
4      p5     67       125         620

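I could then use it to find the average turnaround time. For some reason, however, my loop seems to iterate once and then repeat endlessly. When I tried swapping the order of the while and for statements, it iterated over each row repeatedly until that row reached 0, which also gave incorrect completion times.

Thanks in advance.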

Answer 1

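I modified your code and it works. The row that iterrows() yields is a copy, so assigning to it never overwrites the original value in d1; Duration therefore never reaches 0 and the loop never ends.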

# compare element-wise first, then ask whether any row is still above 0
while (d1['Duration'] > 0).any():
    for index, row in d1.iterrows():
        # if value in column 'Duration' > the timeslice, add the value of the timeslice to the current counter,
        # subtract it from the current value in column 'Duration'
        if row.Duration > timeslice:
            counter += timeslice
            #row.Duration -= timeslice
            # !!!LOOK HERE!!! write to the frame itself with .loc, not to the row copy
            d1.loc[index, 'Duration'] -= timeslice
            print(index, d1.loc[index, 'Duration'])
        # if value in column "Duration" <= the timeslice, add the current value of the row:Duration to the counter
        # set Duration to 0
        # set Completion to the current counter, which is the completion time for the process
        elif row.Duration <= timeslice and row.Duration != 0:
            counter += row.Duration
            #row.Duration -= row.Duration
            #row.Completion = counter
            # !!!LOOK HERE!!!
            d1.loc[index, 'Duration'] = 0
            d1.loc[index, 'Completion'] = counter
            print(index, d1.loc[index, 'Duration'])
        # otherwise, if the value in Duration is already 0, print that index, with the "Done" indicator
        else:
            print(index, "Done")
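
The difference between writing to the row and writing to the frame can be reproduced in isolation. What follows is only a minimal sketch on a made-up two-row frame (the name df and its values are illustrative and not taken from the question): an assignment to the row yielded by iterrows() changes a separate Series, whereas an assignment through df.loc changes the frame.

```python
import pandas as pd

# Illustrative frame with mixed dtypes, like the one in the question.
df = pd.DataFrame({"Process": ["a", "b"], "Duration": [100, 30]})

for index, row in df.iterrows():
    # `row` is a separate Series built for this iteration;
    # changing it leaves `df` untouched.
    row["Duration"] -= 10

unchanged = df["Duration"].tolist()

for index, row in df.iterrows():
    # Writing through .loc updates the frame itself.
    df.loc[index, "Duration"] -= 10

updated = df["Duration"].tolist()
print(unchanged, updated)
```

The first loop leaves the column as it was, which is why the while condition in the question never becomes false.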

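By the way, I guess you are trying to simulate a process scheduling algorithm. In that case you have to take "Start" into account, because not every process starts at the same time.

(Your ideal table is a little off.)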
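
One way to take "Start" into account is to keep a ready queue and only admit a process once the clock has reached its start time. The sketch below is an illustration under stated assumptions, not a definitive scheduler: the function name round_robin is hypothetical, context switches cost nothing, a process arriving during a slice queues ahead of the process that was just preempted, and a process with zero duration is treated as complete at its start time. It uses plain tuples so that it runs without pandas.

```python
from collections import deque


def round_robin(processes, timeslice):
    """Return {name: completion_time} for (name, start, duration) tuples."""
    pending = deque(sorted(processes, key=lambda p: p[1]))  # not yet arrived
    ready = deque()  # [name, remaining] entries waiting for the CPU
    completion = {}
    clock = 0

    def admit(now):
        # Move every process that has arrived by `now` into the ready queue.
        while pending and pending[0][1] <= now:
            name, start, duration = pending.popleft()
            if duration == 0:
                completion[name] = start  # assumed convention: nothing to run
            else:
                ready.append([name, duration])

    while pending or ready:
        admit(clock)
        if not ready:
            # CPU is idle: jump ahead to the next arrival, if there is one.
            if pending:
                clock = pending[0][1]
            continue
        name, remaining = ready.popleft()
        run = min(timeslice, remaining)
        clock += run
        # Assumed tie-break: arrivals during this slice queue first.
        admit(clock)
        if remaining > run:
            ready.append([name, remaining - run])
        else:
            completion[name] = clock
    return completion


# Sample data from the question as (Process, Start, Duration).
sample = [("p1", 5, 140), ("p2", 14, 75), ("p3", 36, 320),
          ("p4", 0, 280), ("p5", 67, 125), ("p6", 40, 0)]
result = round_robin(sample, 70)
print(result)
```

With a frame like d1, the input tuples could be produced with d1[['Process', 'Start', 'Duration']].itertuples(index=False, name=None). Turnaround time is then the completion time minus the start time. On the sample data every process has arrived by the end of the first slice, so the completion times of the running processes come out the same as with the simple loop; the two approaches diverge once a process arrives later than that.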
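
For comparison, the Start-ignoring loop can be run on the sample data and checked against the table in the question. This is a sketch rather than the asker's exact program: the frame is typed in by hand instead of being read from the CSV file, and a separate remaining Series (a name introduced here) tracks the time left so that the Duration column keeps its original values, as it does in the ideal table.

```python
import pandas as pd

# Sample data from the question, already sorted by Start.
d1 = pd.DataFrame(
    {
        "Process": ["p4", "p1", "p2", "p3", "p6", "p5"],
        "Start": [0, 5, 14, 36, 40, 67],
        "Duration": [280, 140, 75, 320, 0, 125],
        "Completion": 0,
    },
    index=[3, 0, 1, 2, 5, 4],
)
timeslice = 70
counter = 0
remaining = d1["Duration"].copy()  # time left per process

while (remaining > 0).any():
    for index in d1.index:
        if remaining.loc[index] == 0:
            continue  # finished, or never had any work to do
        run = min(int(remaining.loc[index]), timeslice)
        counter += run
        remaining.loc[index] -= run
        if remaining.loc[index] == 0:
            # Record the clock value at which this process finished.
            d1.loc[index, "Completion"] = counter

print(d1)
```

Every Completion value matches the ideal table except p6, which never runs and therefore stays at 0 where the table lists 280. Whether that is the discrepancy meant above is an assumption.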
