Given a Pandas DataFrame that records when some programs started and finished (one row per program):
starts finishes
2018-01-01 12:00 2018-01-01 15:00
2018-01-01 16:00 2018-01-01 20:00
2018-01-01 16:30 2018-01-01 20:00
2018-01-01 17:00 2018-01-01 21:00
...
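I need to count the number of concurrently running programs at each time that appears in the table. For the table above, the result would be: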
time number_of_conc_progs
2018-01-01 12:00 1
2018-01-01 15:00 0
2018-01-01 16:00 1
2018-01-01 16:30 2
2018-01-01 17:00 3
2018-01-01 20:00 1
2018-01-01 21:00 0
...
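If a program starts at, say, 12:00 and the current number of running programs is n, then the value at 12:00 is n + 1.
If a program finishes at, say, 12:00 and the current number of running programs is n, then the value at 12:00 is n - 1.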
Answer 0 (score: 0)
import pandas as pd

# create the dataframe
df = pd.DataFrame([
    ["2018-01-01 12:00", "2018-01-01 15:00"],
    ["2018-01-01 16:00", "2018-01-01 20:00"],
    ["2018-01-01 16:30", "2018-01-01 20:00"],
    ["2018-01-01 17:00", "2018-01-01 21:00"]])
df.columns = ["starts", "finishes"]
# each start time increases the number of running programs by 1
starts = pd.DataFrame()
starts["time"] = df.starts
starts["number_of_conc_progs"] = 1
# each finish time decreases the number of running programs by 1
finishes = pd.DataFrame()
finishes["time"] = df.finishes
finishes["number_of_conc_progs"] = -1
# then I concatenate the starts and finishes dataframes
result = pd.concat([starts, finishes])
# sort by time (the strings are in ISO format, so they sort chronologically)
result = result.sort_values(by=['time'])
# if several starts or finishes share the same time, sum them
result = result.groupby(['time']).sum()
# take the cumulative sum to get the number of programs actually running
result.number_of_conc_progs = result.number_of_conc_progs.cumsum()
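The same steps can be wrapped in a small function. This is only a sketch under a few assumptions: the name `count_concurrent` and the intermediate column `change` are made up for illustration, and the time strings are parsed with `pd.to_datetime` so that sorting does not depend on the string format.

```python
import pandas as pd


def count_concurrent(df):
    """Return the number of running programs after each start/finish time.

    Hypothetical helper: expects columns named "starts" and "finishes".
    """
    # One +1 event per start and one -1 event per finish
    events = pd.concat([
        pd.DataFrame({"time": pd.to_datetime(df["starts"]), "change": 1}),
        pd.DataFrame({"time": pd.to_datetime(df["finishes"]), "change": -1}),
    ])
    # Net change per distinct timestamp; groupby sorts the timestamps
    net_change = events.groupby("time")["change"].sum()
    # The running total of the net changes is the number of concurrent programs
    return net_change.cumsum().rename("number_of_conc_progs")


df = pd.DataFrame({
    "starts": ["2018-01-01 12:00", "2018-01-01 16:00",
               "2018-01-01 16:30", "2018-01-01 17:00"],
    "finishes": ["2018-01-01 15:00", "2018-01-01 20:00",
                 "2018-01-01 20:00", "2018-01-01 21:00"],
})
counts = count_concurrent(df)
print(counts)
```

For the sample data this gives the counts 1, 0, 1, 2, 3, 1, 0, matching the table in the question. The two programs that finish at 20:00 are summed into a single change of -2 before the cumulative sum is taken.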