Question

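I have the following DataFrame, where each row represents a bike rental:

(Duration is in seconds.)

I'm new to both pandas and big data. I'm trying to find the specific time* at which the most bikes are in use at once, as well as that maximum count.

* Time here means a date with hour-and-minute precision.

Rental durations range from 60 seconds to 17270400 seconds (199 days).

The DataFrame has 67,000 rows in total.

I know the solution is probably very simple, but I've been thinking and searching for a while and I'm stuck on this.

Here is some data from the .csv (a few records picked from the top, middle, and end of the file, to get a bit of variety in the data):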

http://pastebin.com/Tgnupe7K

Edit: added a pastebin with some of the raw data from the .csv.

Answer 1

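The idea here is to consider the times at which each bike enters and exits use, marking entering use as +1 and exiting use as -1. Take those times, sort them, and then take the cumulative sum of the +1/-1 values. The maximum of the cumulative sum gives the largest number of bikes out at any given time.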
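To make the +1/-1 bookkeeping concrete, here is a minimal sketch on three made-up rentals. The column names `start` and `end` and the timestamps are assumptions chosen for illustration; they are not taken from the question's data.

```python
import pandas as pd

# Three hypothetical rentals; the values are invented for illustration.
rentals = pd.DataFrame({
    'start': pd.to_datetime(['2016-01-01 10:00', '2016-01-01 10:05', '2016-01-01 10:20']),
    'end': pd.to_datetime(['2016-01-01 10:30', '2016-01-01 10:15', '2016-01-01 10:40']),
})

# +1 when a bike goes out, -1 when it comes back.
events = pd.concat([
    pd.Series(1, index=rentals['start']),
    pd.Series(-1, index=rentals['end']),
])

# Running count of bikes in use after each event, in time order.
in_use = events.sort_index().cumsum()
print(in_use.tolist())  # [1, 2, 1, 2, 1, 0]
```

The running count first reaches its maximum of 2 at 10:05, when the second bike goes out while the first is still in use.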

I'll use some data that I mocked up for my own example:

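import numpy as np
import pandas as pd

# Set up some fake data: random start dates and durations in seconds.
np.random.seed([3, 1415])
n = 67
df = pd.DataFrame({
    'start_date': np.random.choice(pd.date_range('2016-01-01', periods=10), size=n),
    'duration': np.random.randint(1, 10**5, size=n)
})
# Shift each start by a random number of minutes so the times are not all midnight.
df['start_date'] += pd.to_timedelta(np.random.randint(1000, size=n), unit='min')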

The procedure is then as follows:

# Combine the entrance and exit times with the appropriate sign.
bike_times = pd.concat([
    pd.Series(1, index=df['start_date']),
    pd.Series(-1, index=df['start_date'] + pd.to_timedelta(df['duration'], unit='s')),
])

# Sort the dates and take the cumulative sum of the signs.
bike_times = bike_times.sort_index().cumsum()

# Find the max time and number of bikes.
max_dt = bike_times.idxmax()
max_bikes = bike_times.max()

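In the code above, max_dt gives the start time of the interval during which the number of bikes in use is at its maximum. To find the end time, just look at the next index value in bike_times.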
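One caveat with minute-precision data: several events can share a timestamp, and then the sort order of +1 and -1 within that minute can affect the running count. A minimal sketch of one way to handle this, assuming that a bike returned in the same minute another is taken out should not count as both being in use, is to sum the events per timestamp before taking the cumulative sum. The helper name `peak_interval` is hypothetical, and the end of the peak is taken as the first later timestamp where the count drops below the maximum.

```python
import pandas as pd

def peak_interval(starts, durations_s):
    """Return (peak_start, peak_end, max_bikes) for the first peak interval."""
    starts = pd.to_datetime(pd.Series(starts)).reset_index(drop=True)
    durations = pd.to_timedelta(pd.Series(durations_s).reset_index(drop=True), unit='s')
    ends = starts + durations

    events = pd.concat([pd.Series(1, index=starts), pd.Series(-1, index=ends)])

    # Net change per timestamp; groupby also sorts the timestamps.
    net = events.groupby(level=0).sum()
    in_use = net.cumsum()

    max_bikes = int(in_use.max())
    peak_start = in_use.idxmax()

    # First later timestamp at which the count falls below the maximum.
    later = in_use[(in_use < max_bikes) & (in_use.index > peak_start)]
    peak_end = later.index[0] if len(later) else None
    return peak_start, peak_end, max_bikes

# Made-up example: three 10-minute rentals starting 5 minutes apart.
result = peak_interval(
    ['2016-01-01 10:00', '2016-01-01 10:05', '2016-01-01 10:10'],
    [600, 600, 600],
)
print(result)
```

With the answer's data this could be called as `peak_interval(df['start_date'], df['duration'])`. When no two events share a timestamp, the peak end found this way is the same as the next index value after max_dt.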
