How can I make a histogram and find the 90th and 95th percentile values from the following bars output:
bars = ticks.Volume.resample('1s').sum(min_count=1)  # min_count=1 leaves empty seconds as NaN
bars = bars.dropna()
bars
Timestamp
2015-12-27 23:00:25 1.0
2015-12-27 23:01:11 10.0
2015-12-27 23:02:03 1.0
2015-12-27 23:02:14 2.0
2015-12-27 23:07:27 1.0
2015-12-27 23:14:58 2.0
2015-12-27 23:17:45 1.0
2015-12-27 23:21:38 1.0
2015-12-27 23:37:29 2.0
2015-12-27 23:37:32 1.0
2015-12-27 23:47:35 2.0
2015-12-27 23:47:38 12.0
2015-12-28 00:18:48 1.0
2015-12-28 00:26:19 1.0
2015-12-28 00:42:52 4.0
2015-12-28 01:25:52 1.0
2015-12-28 01:38:52 4.0
2015-12-28 02:03:47 4.0
2015-12-28 02:04:25 4.0
2015-12-28 02:39:15 3.0
2015-12-28 02:54:11 5.0
2015-12-28 03:07:43 1.0
2015-12-28 03:20:04 1.0
2015-12-28 03:30:00 6.0
2015-12-28 03:42:16 1.0
2015-12-28 04:11:03 6.0
2015-12-28 05:13:37 1.0
2015-12-28 05:15:20 1.0
2015-12-28 05:45:51 2.0
2015-12-28 05:48:14 29.0
Also, how can I restrict this to 09:30 - 16:15 only? Should I use groupby? If so, please show how to do it.
Thanks
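Answer 0: (score: 2)
Both tasks are easy with NumPy's histogram and percentile functions.
First, though, we filter by time of day, which requires converting the index to datetime objects. In the example below I changed the target times so that the sample data frame contains observations inside the window.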
import numpy as np
import pandas as pd
#EDIT: added code to rename a column
##
# Rename column
##
bars.columns # check the original column names
>>>Index([u'Unnamed: 1'], dtype='object')
# rename the 'Unnamed: 1' column
bars.rename(columns={'Unnamed: 1': 'Value'}, inplace=True)
bars.columns
>>>Index([u'Value'], dtype='object')
##
# Filter by time of day
##
# Convert the index to datetimes.
# WARNING: this operation is very expensive. For very large dataframes, it is much faster
# to keep the indices as text and use a different filtering function.
bars.index = pd.to_datetime(bars.index)
# Changed the target times to include values in the sample df
start = (2, 0)
end = (5, 0)
# Filter to keep only times of day that fall within the desired window
idx = pd.Series(bars.index).apply(lambda x: x.replace(hour=start[0], minute=start[1]) < x < x.replace(hour=end[0], minute=end[1])).values
bars_filtered = bars[idx]
bars_filtered
Value
2015-12-28 02:03:47 4.0
2015-12-28 02:04:25 4.0
2015-12-28 02:39:15 3.0
2015-12-28 02:54:11 5.0
2015-12-28 03:07:43 1.0
2015-12-28 03:20:04 1.0
2015-12-28 03:30:00 6.0
2015-12-28 03:42:16 1.0
2015-12-28 04:11:03 6.0
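As an alternative to the element-wise comparison, pandas offers `DataFrame.between_time`, which selects rows by time of day on a `DatetimeIndex`. The following is only a minimal sketch, not the method used in this answer. The frame is rebuilt from a few rows copied from the question, and the 02:00 to 05:00 window is an illustrative choice so that some sample rows fall inside it. Note that `between_time` includes both endpoints by default, whereas the comparison above is strict.

```python
import pandas as pd

# A few rows copied from the question's output, rebuilt as a DataFrame
# with a DatetimeIndex (the column name 'Value' follows the answer).
bars = pd.DataFrame(
    {"Value": [4.0, 4.0, 4.0, 3.0, 5.0, 1.0, 6.0, 1.0]},
    index=pd.to_datetime([
        "2015-12-28 01:38:52",
        "2015-12-28 02:03:47",
        "2015-12-28 02:04:25",
        "2015-12-28 02:39:15",
        "2015-12-28 02:54:11",
        "2015-12-28 03:07:43",
        "2015-12-28 04:11:03",
        "2015-12-28 05:13:37",
    ]),
)

# Keep rows whose time of day lies in the window (both ends inclusive by default).
bars_filtered = bars.between_time("02:00", "05:00")

# For the trading session asked about in the question, the call would be
# the following (empty here, because the sample has no rows in that window).
session = bars.between_time("09:30", "16:15")
```

No groupby is needed for this kind of filter; the call works on any Series or DataFrame whose index is a `DatetimeIndex`.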
Computing the histogram and the percentiles is straightforward.
##
# Histograms and Percentiles
##
# Get the histogram (computed over the full, unfiltered bars, so the counts sum to 30)
num_bins = 10
hist, edges = np.histogram(bars.Value, bins=num_bins)
hist
array([20, 7, 0, 2, 0, 0, 0, 0, 0, 1])
# Edges defining the histogram bins
edges
array([ 1. , 3.8, 6.6, 9.4, 12.2, 15. , 17.8, 20.6, 23.4,
26.2, 29. ])
# Calculate the percentiles
p_90 = np.percentile(bars_filtered.Value, q=90)
p_95 = np.percentile(bars_filtered.Value, q=95)
p_90
6.0
p_95
6.0
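Both percentiles come out as 6.0 because `np.percentile` interpolates linearly between sorted values by default. With the nine filtered observations listed above, both ranks fall between the two largest values, which are both 6. The sketch below reproduces that arithmetic by hand. `manual_percentile` is a hypothetical helper written for illustration, not a NumPy function.

```python
import numpy as np

# The nine filtered values shown in the answer's output.
values = np.array([4.0, 4.0, 3.0, 5.0, 1.0, 1.0, 6.0, 1.0, 6.0])

def manual_percentile(data, q):
    """Linear-interpolation percentile (hypothetical helper for illustration)."""
    ordered = np.sort(data)
    # Fractional position of the q-th percentile in the sorted data.
    rank = (q / 100.0) * (len(ordered) - 1)
    lower = int(np.floor(rank))
    upper = min(lower + 1, len(ordered) - 1)
    weight = rank - lower
    # Interpolate between the two neighbouring sorted values.
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])

p_90 = manual_percentile(values, 90)  # rank 7.2, between the two 6.0 values
p_95 = manual_percentile(values, 95)  # rank 7.6, between the two 6.0 values
```

On a larger sample with more distinct values, the two percentiles would normally differ.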