I have an array with 720 rows and 2 columns. The first column is a datetime and the second is a value. My data looks like this:
```python
[(datetime.datetime(2015, 4, 26, 0, 10), 25.2),
 (datetime.datetime(2015, 4, 26, 0, 20), 25.1),
 (datetime.datetime(2015, 4, 26, 0, 30), 25.7),
 (datetime.datetime(2015, 4, 26, 0, 40), 23.2),
 (datetime.datetime(2015, 4, 26, 0, 50), 22.2),
 (datetime.datetime(2015, 4, 26, 0, 59), 29.2),
 (datetime.datetime(2015, 4, 26, 1, 0), 22.2),
 (datetime.datetime(2015, 4, 26, 1, 10), 21.2), ...]
```
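As an aside on the sample data: `datetime` minutes must lie in the range 0..59, so a value such as `datetime.datetime(2015, 4, 26, 0, 60)` cannot be constructed at all. The top of the next hour has to be written as hour 1, minute 0, or reached by adding a `timedelta`. A small standard-library check:

```python
import datetime

# Minutes must be in 0..59, so minute=60 is rejected when the object is built.
try:
    datetime.datetime(2015, 4, 26, 0, 60)
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None

# Adding a 60-minute timedelta rolls over to 01:00 instead.
rolled = datetime.datetime(2015, 4, 26, 0, 0) + datetime.timedelta(minutes=60)

print(error_message)
print(rolled)  # 2015-04-26 01:00:00
```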
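All of the data falls on the same day. I just want to organize it by hour to prepare a candlestick plot (only max and min; I don't need open and close). I only want data like this: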
```
[(datetime.datetime(2015, 4, 26, 0, 0), max in hour 0, min in hour 0),
 (datetime.datetime(2015, 4, 26, 1, 0), max in hour 1, min in hour 1),
 (datetime.datetime(2015, 4, 26, 2, 0), max in hour 2, min in hour 2), ...
 (datetime.datetime(2015, 4, 26, 23, 0), max in hour 23, min in hour 23)]
```
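Each target row is keyed by the start of its hour. One way to get that key from any timestamp is `datetime.replace`, zeroing everything below the hour. A minimal sketch; the helper name `hour_start` is illustrative and not from the question:

```python
import datetime


def hour_start(dt):
    """Return dt with minutes, seconds and microseconds zeroed out."""
    return dt.replace(minute=0, second=0, microsecond=0)


stamp = datetime.datetime(2015, 4, 26, 0, 50)
print(hour_start(stamp))  # 2015-04-26 00:00:00
```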
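I'm new to Python and would like a nice, short script. I used C++ before (a long, long time ago), and I find that Python is more of an art than just programming. I searched for an answer for a while but couldn't find one that fits my requirements. Thanks for your help.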
Answer 0 (score: 0)
Since they are all on the same day, group by hour and then take the max and min of each group.
```python
import datetime
from collections import defaultdict

start_of_day = datetime.datetime(2015, 4, 26)

# Collect the values for each hour of the day (0-23).
hour_to_values = defaultdict(list)
for dt, value in your_list_of_values:
    hour_to_values[dt.hour].append(value)

# One (start of hour, max, min) tuple per hour, in hour order.
result = [(start_of_day + datetime.timedelta(hours=hour),
           max(values), min(values))
          for hour, values in sorted(hour_to_values.items())]
```
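A self-contained run of the same `defaultdict` idea on the question's sample rows (with the invalid minute 60 written as 59). This variant, wrapped in a hypothetical `hourly_max_min` function, keys each bucket by the full start-of-hour datetime rather than by `dt.hour`, so it does not need a hard-coded `start_of_day` and would also keep different days apart:

```python
import datetime
from collections import defaultdict

# Sample rows from the question (00:60 written as 00:59, since minute 60 is invalid).
data = [
    (datetime.datetime(2015, 4, 26, 0, 10), 25.2),
    (datetime.datetime(2015, 4, 26, 0, 20), 25.1),
    (datetime.datetime(2015, 4, 26, 0, 30), 25.7),
    (datetime.datetime(2015, 4, 26, 0, 40), 23.2),
    (datetime.datetime(2015, 4, 26, 0, 50), 22.2),
    (datetime.datetime(2015, 4, 26, 0, 59), 29.2),
    (datetime.datetime(2015, 4, 26, 1, 0), 22.2),
    (datetime.datetime(2015, 4, 26, 1, 10), 21.2),
]


def hourly_max_min(rows):
    """Group (datetime, value) rows by hour start; return sorted (start, max, min)."""
    buckets = defaultdict(list)
    for dt, value in rows:
        buckets[dt.replace(minute=0, second=0, microsecond=0)].append(value)
    return [(start, max(vals), min(vals)) for start, vals in sorted(buckets.items())]


for row in hourly_max_min(data):
    print(row)
```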
Answer 1 (score: 0)
The following assumes the list is already sorted by date:
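```python
output = []
current_hour = None
current_output = None
for point in data:
    phour = point[0].hour
    pvalue = point[1]
    if phour == current_hour:  # compare ints with ==, not `is`
        # Same hour: widen the running [start, max, min] entry.
        if pvalue > current_output[1]:
            current_output[1] = pvalue
        if pvalue < current_output[2]:
            current_output[2] = pvalue
    else:
        # New hour: start an entry keyed by the top of the hour.
        current_hour = phour
        output.append([point[0].replace(minute=0, second=0, microsecond=0),
                       pvalue, pvalue])
        current_output = output[-1]
```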
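Because a single-pass loop like this only compares each point with the hour it is currently tracking, unsorted input would produce a second, separate entry whenever an hour reappears later in the list. A cheap guard is to check the order and sort by the datetime column if needed. A sketch on a small, deliberately shuffled subset of the question's rows:

```python
import datetime
from operator import itemgetter

# Illustrative rows in shuffled order.
data = [
    (datetime.datetime(2015, 4, 26, 1, 0), 22.2),
    (datetime.datetime(2015, 4, 26, 0, 10), 25.2),
    (datetime.datetime(2015, 4, 26, 1, 10), 21.2),
    (datetime.datetime(2015, 4, 26, 0, 20), 25.1),
]

# Check whether timestamps are non-decreasing; sort by the datetime column if not.
is_sorted = all(a[0] <= b[0] for a, b in zip(data, data[1:]))
ordered = data if is_sorted else sorted(data, key=itemgetter(0))

print(is_sorted)  # False
print([dt.strftime("%H:%M") for dt, _ in ordered])  # ['00:10', '00:20', '01:00', '01:10']
```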
Answer 2 (score: 0)
If your data looks like this:
```python
>>> arr
[[datetime.datetime(2015, 4, 26, 0, 0), 0.9627101684867109], [datetime.datetime(2015, 4, 26, 0, 20), 0.8894632247614254], [datetime.datetime(2015, 4, 26, 0, 40), 0.1920554638586589],
 [datetime.datetime(2015, 4, 26, 1, 0), 0.24394390686092926], [datetime.datetime(2015, 4, 26, 1, 20), 0.9870880292994234], [datetime.datetime(2015, 4, 26, 1, 40), 0.8154734773666351],
 [datetime.datetime(2015, 4, 26, 2, 0), 0.5074101780070644], [datetime.datetime(2015, 4, 26, 2, 20), 0.6211085118418351], [datetime.datetime(2015, 4, 26, 2, 40), 0.1309246438480619],
 [datetime.datetime(2015, 4, 26, 3, 0), 0.2042948575387714], [datetime.datetime(2015, 4, 26, 3, 20), 0.90969148583095], [datetime.datetime(2015, 4, 26, 3, 40), 0.9260473796075621],
 [datetime.datetime(2015, 4, 26, 4, 0), 0.08180604335801178], [datetime.datetime(2015, 4, 26, 4, 20), 0.9909948477818202], [datetime.datetime(2015, 4, 26, 4, 40), 0.6306008554115328],
 [datetime.datetime(2015, 4, 26, 5, 0), 0.7218791510465083], [datetime.datetime(2015, 4, 26, 5, 20), 0.5751211758007434], [datetime.datetime(2015, 4, 26, 5, 40), 0.8643323785674638],
 [datetime.datetime(2015, 4, 26, 6, 0), 0.44366887986412196], [datetime.datetime(2015, 4, 26, 6, 20), 0.5845914793227223], [datetime.datetime(2015, 4, 26, 6, 40), 0.9816449110831348],
 [datetime.datetime(2015, 4, 26, 7, 0), 0.7976769524401801], [datetime.datetime(2015, 4, 26, 7, 20), 0.019715644725192494], [datetime.datetime(2015, 4, 26, 7, 40), 0.774857573501942],
 [datetime.datetime(2015, 4, 26, 8, 0), 0.971010849289862], [datetime.datetime(2015, 4, 26, 8, 20), 0.9854650056341737], [datetime.datetime(2015, 4, 26, 8, 40), 0.44764478642480565],
 [datetime.datetime(2015, 4, 26, 9, 0), 0.41757419665518836], [datetime.datetime(2015, 4, 26, 9, 20), 0.2428205990660569], [datetime.datetime(2015, 4, 26, 9, 40), 0.7652296383460859],
 [datetime.datetime(2015, 4, 26, 10, 0), 0.6148904798625167], [datetime.datetime(2015, 4, 26, 10, 20), 0.5437523646936837], [datetime.datetime(2015, 4, 26, 10, 40), 0.7867821039231312],
 [datetime.datetime(2015, 4, 26, 11, 0), 0.7178834338473005], [datetime.datetime(2015, 4, 26, 11, 20), 0.4349509857268635], [datetime.datetime(2015, 4, 26, 11, 40), 0.2819549901100772],
 [datetime.datetime(2015, 4, 26, 12, 0), 0.0849398640248602], [datetime.datetime(2015, 4, 26, 12, 20), 0.6260259998494316], [datetime.datetime(2015, 4, 26, 12, 40), 0.8353818765863841],
 [datetime.datetime(2015, 4, 26, 13, 0), 0.17232607867607763], [datetime.datetime(2015, 4, 26, 13, 20), 0.17091634151665247], [datetime.datetime(2015, 4, 26, 13, 40), 0.7653484731068122],
 [datetime.datetime(2015, 4, 26, 14, 0), 0.9510280942218504], [datetime.datetime(2015, 4, 26, 14, 20), 0.2696780695726898], [datetime.datetime(2015, 4, 26, 14, 40), 0.6634142333370054],
 [datetime.datetime(2015, 4, 26, 15, 0), 0.48395825825107863], [datetime.datetime(2015, 4, 26, 15, 20), 0.7669839652095866], [datetime.datetime(2015, 4, 26, 15, 40), 0.9479268674677883],
 [datetime.datetime(2015, 4, 26, 16, 0), 0.9046641495205922], [datetime.datetime(2015, 4, 26, 16, 20), 0.045289391652820865], [datetime.datetime(2015, 4, 26, 16, 40), 0.7932951067126703],
 [datetime.datetime(2015, 4, 26, 17, 0), 0.4419846953059643], [datetime.datetime(2015, 4, 26, 17, 20), 0.11146542138230242], [datetime.datetime(2015, 4, 26, 17, 40), 0.5887496294547572],
 [datetime.datetime(2015, 4, 26, 18, 0), 0.08733136331114111], [datetime.datetime(2015, 4, 26, 18, 20), 0.7957160332912587], [datetime.datetime(2015, 4, 26, 18, 40), 0.8128833057460692],
 [datetime.datetime(2015, 4, 26, 19, 0), 0.21977323027233342], [datetime.datetime(2015, 4, 26, 19, 20), 0.20504702851137402], [datetime.datetime(2015, 4, 26, 19, 40), 0.6555892081746738],
 [datetime.datetime(2015, 4, 26, 20, 0), 0.7380315441194354], [datetime.datetime(2015, 4, 26, 20, 20), 0.8075383278433004], [datetime.datetime(2015, 4, 26, 20, 40), 0.837007721004194],
 [datetime.datetime(2015, 4, 26, 21, 0), 0.8842141478652727], [datetime.datetime(2015, 4, 26, 21, 20), 0.3349342531521037], [datetime.datetime(2015, 4, 26, 21, 40), 0.811383235093619],
 [datetime.datetime(2015, 4, 26, 22, 0), 0.8273356582091318], [datetime.datetime(2015, 4, 26, 22, 20), 0.17269590855559502], [datetime.datetime(2015, 4, 26, 22, 40), 0.13561711047456493],
 [datetime.datetime(2015, 4, 26, 23, 0), 0.8906156794457442], [datetime.datetime(2015, 4, 26, 23, 20), 0.2653437814631542]]
```
(I reproduced it with random data for the second element.) You can put the rows into buckets by hour:
```python
>>> buckets = {}
>>> for t in arr:
...     buckets.setdefault(t[0].hour, []).append(t)
...
```
Then sort the keys and get the max and min of each bucket, using the second element of each pair as the key:
```python
>>> for hour in sorted(buckets):
...     print(hour, max(buckets[hour], key=lambda l: l[1]), min(buckets[hour], key=lambda l: l[1]))
...
0 [datetime.datetime(2015, 4, 26, 0, 0), 0.9627101684867109] [datetime.datetime(2015, 4, 26, 0, 40), 0.1920554638586589]
1 [datetime.datetime(2015, 4, 26, 1, 20), 0.9870880292994234] [datetime.datetime(2015, 4, 26, 1, 0), 0.24394390686092926]
2 [datetime.datetime(2015, 4, 26, 2, 20), 0.6211085118418351] [datetime.datetime(2015, 4, 26, 2, 40), 0.1309246438480619]
3 [datetime.datetime(2015, 4, 26, 3, 40), 0.9260473796075621] [datetime.datetime(2015, 4, 26, 3, 0), 0.2042948575387714]
4 [datetime.datetime(2015, 4, 26, 4, 20), 0.9909948477818202] [datetime.datetime(2015, 4, 26, 4, 0), 0.08180604335801178]
5 [datetime.datetime(2015, 4, 26, 5, 40), 0.8643323785674638] [datetime.datetime(2015, 4, 26, 5, 20), 0.5751211758007434]
6 [datetime.datetime(2015, 4, 26, 6, 40), 0.9816449110831348] [datetime.datetime(2015, 4, 26, 6, 0), 0.44366887986412196]
7 [datetime.datetime(2015, 4, 26, 7, 0), 0.7976769524401801] [datetime.datetime(2015, 4, 26, 7, 20), 0.019715644725192494]
8 [datetime.datetime(2015, 4, 26, 8, 20), 0.9854650056341737] [datetime.datetime(2015, 4, 26, 8, 40), 0.44764478642480565]
9 [datetime.datetime(2015, 4, 26, 9, 40), 0.7652296383460859] [datetime.datetime(2015, 4, 26, 9, 20), 0.2428205990660569]
10 [datetime.datetime(2015, 4, 26, 10, 40), 0.7867821039231312] [datetime.datetime(2015, 4, 26, 10, 20), 0.5437523646936837]
11 [datetime.datetime(2015, 4, 26, 11, 0), 0.7178834338473005] [datetime.datetime(2015, 4, 26, 11, 40), 0.2819549901100772]
12 [datetime.datetime(2015, 4, 26, 12, 40), 0.8353818765863841] [datetime.datetime(2015, 4, 26, 12, 0), 0.0849398640248602]
13 [datetime.datetime(2015, 4, 26, 13, 40), 0.7653484731068122] [datetime.datetime(2015, 4, 26, 13, 20), 0.17091634151665247]
14 [datetime.datetime(2015, 4, 26, 14, 0), 0.9510280942218504] [datetime.datetime(2015, 4, 26, 14, 20), 0.2696780695726898]
15 [datetime.datetime(2015, 4, 26, 15, 40), 0.9479268674677883] [datetime.datetime(2015, 4, 26, 15, 0), 0.48395825825107863]
16 [datetime.datetime(2015, 4, 26, 16, 0), 0.9046641495205922] [datetime.datetime(2015, 4, 26, 16, 20), 0.045289391652820865]
17 [datetime.datetime(2015, 4, 26, 17, 40), 0.5887496294547572] [datetime.datetime(2015, 4, 26, 17, 20), 0.11146542138230242]
18 [datetime.datetime(2015, 4, 26, 18, 40), 0.8128833057460692] [datetime.datetime(2015, 4, 26, 18, 0), 0.08733136331114111]
19 [datetime.datetime(2015, 4, 26, 19, 40), 0.6555892081746738] [datetime.datetime(2015, 4, 26, 19, 20), 0.20504702851137402]
20 [datetime.datetime(2015, 4, 26, 20, 40), 0.837007721004194] [datetime.datetime(2015, 4, 26, 20, 0), 0.7380315441194354]
21 [datetime.datetime(2015, 4, 26, 21, 0), 0.8842141478652727] [datetime.datetime(2015, 4, 26, 21, 20), 0.3349342531521037]
22 [datetime.datetime(2015, 4, 26, 22, 0), 0.8273356582091318] [datetime.datetime(2015, 4, 26, 22, 40), 0.13561711047456493]
23 [datetime.datetime(2015, 4, 26, 23, 0), 0.8906156794457442] [datetime.datetime(2015, 4, 26, 23, 20), 0.2653437814631542]
```
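The loop above prints whole `[datetime, value]` pairs. To reach the exact `(start of hour, max, min)` tuples the question asks for, each bucket can be reduced to its plain values. A minimal sketch, assuming all rows fall on the same day (as the question states) and using a six-row excerpt of `arr`:

```python
import datetime

# A few rows in the same [datetime, value] shape as arr (values taken from it).
arr = [
    [datetime.datetime(2015, 4, 26, 0, 0), 0.9627101684867109],
    [datetime.datetime(2015, 4, 26, 0, 20), 0.8894632247614254],
    [datetime.datetime(2015, 4, 26, 0, 40), 0.1920554638586589],
    [datetime.datetime(2015, 4, 26, 1, 0), 0.24394390686092926],
    [datetime.datetime(2015, 4, 26, 1, 20), 0.9870880292994234],
    [datetime.datetime(2015, 4, 26, 1, 40), 0.8154734773666351],
]

buckets = {}
for t in arr:
    buckets.setdefault(t[0].hour, []).append(t)

# Midnight of the (single) day the data covers.
day = arr[0][0].replace(hour=0, minute=0, second=0, microsecond=0)

# Reduce each bucket to (start of hour, max value, min value).
result = []
for hour in sorted(buckets):
    values = [v for _, v in buckets[hour]]
    result.append((day + datetime.timedelta(hours=hour), max(values), min(values)))

for row in result:
    print(row)
```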
If your data is already sorted by the datetime element, you can skip the separate bucketing step and use `groupby`:
```python
>>> from itertools import groupby
>>> for hour, group in groupby(arr, lambda t: t[0].hour):
...     li = list(group)
...     print(hour, max(li, key=lambda l: l[1]), min(li, key=lambda l: l[1]))
...
```
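The "already sorted" condition matters because `itertools.groupby` only merges *consecutive* items with equal keys. On unsorted input the same hour shows up as more than one group. A small demonstration with three illustrative rows (`hour_of` is a hypothetical helper name):

```python
import datetime
from itertools import groupby

rows = [
    (datetime.datetime(2015, 4, 26, 0, 10), 25.2),
    (datetime.datetime(2015, 4, 26, 1, 0), 22.2),
    (datetime.datetime(2015, 4, 26, 0, 20), 25.1),
]


def hour_of(row):
    return row[0].hour


# Unsorted: hour 0 is split into two groups around hour 1.
unsorted_keys = [hour for hour, _ in groupby(rows, hour_of)]
# Sorting first (tuples sort by their datetime element) gives one group per hour.
sorted_keys = [hour for hour, _ in groupby(sorted(rows), hour_of)]

print(unsorted_keys)  # [0, 1, 0]
print(sorted_keys)    # [0, 1]
```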
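Answer 3 (score: 0)
You can use pandas:
```python
import datetime
import pandas as pd
```
Create a DataFrame and sort it by time:
```python
df = pd.DataFrame(d, columns=['time', 'price']).sort_values('time')
```
where `d` is your input list of tuples: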
```
                 time  price
0 2015-04-26 00:10:00   25.2
1 2015-04-26 00:20:00   25.1
2 2015-04-26 00:30:00   25.7
3 2015-04-26 00:40:00   23.2
4 2015-04-26 00:50:00   22.2
5 2015-04-26 00:59:00   29.2
6 2015-04-26 01:00:00   22.2
7 2015-04-26 01:10:00   21.2
```
Create a column containing the date and hour:
```python
df['day_hour'] = df.apply(lambda r: datetime.datetime(r['time'].year, r['time'].month, r['time'].day, r['time'].hour, 0), axis=1)
```
```
                 time  price            day_hour
0 2015-04-26 00:10:00   25.2 2015-04-26 00:00:00
1 2015-04-26 00:20:00   25.1 2015-04-26 00:00:00
2 2015-04-26 00:30:00   25.7 2015-04-26 00:00:00
3 2015-04-26 00:40:00   23.2 2015-04-26 00:00:00
4 2015-04-26 00:50:00   22.2 2015-04-26 00:00:00
5 2015-04-26 00:59:00   29.2 2015-04-26 00:00:00
6 2015-04-26 01:00:00   22.2 2015-04-26 01:00:00
7 2015-04-26 01:10:00   21.2 2015-04-26 01:00:00
```
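Drop the original `time` column, since it is not used in the output:
```python
df = df.drop('time', axis=1)
```
Group the data by date and hour:
```python
dfgrouped = df.groupby('day_hour')
```
Get the max and min for each `day_hour`:
```python
dfmax = dfgrouped.max()
dfmin = dfgrouped.min()
```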
Join the max and min frames on the same `day_hour`:
```python
dfout = dfmax.join(dfmin, lsuffix='_max', rsuffix='_min')
```
```
>>> dfout
                     price_max  price_min
day_hour
2015-04-26 00:00:00       29.2       22.2
2015-04-26 01:00:00       22.2       21.2
```
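The separate max, min and join steps can also be collapsed into a single `agg` call, and the row-wise `apply` that builds `day_hour` can be replaced by `Series.dt.floor`. A sketch on the question's sample rows (minute 60 written as 59), assuming a pandas version that accepts the `'h'` hourly frequency alias; the final `itertuples(name=None)` call turns the result into the list of `(hour, max, min)` tuples from the question:

```python
import datetime

import pandas as pd

d = [
    (datetime.datetime(2015, 4, 26, 0, 10), 25.2),
    (datetime.datetime(2015, 4, 26, 0, 20), 25.1),
    (datetime.datetime(2015, 4, 26, 0, 30), 25.7),
    (datetime.datetime(2015, 4, 26, 0, 40), 23.2),
    (datetime.datetime(2015, 4, 26, 0, 50), 22.2),
    (datetime.datetime(2015, 4, 26, 0, 59), 29.2),
    (datetime.datetime(2015, 4, 26, 1, 0), 22.2),
    (datetime.datetime(2015, 4, 26, 1, 10), 21.2),
]
df = pd.DataFrame(d, columns=['time', 'price']).sort_values('time')

# Floor each timestamp to the start of its hour, then compute both statistics per group.
dfout = (
    df.groupby(df['time'].dt.floor('h'))['price']
      .agg(['max', 'min'])
      .rename(columns={'max': 'price_max', 'min': 'price_min'})
)
dfout.index.name = 'day_hour'

# Plain (hour start, max, min) tuples, index first.
rows = list(dfout.itertuples(name=None))
print(dfout)
```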