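I am using InfluxDB as a time series database, and it is a nice piece of infrastructure to work with. However, I have run into an annoying problem that I don't know how to solve. When the database holds data at sub-second precision, it seems hard to query, because the time deltas are slightly skewed. I originally asked for writes at 0.5-second precision, but I do not get that exact precision in the database.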
> select price from TSLA_0p5s limit 100
name: TSLA_0p5s
time midprice
---- --------
2015-07-15T09:00:00Z 267.1
2015-07-15T09:00:00.499500032Z 267.1
2015-07-15T09:00:01Z 267.1
2015-07-15T09:00:01.499500032Z 267.1
2015-07-15T09:00:02Z 267.1
2015-07-15T09:00:02.499500032Z 267.1
2015-07-15T09:00:03Z 267.1
2015-07-15T09:00:03.499500032Z 267.1
2015-07-15T09:00:04Z 267.1
2015-07-15T09:00:04.499500032Z 267.1
2015-07-15T09:00:05Z 267.1
2015-07-15T09:00:05.499500032Z 267.1
2015-07-15T09:00:06Z 267.1
2015-07-15T09:00:06.499500032Z 267.1
2015-07-15T09:00:07Z 267.1
2015-07-15T09:00:07.499500032Z 267.1
2015-07-15T09:00:08Z 267.1
2015-07-15T09:00:08.499500032Z 267.1
2015-07-15T09:00:09Z 267.1
2015-07-15T09:00:09.499500032Z 267.1
2015-07-15T09:00:10Z 267.1
2015-07-15T09:00:10.499500032Z 267.1
2015-07-15T09:00:11Z 267.1
2015-07-15T09:00:11.499500032Z 267.1
2015-07-15T09:00:12Z 267.1
2015-07-15T09:00:12.499500032Z 267.1
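In the database sample above, you can see that the deltas between timestamps are not regular. When I write the data to the database with influxdb-python, the timedelta is constant and set to 0.5 seconds. Here you can notice that
2015-07-15T09:00:00.499500032Z - 2015-07-15T09:00:00Z = 0.499500032 seconds (**)
and
2015-07-15T09:00:01Z - 2015-07-15T09:00:00.499500032Z = 0.500499968 seconds (***)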
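To make the arithmetic in (**) and (***) concrete, here is a minimal sketch that recomputes both gaps in integer nanoseconds. Python's datetime only keeps microseconds, so the fractional part is parsed by hand. The helper name to_epoch_ns is hypothetical and is not part of influxdb-python.

```python
import calendar
import time

NS_PER_SECOND = 1_000_000_000


def to_epoch_ns(ts: str) -> int:
    """Convert a UTC timestamp like 2015-07-15T09:00:00.499500032Z to epoch nanoseconds.

    Hypothetical helper: accepts zero to nine fractional digits.
    """
    whole, _, frac = ts.rstrip("Z").partition(".")
    seconds = calendar.timegm(time.strptime(whole, "%Y-%m-%dT%H:%M:%S"))
    # Pad the fraction to nine digits so "5" means 500 ms, not 5 ns.
    nanos = int(frac.ljust(9, "0")) if frac else 0
    return seconds * NS_PER_SECOND + nanos


first = to_epoch_ns("2015-07-15T09:00:00Z")
second = to_epoch_ns("2015-07-15T09:00:00.499500032Z")
third = to_epoch_ns("2015-07-15T09:00:01Z")

short_gap_ns = second - first  # gap (**)
long_gap_ns = third - second   # gap (***)
print(short_gap_ns, long_gap_ns)
```

The two gaps add up to exactly one second, so only every second point is off the 0.5-second grid.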
import datetime

import numpy as np
from influxdb import InfluxDBClient

client = InfluxDBClient("localhost", 8086, username, password, "data")
delta_intraday = datetime.timedelta(seconds=0.5)


def generateDataFromDb(start_time, end_time, delta=delta_intraday):
    # Walk from start_time to end_time in consecutive windows of length delta.
    current_time = datetime.datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
    end_time = datetime.datetime.strptime(end_time, "%Y-%m-%d %H:%M:%S")
    next_time = current_time + delta
    while next_time < end_time:
        # Both window bounds are inclusive.
        fetch_items = client.query(
            "select * from "
            + DB_NAME
            + " WHERE time >= '"
            + current_time.isoformat().replace("T", " ")
            + "' AND time <= '"
            + next_time.isoformat().replace("T", " ")
            + "';"
        )
        fetch_points = fetch_items.get_points()
        data = []
        data.extend(ts_fetch_items_gen(fetch_points))
        data = np.array(data)
        data = ts_extract(data, keys)
        yield np.array(data)
        # Slide the window forward by one step.
        current_time = next_time
        next_time = next_time + delta


dataGenerator = generateDataFromDb(FROM_DAY, TO_DAY, delta=delta_intraday)
for i, data in enumerate(dataGenerator):
    print("{}- The datashape is {}".format(i, data.shape))
Never mind ts_fetch_items_gen() and ts_extract() here.
The output of the code above is
0- The datashape is (2, 103)
1- The datashape is (103,)
2- The datashape is (2, 103)
3- The datashape is (103,)
4- The datashape is (2, 103)
5- The datashape is (103,)
6- The datashape is (2, 103)
7- The datashape is (103,)
8- The datashape is (2, 103)
9- The datashape is (103,)
10- The datashape is (2, 103)
11- The datashape is (103,)
12- The datashape is (2, 103)
13- The datashape is (103,)
14- The datashape is (2, 103)
15- The datashape is (103,)
16- The datashape is (2, 103)
17- The datashape is (103,)
18- The datashape is (2, 103)
19- The datashape is (103,)
...
Because of (**) and (***), I get two different data shapes in the output above, namely (103,) and (2, 103).
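As a rough illustration of why the shapes alternate, the sketch below replays the query windows against the stored timestamps without touching the database. It assumes, as in the sample above, that every second point sits at an offset of 499500032 ns, and that each window is inclusive on both ends, as in my WHERE clause. The function names are made up for this example.

```python
NS_PER_SECOND = 1_000_000_000
STEP_NS = 500_000_000    # intended 0.5 s query window
SKEWED_NS = 499_500_032  # offset actually stored for every second point


def stored_timestamps(seconds):
    """Timestamps (ns since the first point) as they appear in the database."""
    stamps = []
    for s in range(seconds):
        base = s * NS_PER_SECOND
        stamps.append(base)
        stamps.append(base + SKEWED_NS)
    return stamps


def points_per_window(stamps, n_windows, step_ns=STEP_NS):
    """Count points in each window [lo, hi], inclusive on both ends."""
    counts = []
    for w in range(n_windows):
        lo, hi = w * step_ns, (w + 1) * step_ns
        counts.append(sum(lo <= t <= hi for t in stamps))
    return counts


counts = points_per_window(stored_timestamps(3), 4)
print(counts)  # [2, 1, 2, 1]
```

A window that starts on a whole second catches both the whole-second point and the skewed point just before its upper bound, while the next window catches only the whole-second point at its upper bound.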
Is there a way to round the timestamps to the nearest tenth of a second, i.e. 0.499500032 --> 0.5, before querying the database?
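To show the kind of rounding I mean, here is a minimal client-side sketch on integer nanosecond timestamps. It is only an illustration of the arithmetic, not something InfluxDB or influxdb-python does for me; round_to_step is a hypothetical helper and the 100 ms default step is my assumption for "nearest tenth".

```python
def round_to_step(ts_ns: int, step_ns: int = 100_000_000) -> int:
    """Round a non-negative nanosecond timestamp to the nearest multiple of step_ns.

    Exact ties round up. Integer arithmetic avoids float rounding error.
    """
    return (ts_ns + step_ns // 2) // step_ns * step_ns


skewed = 499_500_032                        # 0.499500032 s
print(round_to_step(skewed))                # nearest tenth of a second
print(round_to_step(skewed, 500_000_000))   # nearest half second
```

Both calls map the skewed value onto 500000000 ns, that is 0.5 s, while timestamps already on the grid are left unchanged.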