I am using a 'controller' for a database that continuously accumulates data but only uses the most recent data, defined as less than 3 days old. Once data is more than 3 days old, I want to dump it to a JSON file and delete it from the database.

To simulate this, I did the following. The 'controller' program, `rethinkdb_monitor.py`, is:
```python
import json
from datetime import datetime, timedelta

import pytz
import rethinkdb as r

# The database and table are assumed to have been previously created
database_name = "sensor_db"
table_name = "sensor_data"

# To avoid interference of this testing program with the main program, all
# ports are initialized at an offset of 1 from the default ports using
# "rethinkdb --port_offset 1" at the command line.
port_offset = 1
conn = r.connect("localhost", 28015 + port_offset)

current_time = datetime.utcnow().replace(tzinfo=pytz.utc)  # Current time, timezone-aware (assumed UTC)
retention_period = timedelta(days=3)  # Period of time during which data is retained on the main server
expiry_time = current_time - retention_period  # Data older than this is removed from the main server

data_to_archive = r.db(database_name).table(table_name).filter(r.row['timestamp'] < expiry_time)

output_file = "archived_sensor_data.json"
with open(output_file, 'a') as f:
    # The time_format="raw" option is passed to prevent a
    # "RqlTzinfo object is not JSON serializable" error when dumping
    for change in data_to_archive.changes().run(conn, time_format="raw"):
        print(change)
        # Since the main database we are reading from is append-only, the
        # 'old_val' of the change is always None and only 'new_val' is of interest
        json.dump(change['new_val'], f)
        f.write("\n")  # Separate entries by a new line
```
Before running this program, I started RethinkDB with `rethinkdb --port_offset 1` at the command line, and used the web interface at `localhost:8081` to create a database named `sensor_db` containing a table named `sensor_data` (see below).
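As an aside, the same setup could also be done programmatically rather than through the web interface; a minimal sketch, assuming the same names and port offset as above:

```python
import rethinkdb as r

port_offset = 1
conn = r.connect("localhost", 28015 + port_offset)

# Create the database and table only if they do not already exist
if "sensor_db" not in r.db_list().run(conn):
    r.db_create("sensor_db").run(conn)
if "sensor_data" not in r.db("sensor_db").table_list().run(conn):
    r.db("sensor_db").table_create("sensor_data").run(conn)
```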
After running `rethinkdb_monitor.py` and waiting for changes, I ran a script, `rethinkdb_add_data.py`, which generates synthetic data:
```python
import random
from datetime import datetime, timedelta

import faker
import pytz
import rethinkdb as r


class RandomData(object):
    def __init__(self, seed=None):
        self._seed = seed
        self._random = random.Random()
        self._random.seed(seed)
        self.fake = faker.Faker()
        self.fake.random.seed(seed)

    def __getattr__(self, x):
        return getattr(self._random, x)

    def name(self):
        return self.fake.name()

    def datetime(self, start=None, end=None):
        if start is None:
            start = datetime(2000, 1, 1, tzinfo=pytz.utc)  # Jan 1st 2000
        if end is None:
            end = datetime.utcnow().replace(tzinfo=pytz.utc)
        if isinstance(end, datetime):
            dt = end - start
        elif isinstance(end, timedelta):
            dt = end
        assert isinstance(dt, timedelta)
        random_dt = timedelta(microseconds=self._random.randrange(int(dt.total_seconds() * (10 ** 6))))
        return start + random_dt


# RethinkDB has been started at a port offset of 1 using the "--port_offset 1" argument.
port_offset = 1
conn = r.connect("localhost", 28015 + port_offset).repl()

rd = RandomData(seed=0)  # Instantiate and seed a random data generator

# The database and table have been previously created (e.g. through the web interface at localhost:8081)
database_name = "sensor_db"
table_name = "sensor_data"

# Generate random data with timestamps uniformly distributed over the past 6 days
random_data_time_interval = timedelta(days=6)
start_random_data = datetime.utcnow().replace(tzinfo=pytz.utc) - random_data_time_interval

for _ in range(5):
    entry = {"name": rd.name(), "timestamp": rd.datetime(start=start_random_data)}
    r.db(database_name).table(table_name).insert(entry).run()
```
After interrupting `rethinkdb_monitor.py` with Ctrl + C, the `archived_sensor_data.json` file contains the data to be archived:
{"timestamp": {"timezone": "+00:00", "$reql_type$": "TIME", "epoch_time": 1475963599.347}, "id": "be2b5fd7-28df-48ee-b744-99856643265a", "name": "Elizabeth Woods"}
{"timestamp": {"timezone": "+00:00", "$reql_type$": "TIME", "epoch_time": 1475879797.486}, "id": "36d69236-f710-481b-82b6-4a62a1aae36c", "name": "Susan Wagner"}
What I am still struggling with, however, is how to subsequently delete this data from the database. The `delete` command syntax seems to be callable on a table or a selection, but the `change` obtained through the changefeed is just a dictionary.

How can I use a changefeed to continuously delete data from the database?
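For concreteness, this is the kind of selection-based `delete` I mean (a sketch reusing the names defined above); what I am missing is a way to get from the changefeed dictionary back to such a selection:

```python
# delete() can be called on a whole selection in one query...
r.db(database_name).table(table_name).filter(
    r.row['timestamp'] < expiry_time
).delete().run(conn)

# ...but a changefeed yields plain dictionaries such as
# {'old_val': None, 'new_val': {...}}, which have no delete() method.
```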
Answer 0 (score: 0):
I used the fact that each `change` contains the ID of the corresponding document in the database, and created a selection with this ID using `get`:
```python
with open(output_file, 'a') as f:
    # The time_format="raw" option is passed to prevent a
    # "RqlTzinfo object is not JSON serializable" error when dumping
    for change in data_to_archive.changes().run(conn, time_format="raw"):
        print(change)
        if change['new_val'] is not None:  # If the change is not a deletion
            # Since the main database is append-only, the 'old_val' of an
            # insertion is always None and only 'new_val' is of interest
            json.dump(change['new_val'], f)
            f.write("\n")  # Separate entries by a new line
            # Get the ID of the document to be deleted, then delete it by primary key
            ID_to_delete = change['new_val']['id']
            r.db(database_name).table(table_name).get(ID_to_delete).delete().run(conn)
```
The delete operations themselves are also registered as changes, but I have filtered them out with the `if change['new_val'] is not None` statement.
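As a side note, an alternative to deleting document by document from the changefeed would be to combine archiving and deletion into a single batch query: `delete` accepts a `return_changes=True` option, which returns the deleted documents in `old_val`. A minimal sketch, assuming the same connection and names as above and run periodically rather than continuously:

```python
# Delete all expired documents in one query and capture what was deleted
result = (r.db(database_name).table(table_name)
          .filter(r.row['timestamp'] < expiry_time)
          .delete(return_changes=True)
          .run(conn, time_format="raw"))

with open(output_file, 'a') as f:
    for change in result.get('changes', []):
        json.dump(change['old_val'], f)  # Deleted documents appear as 'old_val'
        f.write("\n")
```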