Question

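I have data stored in Cassandra that I want to retrieve in Python for batch processing. I want to create the batches based on a time column. Suppose I pull data from Cassandra between 9:00:00 and 10:00:00. Now I want to create a batch for each minute of data. (The same timestamp can appear in many rows.)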

1st batch is of 9:00:00 data.
2nd batch is of 9:00:01 data.

These batches will be fed to another module, to be processed one after another.

My code is as follows:

import pandas as pd
from cassandra.cluster import Cluster

cluster_cont = ['127.0.0.1']
keyspace = 'demo'
query = 'Select * from demo.table1'
cond1 = '9:00:00'
cond2 = '10:00:00'

def readD(cluster_cont, keyspace, query, cond1, cond2):

    cluster = Cluster(contact_points=cluster_cont)
    session = cluster.connect(keyspace)

    # Disable paging so the whole result set is fetched at once
    session.default_fetch_size = None

    # Bind the bounds as parameters instead of concatenating them into the CQL string
    rslt = session.execute(query + " where time>=%s and time<=%s", (cond1, cond2), timeout=None)
    # Each row is a named tuple, so the DataFrame picks up the column names
    df = pd.DataFrame(list(rslt))

    return df

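In the code above, I want to iterate from cond1 to cond2 to create batches of DataFrames. How can I do that?

I tried converting cond1 and cond2 to datetime objects, but that raises an error.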

import datetime

df=readD(cluster_cont, keyspace, query, cond1, cond2)
cond1=datetime.datetime.strptime(cond1,'%H:%M:%S').time()
cond2=datetime.datetime.strptime(cond2,'%H:%M:%S').time()

while cond1 <= cond2 :
   # This line raises the TypeError: datetime.time does not support "+"
   cond1 = cond1 + datetime.timedelta(minutes=1)
   df=df[df['time']==cond1]

Traceback

TypeError: unsupported operand type(s) for +: 'datetime.time' and 'datetime.timedelta'
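
The error arises because `datetime.time` objects do not support arithmetic with `datetime.timedelta`. One possible workaround, sketched below, is to attach the time to an arbitrary date, do the arithmetic on the resulting `datetime.datetime`, and convert back with `.time()`. The helper name `minute_starts` and the anchor date are assumptions made for illustration only, and the sketch assumes the whole range falls within a single day.

```python
import datetime

def minute_starts(start, end, step=datetime.timedelta(minutes=1)):
    """Yield datetime.time values from start to end (inclusive) in fixed steps."""
    # datetime.time has no arithmetic, so anchor both bounds to an arbitrary date.
    anchor = datetime.date(2000, 1, 1)  # hypothetical anchor; any date works
    current = datetime.datetime.combine(anchor, start)
    stop = datetime.datetime.combine(anchor, end)
    while current <= stop:
        yield current.time()
        current += step

start = datetime.datetime.strptime('9:00:00', '%H:%M:%S').time()
end = datetime.datetime.strptime('10:00:00', '%H:%M:%S').time()
marks = list(minute_starts(start, end))
print(len(marks), marks[0], marks[-1])
```

Each yielded value marks the start of one minute and could serve as the lower bound of a batch.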

Even if I manage to iterate, how do I feed these DataFrames to the other module one by one? Should I change my approach?
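
For handing batches over one at a time, a generator built on pandas `groupby` might be enough. The following is only a sketch under stated assumptions: the `time` column is assumed to hold `datetime.time` values (what the driver actually returns for the column may differ and may need converting first), the sample DataFrame is synthetic, and `minute_batches` and `process` are hypothetical names, with `process` standing in for the downstream module.

```python
import datetime
import pandas as pd

def minute_batches(df, time_col='time'):
    """Yield (minute, sub-DataFrame) pairs, one pair per minute present in df."""
    # Truncate each value to the start of its minute to build a grouping key.
    minute_key = df[time_col].map(
        lambda t: t.replace(second=0, microsecond=0)
    ).rename('minute')
    for minute, batch in df.groupby(minute_key, sort=True):
        yield minute, batch

def process(batch):
    """Hypothetical stand-in for the downstream module; returns the row count."""
    return len(batch)

# Synthetic sample data, used only to exercise the sketch.
sample = pd.DataFrame({
    'time': [datetime.time(9, 0, 0), datetime.time(9, 0, 30),
             datetime.time(9, 1, 5), datetime.time(9, 1, 5),
             datetime.time(9, 3, 0)],
    'value': [1, 2, 3, 4, 5],
})

# Batches are produced lazily and consumed one after another, in time order.
results = [(minute, process(batch)) for minute, batch in minute_batches(sample)]
```

Because the generator yields lazily, each batch can be processed before the next one is built. Minutes that contain no rows are simply skipped, since grouping only produces keys that occur in the data.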

Any suggestions are welcome. Thanks.

P.S.: I am using the python-cassandra driver to fetch the data.

How do I create batches for batch processing in Python?

0 answers: