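I have some code that periodically updates a table. Each run should delete the existing records from the table and then insert the new ones.
The problem is that DSE Search has a gap when indexing the table.
Here is the code: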
from datetime import datetime, timedelta
from dateutil.parser import parse  # assumption: parse() comes from python-dateutil

# Delete the existing rows for this source before inserting the new ones.
session_statis.execute('DELETE FROM statistics WHERE source = %s', [source])
timeone = datetime.now(tz) - timedelta(hours=1)
channels_rdd = channels.map(lambda x: (x.id, {'author': x.name, 'category': x.category}))
# Key each article by its channel, then join with channels_rdd to add author and category.
article_rdd = rdd.map(lambda x: (x[1][0]['channel'], {'source': x[1][0]['source'], 'id': x[1][0]['id'], 'title': x[1][0]['title'], 'thumbnail': x[1][0]['thumbnail'], 'url': x[1][0]['url'], 'created_at': x[1][0]['created_at'], 'genre': x[1][0]['genre'], 'reads': 0, 'likes': x[1][1]['attitudes'], 'comments': x[1][1]['comments'], 'shares': x[1][1]['reposts']})) \
    .join(channels_rdd).map(lambda x: {'source': x[1][0]['source'], 'id': x[1][0]['id'], 'title': x[1][0]['title'], 'thumbnail': x[1][0]['thumbnail'], 'url': x[1][0]['url'], 'created_at': parse(x[1][0]['created_at']), 'genre': x[1][0]['genre'], 'reads': 0, 'likes': x[1][0]['likes'], 'comments': x[1][0]['comments'], 'shares': x[1][0]['shares'], 'speed': x[1][0]['shares'], 'category': x[1][1]['category'], 'author': x[1][1]['author']})
# Keep articles from the last hour with a positive speed and tag them with timespan '1'.
result1 = article_rdd.filter(lambda x: x['created_at'] >= timeone).filter(lambda x: x['speed'] > 0).map(lambda x: {'timespan': '1', 'source': x['source'], 'id': x['id'], 'title': x['title'], 'thumbnail': x['thumbnail'], 'url': x['url'], 'created_at': x['created_at'], 'genre': x['genre'], 'reads': 0, 'likes': x['likes'], 'comments': x['comments'], 'shares': x['shares'], 'speed': x['shares'], 'category': x['category'], 'author': x['author']})
# The loop variable is named row so that it does not shadow the rdd used above.
for row in result1.collect():
    # dt article xxxx
    session_statis.execute('INSERT INTO statistics(source, timespan, id, title, thumbnail, url, created_at, category, genre, author, reads, likes, comments, shares, speed) values(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', (row['source'], row['timespan'], row['id'], row['title'], row['thumbnail'], row['url'], row['created_at'], row['category'], row['genre'], row['author'], row['reads'], row['likes'], row['comments'], row['shares'], row['speed']))
Thanks for any replies.
Answer 0 (score: 1)
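Depending on your usage pattern, you may need to consider the different consistency levels. For example, setting the consistency level to ONE (1) should give good results.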
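One way to try this suggestion is sketched below, assuming `session_statis` is a `Session` from the DataStax Python driver (the `%s` placeholders in the question suggest it is). The helper `replace_statistics` and the constants `COLUMNS`, `DELETE_CQL` and `INSERT_CQL` are hypothetical names introduced for illustration. The sketch uses the driver's legacy session-wide `default_consistency_level` attribute; a per-statement consistency level or an execution profile would be alternatives. Whether this closes the indexing gap depends on the cluster, so treat it as an experiment rather than a guaranteed fix.

```python
# Columns of the statistics table, in the order used by the INSERT in the question.
COLUMNS = ('source', 'timespan', 'id', 'title', 'thumbnail', 'url', 'created_at',
           'category', 'genre', 'author', 'reads', 'likes', 'comments', 'shares', 'speed')

DELETE_CQL = 'DELETE FROM statistics WHERE source = %s'
INSERT_CQL = 'INSERT INTO statistics ({}) VALUES ({})'.format(
    ', '.join(COLUMNS), ', '.join(['%s'] * len(COLUMNS)))


def replace_statistics(session, source, rows, consistency_level):
    """Delete the rows for `source`, then insert `rows`, at one consistency level.

    Hypothetical helper, not part of the driver. `session` is assumed to be a
    cassandra.cluster.Session and `consistency_level` a value such as
    cassandra.ConsistencyLevel.ONE.
    """
    # Legacy session-wide setting: it applies to every statement that does
    # not carry its own consistency level.
    session.default_consistency_level = consistency_level
    session.execute(DELETE_CQL, [source])
    inserted = 0
    for row in rows:
        # Build the parameter tuple in the same order as the column list.
        session.execute(INSERT_CQL, tuple(row[column] for column in COLUMNS))
        inserted += 1
    return inserted


# Usage with the names from the question (needs cassandra-driver and a live cluster):
# from cassandra import ConsistencyLevel
# replace_statistics(session_statis, source, result1.collect(), ConsistencyLevel.ONE)
```

Building the statement from a single column list keeps the column names, the placeholders and the parameter tuple in step, which is easy to get wrong with fifteen hand-written `%s` markers.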