Deleting events from a PredictionIO app

Time: 2018-01-11 08:36:46

Tags: hadoop hbase predictionio

We use HBase and Hadoop as the event store for our Universal Recommender apps, which use PredictionIO internally. The data has grown very large, and after much thought we have decided to delete data older than 6 months. (Adding another machine as a data node is out of the question.)

After searching repeatedly, the only way I can see to delete events is to query the event server, collect the eventIds, and send a delete request for each eventId.

The problem is that, at random times, the event server responds with Internal Server Error, which stops the deletion. When I send the same query from Postman, it sometimes returns events and sometimes returns "The server was not able to produce a timely response to your request." To confirm whether there really are no events, I checked in HBase. It does contain events older than the untilTime I pass in the query.

The query is as follows: http://server:7070/events.json?accessKey=key&entityType=user&event=add_item&untilTime=2017-05-01T00:00:00.000Z&limit=2

I need help with how to delete events in this situation.
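For reference, the query-then-delete loop described above could be sketched as follows, with a retry and exponential backoff around each request so that an occasional Internal Server Error or timeout does not abort the whole run. This is a minimal sketch, not a tested tool: `BASE_URL` and `ACCESS_KEY` are the placeholders from the query above, the helper names (`with_retries`, `fetch_page`, `delete_event`, `purge`) are hypothetical, and the handling of a 404 as "no more events" is an assumption about how the event server reports an empty result.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://server:7070"  # host from the question (placeholder)
ACCESS_KEY = "key"               # placeholder access key


def with_retries(call, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on an HTTP or network error, back off and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except OSError:  # covers URLError, HTTPError and socket timeouts
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def fetch_page(until_time, limit=100):
    """Return up to `limit` events older than until_time."""
    query = urllib.parse.urlencode({
        "accessKey": ACCESS_KEY,
        "entityType": "user",
        "event": "add_item",
        "untilTime": until_time,
        "limit": limit,
    })
    try:
        with urllib.request.urlopen(f"{BASE_URL}/events.json?{query}", timeout=60) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumption: 404 means nothing matched
            return []
        raise


def delete_event(event_id):
    """Send DELETE /events/<eventId>.json for a single event."""
    query = urllib.parse.urlencode({"accessKey": ACCESS_KEY})
    path = urllib.parse.quote(event_id, safe="")
    req = urllib.request.Request(f"{BASE_URL}/events/{path}.json?{query}", method="DELETE")
    with urllib.request.urlopen(req, timeout=60) as resp:
        resp.read()


def purge(until_time, fetch=fetch_page, delete=delete_event, sleep=time.sleep):
    """Delete page after page until the query returns no events."""
    deleted = 0
    while True:
        events = with_retries(lambda: fetch(until_time), sleep=sleep)
        if not events:
            return deleted
        for event in events:
            with_retries(lambda: delete(event["eventId"]), sleep=sleep)
            deleted += 1

# Example call (performs real HTTP requests):
# purge("2017-05-01T00:00:00.000Z")
```

The fetch and delete functions are passed in as parameters so the loop can be exercised against stubs before it is pointed at a live event server.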

1 answer:

Answer 0 (score: 1)

From your question, I understand that you ultimately want to delete data older than 6 months. I suggest a clean, automatic way to purge the data: HBase TTL.

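A TTL can be set on a column family. With a 6-month TTL set on your column family, HBase major compaction will take care of deleting those records once they are 6 months old.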

http://hbase.apache.org/0.94/book/ttl.html
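As an illustration, HBase expresses a column-family TTL in seconds, and it can be set with an `alter` statement in the HBase shell. The small sketch below only builds that statement. The table name `pio_event:events_1` and the column family `e` are assumptions about how PredictionIO lays out its event table, so check the real names with `list` and `describe` in the HBase shell first. Note also that HBase measures the TTL against each cell's timestamp, so confirm that this timestamp matches the age you care about before relying on it.

```python
from datetime import timedelta


def ttl_alter_command(table, family, days):
    """Build an HBase shell `alter` statement that sets a TTL in seconds."""
    ttl_seconds = int(timedelta(days=days).total_seconds())
    return f"alter '{table}', {{NAME => '{family}', TTL => {ttl_seconds}}}"


# Table and column family names are assumptions; 183 days is roughly 6 months.
print(ttl_alter_command("pio_event:events_1", "e", 183))
# prints: alter 'pio_event:events_1', {NAME => 'e', TTL => 15811200}
```

The printed statement is meant to be pasted into the HBase shell. Expired cells stop being returned once the TTL applies, and the space is reclaimed at the next major compaction.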