Question

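I am new to PredictionIO 0.12.0 (elasticsearch 5.2.1, hbase 1.2.6, spark 2.6.0), running on a machine with 244 GB RAM and 32 cores. I uploaded about 1 million events, each containing 30k features. During the upload I could see the HBase disk usage growing, and once all events were uploaded it had reached 567 GB. To verify the upload, I ran the following commands: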

 - pio-shell --with-spark --conf spark.network.timeout=10000000 --driver-memory 30G --executor-memory 21G --num-executors 7 --executor-cores 3 --conf spark.driver.maxResultSize=4g --conf spark.executor.heartbeatInterval=10000000
 - import org.apache.predictionio.data.store.PEventStore
 - val eventsRDD = PEventStore.find(appName="test")(sc)
 - val c = eventsRDD.count()

This reports the event count as 18944.

Using the script I uploaded the events with, I also queried random event IDs, and each of those events was returned.

I don't know how to make sure that all the events I uploaded are actually in the app. Any help is appreciated.

Answer 1

I finally figured out what was happening.

org.apache.predictionio.data.storage.hbase.HBPEvents

val scan = HBEventsUtil.createScan(
    startTime = startTime,
    untilTime = untilTime,
    entityType = entityType,
    entityId = entityId,
    eventNames = eventNames,
    targetEntityType = targetEntityType,
    targetEntityId = targetEntityId,
    reversed = None)
scan.setCaching(500) // TODO
scan.setCacheBlocks(false) // TODO

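scan.setCaching(500) can cause the scan request to time out. You can try lowering the caching value. To do so, you need to change the source code and recompile.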
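To get a feel for why a caching value of 500 might be too large for events of this size, here is a rough back-of-the-envelope sketch in Scala. It assumes the 567 GB on disk is spread evenly over about 1,000,000 events and ignores HDFS replication, compression, and HBase storage overhead, so the real numbers may differ considerably. It is also only an upper bound: a size limit such as hbase.client.scanner.max.result.size may cut a fetch short before the row count is reached. The object and method names are made up for illustration.

```scala
// Rough estimate of how much data one scanner fetch could carry for a given
// caching value. Assumption: 567 GB on disk is spread evenly over ~1,000,000
// events (replication, compression and storage overhead are ignored).
object ScanBatchEstimate {
  val totalBytes: Double = 567e9 // HBase disk usage reported in the question
  val eventCount: Double = 1e6   // approximate number of uploaded events

  // Average stored size of one event row, in bytes (an assumption, see above).
  val bytesPerEvent: Double = totalBytes / eventCount

  // Approximate upper bound on the payload of a single fetch, in megabytes.
  def batchMegabytes(caching: Int): Double = caching * bytesPerEvent / 1e6

  def main(args: Array[String]): Unit = {
    // Compare the hard-coded value of 500 with two smaller candidates.
    for (caching <- Seq(500, 50, 10)) {
      println(f"caching=$caching%4d -> ~${batchMegabytes(caching)}%.1f MB per fetch")
    }
  }
}
```

Under these assumptions, a caching value of 500 works out to roughly 280 MB per fetch, which would be consistent with the timeouts described above. Which lower value works in practice is something to find by experiment.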

Total event count in PredictionIO is lower than the actual number of events
