Searching from Cassandra database via SparkStreaming takes time

Time: 2018-07-24 10:24:18

Tags: apache-spark cassandra spark-streaming spark-cassandra-connector

I am using this line to get entries from my Cassandra database

val data1 = ssc
  .cassandraTable("orbigo2", "my_trips")
  .select("trip_id")
  .where("user_id = ?", uid)

But this takes a lot of time. I guess the reason is that user_id is not part of the primary key but only an indexed column.

Is there any way in which I can speed this up?

1 answer:

Answer 0 (score: 0)

I would only suggest changing the data model so that user_id becomes the partition key; in that case the query will be much faster. Right now it simply scans the whole table and pulls out the necessary data.
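As a rough sketch of that data-model change: the table name trips_by_user, the text column types, the contact point 127.0.0.1 and the sample uid below are all assumptions for illustration, since the question does not show the schema of my_trips. With user_id as the partition key, the same where clause can be answered from a single partition rather than by a scan of the whole table.

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.{SparkConf, SparkContext}

object TripsByUser {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("trips-by-user")
      .setIfMissing("spark.master", "local[*]")
      // Assumption: a Cassandra node is reachable on localhost.
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Hypothetical table: user_id is the partition key, trip_id a clustering column.
    // Column types are assumed; adjust them to match my_trips.
    CassandraConnector(conf).withSessionDo { session =>
      session.execute(
        """CREATE TABLE IF NOT EXISTS orbigo2.trips_by_user (
          |  user_id text,
          |  trip_id text,
          |  PRIMARY KEY (user_id, trip_id)
          |)""".stripMargin)
    }

    val uid = "user-123" // hypothetical value

    // Same query shape as in the question, now restricted on the partition key.
    val tripIds = sc.cassandraTable("orbigo2", "trips_by_user")
      .select("trip_id")
      .where("user_id = ?", uid)
      .map(_.getString("trip_id"))
      .collect()

    tripIds.foreach(println)
    sc.stop()
  }
}
```

The new table would have to be populated from my_trips (or written alongside it) before this query returns anything.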

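Also, you may want to consider using DataFrames instead of RDDs: they have more optimizations, and you may be able to use the secondary index. You can check the execution plan of the uid filter on the DataFrame to see how the data will be accessed.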
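A minimal sketch of the DataFrame variant, assuming Spark 2.x with the spark-cassandra-connector on the classpath. The contact point, the sample uid and the assumption that user_id is a text column are illustrative only. Calling explain() prints the physical plan, where you can see whether the user_id filter is handed to Cassandra or applied by Spark after a full scan.

```scala
import org.apache.spark.sql.SparkSession

object TripsByUserDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("trips-by-user-df")
      .master("local[*]")
      // Assumption: a Cassandra node is reachable on localhost.
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()

    // Read the table from the question through the connector's data source.
    val trips = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "orbigo2", "table" -> "my_trips"))
      .load()

    val uid = "user-123" // hypothetical value; user_id is assumed to be text

    val tripIds = trips
      .filter(trips("user_id") === uid)
      .select("trip_id")

    // Print the physical plan to see how the data will be accessed.
    tripIds.explain()

    tripIds.show()
    spark.stop()
  }
}
```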