Question

I am using this line to get entries from my Cassandra database:

val data1 =
  ssc
    .cassandraTable("orbigo2", "my_trips")
    .select("trip_id")
    .where("user_id = ?", uid)

But this takes a lot of time. I guess the reason is that my uid is not a primary key but only an indexed column.

Is there any way in which I can speed this up?

Answer 1

I would only recommend changing the data model so that user_id becomes the partition key. In that case the query will be much faster. Right now it just scans the whole table and pulls out the necessary data.
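As a rough sketch of what that data model change could look like, assuming the Spark Cassandra Connector is on the classpath: the table name `my_trips_by_user`, the `text` column types, the connection host, and the placeholder `uid` are all assumptions made for illustration, not details from the question. The existing rows would still have to be copied into the new table, and the application would need to write to it going forward.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector

object TripsByUser {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("trips-by-user")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed host
    val sc = new SparkContext(conf)

    // Hypothetical query table: user_id is the partition key and
    // trip_id a clustering column. Column types are assumed to be text.
    CassandraConnector(conf).withSessionDo { session =>
      session.execute(
        """CREATE TABLE IF NOT EXISTS orbigo2.my_trips_by_user (
          |  user_id text,
          |  trip_id text,
          |  PRIMARY KEY ((user_id), trip_id)
          |)""".stripMargin)
    }

    val uid = "some-user-id" // placeholder value

    // The where clause now restricts the partition key, so Cassandra
    // can read a single partition instead of scanning the whole table.
    val tripIds = sc
      .cassandraTable("orbigo2", "my_trips_by_user")
      .select("trip_id")
      .where("user_id = ?", uid)
      .map(_.getString("trip_id"))
      .collect()

    tripIds.foreach(println)
    sc.stop()
  }
}
```

The sketch uses a plain SparkContext (`sc`) rather than the streaming context from the question. The program is meant to be launched with spark-submit, which supplies the master URL.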

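Also, you may consider using DataFrames instead of RDDs. The DataFrame API has more optimizations, and it may be able to use the secondary index. You can check the execution plan of a DataFrame query filtered on uid to see how the data is accessed.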
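A minimal sketch of that plan check, assuming the Spark Cassandra Connector's DataFrame source is available. The connection host and the placeholder `uid` value are assumptions. The exact wording of the plan output depends on the connector version, so the thing to look for is whether the `user_id` predicate appears as pushed down into the Cassandra scan or only as a Spark-side filter.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object TripsPlanCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("trips-plan-check")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumed host
      .getOrCreate()

    // Load the existing table from the question as a DataFrame.
    val trips = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "orbigo2", "table" -> "my_trips"))
      .load()

    val uid = "some-user-id" // placeholder value

    val tripIds = trips
      .filter(col("user_id") === uid)
      .select("trip_id")

    // Print the physical plan to see how the filter is executed.
    tripIds.explain()

    tripIds.show()
    spark.stop()
  }
}
```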
