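I am trying to tune Spark and Cassandra. I have 10 million records in Cassandra, and I am running read operations from Spark/Beeline using the spark-cassandra-connector, but they take 15-20 minutes. I have a 4-node Cassandra cluster and a 3-node Spark cluster. Here are my Cassandra and Spark configurations.
Cassandra: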
listen_address: 192.168.xx.xx
rpc_address: 192.168.xx.xx
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: true
start_rpc: true
read_request_timeout_in_ms: 5000
write_request_timeout_in_ms: 2000
batch_size_warn_threshold_in_kb: 100
batch_size_fail_threshold_in_kb: 1000
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
request_timeout_in_ms: 300000
range_request_timeout_in_ms: 360000
Spark:
spark.master spark://master:7077
spark.cassandra.connection.host 192.168.xx.xx,192.168.xx.xx,192.168.xx.xx,192.168.xx.xx
spark.cassandra.connection.port 9042
spark.cassandra.auth.username cassandra
spark.cassandra.auth.password cassandra
spark.driver.memory 5g
spark.executor.memory 6g
spark.cassandra.input.consistency.level QUORUM
spark.eventLog.enabled true
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.cassandra.input.split.size_in_mb 128
spark.cassandra.input.fetch.size_in_rows 10000
spark.sql.qubole.split.computation true
spark.sql.inmemorycolumnarstorage.compressed true
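
For context on the split-size setting above, here is a rough sketch of how I understand spark.cassandra.input.split.size_in_mb to affect read parallelism: the connector aims for roughly (estimated table size / split size) Spark partitions. The average row size below is a hypothetical placeholder, since I have not measured it, and the helper name estimate_spark_partitions is made up for illustration. This is a back-of-the-envelope estimate, not the connector's actual algorithm.

```python
import math


def estimate_spark_partitions(row_count, avg_row_bytes, split_size_mb):
    """Approximate the number of Spark partitions for a full table scan.

    Assumes partitions ~= total data size / split size, which is only a
    simplification of what the connector does internally.
    """
    total_bytes = row_count * avg_row_bytes
    split_bytes = split_size_mb * 1024 * 1024
    # At least one partition, rounding up any partial split.
    return max(1, math.ceil(total_bytes / split_bytes))


if __name__ == "__main__":
    rows = 10_000_000             # row count from the question
    assumed_avg_row_bytes = 1024  # hypothetical: 1 KiB per row, not measured
    for split_mb in (128, 64, 32):
        parts = estimate_spark_partitions(rows, assumed_avg_row_bytes, split_mb)
        print(f"split.size_in_mb={split_mb}: about {parts} partitions")
```

If the real table is small relative to the split size, the scan may end up with too few partitions to keep every executor core busy, so a smaller split size might be worth testing.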