Performance tuning for Spark and Cassandra

Time: 2017-08-08 06:22:18

Tags: apache-spark cassandra spark-cassandra-connector

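I am struggling to tune Spark and Cassandra. I have 10 million records in Cassandra, and I run read operations from Spark/Beeline through the spark-cassandra-connector, but they take 15-20 minutes. I have 4 Cassandra nodes and 3 Spark nodes. Here are my Cassandra and Spark configurations.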

Cassandra:

listen_address: 192.168.xx.xx
rpc_address: 192.168.xx.xx
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: true
start_rpc: true
read_request_timeout_in_ms: 5000
write_request_timeout_in_ms: 2000
batch_size_warn_threshold_in_kb: 100
batch_size_fail_threshold_in_kb: 1000
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
request_timeout_in_ms: 300000
range_request_timeout_in_ms: 360000

Spark:

spark.master spark://master:7077
spark.cassandra.connection.host 192.168.xx.xx,192.168.xx.xx,192.168.xx.xx,192.168.xx.xx
spark.cassandra.connection.port 9042
spark.cassandra.auth.username cassandra
spark.cassandra.auth.password cassandra
spark.driver.memory 5g
spark.executor.memory 6g
spark.cassandra.input.consistency.level QUORUM
spark.eventLog.enabled true
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.cassandra.input.split.size_in_mb 128
spark.cassandra.input.fetch.size_in_rows 10000
spark.sql.qubole.split.computation true
spark.sql.inmemorycolumnarstorage.compressed true
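
To put the two input settings above in perspective, here is a rough back-of-the-envelope sketch in plain Python (no Spark required). It assumes the connector aims for roughly `spark.cassandra.input.split.size_in_mb` of data per Spark partition and fetches `spark.cassandra.input.fetch.size_in_rows` rows per round trip. The average row size of 200 bytes is a made-up figure, since the question does not give one, and the function names are purely illustrative. The real partition count depends on Cassandra's own size estimates and token range layout, so treat the numbers as an order-of-magnitude guide only.

```python
import math


def estimate_input_partitions(row_count, avg_row_bytes, split_size_mb):
    """Rough estimate of Spark partitions for a full table scan.

    Assumes about split_size_mb of data per Spark partition; the real
    count depends on Cassandra's size estimates and token ranges.
    """
    table_bytes = row_count * avg_row_bytes
    split_bytes = split_size_mb * 1024 * 1024
    return max(1, math.ceil(table_bytes / split_bytes))


def estimate_fetch_round_trips(row_count, fetch_size_in_rows):
    """Total number of row pages fetched across all partitions (approximate)."""
    return math.ceil(row_count / fetch_size_in_rows)


ROW_COUNT = 10_000_000   # from the question
AVG_ROW_BYTES = 200      # hypothetical: not stated in the question
FETCH_SIZE = 10_000      # spark.cassandra.input.fetch.size_in_rows above

# Compare the configured split size (128) with smaller values.
for split_mb in (128, 64, 16):
    parts = estimate_input_partitions(ROW_COUNT, AVG_ROW_BYTES, split_mb)
    print(f"split.size_in_mb={split_mb}: about {parts} Spark partitions")

pages = estimate_fetch_round_trips(ROW_COUNT, FETCH_SIZE)
print(f"fetch.size_in_rows={FETCH_SIZE}: about {pages} pages in total")
```

If the estimated partition count comes out lower than the total number of executor cores, some cores would sit idle during the scan, which is one possible (unconfirmed) contributor to a slow read; the Spark UI's task count for the scan stage is the way to check the actual value.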

0 answers:

No answers.