Using a semaphore with Cassandra to throttle executeAsync writes in Java and eliminate NoHostAvailableException errors

Asked: 2016-02-16 19:17:10

Tags: java cassandra semaphore datastax-java-driver

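I have some basic code that uses a prepared statement inside a for loop to write results to a Cassandra table, with a semaphore providing some throttling.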

  Session session = null;
  try {
    session = connector.openSession();
  } catch( Exception ex ) {
    //  .. moan and complain..
    System.err.printf("Got %s trying to openSession - %s\n", ex.getClass().getCanonicalName(), ex.getMessage() );
  }
  if( session != null ) {

// Prepared Statement for Cassandra Inserts
        PreparedStatement statement = session.prepare(
                                "INSERT INTO model.base " +
                                "(channel, " +
                                "time_key, " +
                                "power" +
                                ") VALUES (?,?,?);");
        BoundStatement boundStatement = new BoundStatement(statement); 


//Query Cassandra Table that has capital letters in the column names        
        ResultSet results = session.execute("SELECT \"Time_Key\",\"Power\",\"Bandwidth\",\"Start_Frequency\" FROM \"SB1000_49552019\".\"Measured_Value\" limit 800000;");

 // Get the Variables from each Row of Cassandra Data        
       for (Row row : results){
           // Upper Case Column Names in Cassandra
           time_key = row.getLong("Time_Key");
           start_frequency = row.getDouble("Start_Frequency");
           power = row.getFloat("Power");
           bandwidth = row.getDouble("Bandwidth");


// Create Channel Power Buckets, place information into prepared statement binding, write to cassandra.
                for(channel = 1.6000E8; channel <= channel_end; channel+=increment ){       
                    if( (channel >= start_frequency) && (channel <= (start_frequency + bandwidth)) ) {

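                  // Bind a fresh BoundStatement per write; a single mutable instance should not be shared by in-flight async calls
                  ResultSetFuture rsf =  session.executeAsync(statement.bind(channel,time_key,power));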
                       backlogList.add( rsf );   // put the new one at the end of the list
                       if( backlogList.size() > 10000 ) {      // wait till we have a few

                           while( backlogList.size() > 5432 ) {      // then harvest about half of the oldest ones of them

                               rsf = backlogList.remove(0);

                               rsf.getUninterruptibly();

                           }    // end while

                       }  // end if

                    }  // end if

                }  // end for

  } // end "row" for

 } // end session

My connection is built with the following:

public static void main(String[] args) {
if (args.length != 2) {
    System.err.println("Syntax: com.neutronis.Spark_Reports <Spark Master URL> <Cassandra contact point>");
    System.exit(1);
}

SparkConf conf = new SparkConf();
conf.setAppName("Spark Reports");
conf.setMaster(args[0]);
conf.set("spark.cassandra.connection.host", args[1]);

Spark_Reports app = new Spark_Reports(conf);

app.run();
}

With this code I tried to use a semaphore, but my Cassandra cluster still seems to get overloaded and throws this error:

  

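ERROR ControlConnection: [Control connection] Cannot connect to any host, scheduling retry in 1000 milliseconds
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)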

It seems odd that no host was tried.

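I have looked at other questions about semaphore throttling, such as this and this, and tried to apply them to the code above, but I still get the error.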

1 Answer:

Answer 0 (score: 2)

Please read my answer to this question to see how to apply backpressure when using asynchronous calls: What is the best way to get backpressure for Cassandra Writes?
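
For illustration only, here is a minimal sketch of what semaphore-based backpressure for asynchronous writes can look like. It is not the code from the linked answer. To stay self-contained it uses only the JDK: `ThrottledWriter`, `submit`, `awaitAll` and `peakInFlight` are hypothetical names, and the write is modelled as a `CompletableFuture` rather than the driver's `ResultSetFuture`. The idea is that a permit is acquired before each async call and released only when that call completes, so the number of in-flight writes can never exceed the limit.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: caps the number of async operations in flight.
final class ThrottledWriter {
    private final int maxInFlight;
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peakInFlight = new AtomicInteger();

    ThrottledWriter(int maxInFlight) {
        this.maxInFlight = maxInFlight;
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the caller until a permit is free, then starts the async write.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> asyncWrite)
            throws InterruptedException {
        permits.acquire();
        int now = inFlight.incrementAndGet();
        peakInFlight.accumulateAndGet(now, Math::max);
        CompletableFuture<T> future;
        try {
            future = asyncWrite.get();
        } catch (RuntimeException e) {
            // The write never started, so hand the permit back immediately.
            inFlight.decrementAndGet();
            permits.release();
            throw e;
        }
        // Release the permit when the write finishes, whether it succeeded or failed.
        return future.whenComplete((result, error) -> {
            inFlight.decrementAndGet();
            permits.release();
        });
    }

    // Waits for every outstanding write by collecting all permits, then returns them.
    void awaitAll() throws InterruptedException {
        permits.acquire(maxInFlight);
        permits.release(maxInFlight);
    }

    // Highest number of writes that were in flight at the same time.
    int peakInFlight() {
        return peakInFlight.get();
    }
}
```

In the question's loop, the call to `session.executeAsync(...)` would take the place of the supplier. With the 2.x/3.x DataStax driver the returned `ResultSetFuture` is a Guava `ListenableFuture`, so the permit release would go in a callback registered on that future instead of `whenComplete`. Releasing on failure as well as on success matters: otherwise a failed write leaks a permit and the loop eventually stalls.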