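I have set up a single Cassandra node on a VM. I have to create a table with 70,000 columns. For this, I wrote Java code that reads a JSON file and creates the table. My Java code snippet is below. When I run it, it throws an exception after creating some of the columns; the exception stack trace follows the code.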
public void createTable(String keyspaceName, String tableName) throws FileNotFoundException {
    JSONParser jsonParser = new JSONParser();
    FileReader fileReader;
    String filePath = "";
    String columnHeader = "";
    //String completeColumnHeader = "";
    try {
        System.out.println("Inside Create Table");
        session.executeAsync("DROP TABLE IF EXISTS "+keyspaceName+"."+tableName+";");
        String createQuery = "CREATE TABLE "+keyspaceName+"."+tableName +"(\"P:LanguageID\" text, "
                + "\"P:PdmarticleID\" text, PRIMARY KEY(\"P:PdmarticleID\",\"P:LanguageID\"));";
        session.execute(createQuery);
        System.out.println("Table created");
        filePath = "CassandraTableColumnHeader/FixColumnHeader.json";
        fileReader = new FileReader(filePath);
        JSONObject jsonObject = (JSONObject) jsonParser.parse(fileReader);
        JSONArray jsonArray = (JSONArray) jsonObject.get("columnHeaderName");
        int columnHeaderSize = jsonArray.size();
        int columnHeaderBatchSize = 1000;
        int fromIndex = 0;
        int toIndex = columnHeaderBatchSize;
        while (columnHeaderSize > 0) {
            columnHeaderSize -= columnHeaderBatchSize;
            for (int i = fromIndex; i < toIndex; i++) {
                columnHeader = (String) jsonArray.get(i);
                if (columnHeader.equals("P:PdmarticleID") || columnHeader.equals("P:LanguageID")) {
                    continue;
                }
                session.execute("ALTER TABLE "+keyspaceName+"."+tableName +" ADD "+"\""+columnHeader+"\""+" text;");
            }
            fromIndex = toIndex;
            if (columnHeaderSize < columnHeaderBatchSize) {
                toIndex += columnHeaderSize;
            } else {
                toIndex = toIndex + columnHeaderBatchSize;
            }
        }
    } catch (FileNotFoundException fnfe) {
        throw fnfe;
    } catch (ParseException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Host replied with server error: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\apache-cassandra-new\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-tmplink-ka-4839-Data.db (The process cannot access the file because it is being used by another process)))
    at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:84)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:265)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:179)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:52)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:36)
    at com.exportstagging.SparkTest.DataLoaderInCassandra.createTable(DataLoaderInCassandra.java:89)
    at com.exportstagging.SparkTest.DataLoaderInCassandra.main(DataLoaderInCassandra.java:216)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /127.0.0.1:9042 (com.datastax.driver.core.exceptions.DriverException: Host replied with server error: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.FileNotFoundException: C:\apache-cassandra-new\data\data\system\schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697\system-schema_columnfamilies-tmplink-ka-4839-Data.db (The process cannot access the file because it is being used by another process)))
    at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:216)
    at com.datastax.driver.core.RequestHandler.access$900(RequestHandler.java:45)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.sendRequest(RequestHandler.java:276)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.run(RequestHandler.java:374)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
I am stuck here. Please help me. Thanks in advance.
Answer 0 (score: 0)
If I were you, I would reconsider creating a table with 70k columns. Your partition key P:PdmarticleID and your full primary key (P:PdmarticleID, P:LanguageID) are the only two pieces of information you will be able to use to look up rows anyway, so storing all the other pieces of information in explicitly declared columns is not buying you anything.
A collection (e.g., a map) can hold up to 64k items, subject to certain other limitations (see http://wiki.apache.org/cassandra/CassandraLimitations). Is there a way you can split the columns across multiple tables, with some pieces of information stored in one table and some in another?
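To make the two suggestions above a bit more concrete, here is a rough sketch that only builds CQL strings and groups column names; it does not talk to a cluster. The class name `WideTableSketch`, the `attributes` map column, and the keyspace and table names used in `main` are all made up for illustration and are not from the question. Whether a map or a split into several tables fits better depends on how the data is read, so treat this as a starting point rather than a recommended schema.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: class, method, and the "attributes" column name are
// hypothetical and do not come from the original question.
final class WideTableSketch {

    // Quote a CQL identifier so that case and characters such as ':' are preserved.
    static String quote(String identifier) {
        return "\"" + identifier.replace("\"", "\"\"") + "\"";
    }

    // One table with the two key columns plus a single map column that holds
    // the remaining values as name/value pairs instead of declared columns.
    static String createMapTableCql(String keyspace, String table) {
        return "CREATE TABLE IF NOT EXISTS " + keyspace + "." + table + " ("
                + quote("P:PdmarticleID") + " text, "
                + quote("P:LanguageID") + " text, "
                + "attributes map<text, text>, "
                + "PRIMARY KEY (" + quote("P:PdmarticleID") + ", " + quote("P:LanguageID") + "))";
    }

    // Parameterized statement that sets a single map entry for one row.
    static String setAttributeCql(String keyspace, String table) {
        return "UPDATE " + keyspace + "." + table
                + " SET attributes[?] = ?"
                + " WHERE " + quote("P:PdmarticleID") + " = ?"
                + " AND " + quote("P:LanguageID") + " = ?";
    }

    // Split the column names into groups of at most groupSize, e.g. one group per table.
    static List<List<String>> splitColumns(List<String> columns, int groupSize) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int from = 0; from < columns.size(); from += groupSize) {
            int to = Math.min(from + groupSize, columns.size());
            groups.add(new ArrayList<>(columns.subList(from, to)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // "demo_ks" and "articles" are placeholder names.
        System.out.println(createMapTableCql("demo_ks", "articles"));
        System.out.println(setAttributeCql("demo_ks", "articles"));
    }
}
```

The generated statements could be passed to `session.execute(...)` or prepared with the driver already used in the question. Because the answer mentions a 64k item limit for collections, 70,000 values may not fit in one map, in which case `splitColumns` could be used to spread the names over several tables or several map columns.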