我正在使用java在hadoop中开发一个项目。当我在本地集群上运行我的代码(jar)时它工作正常但是当我在亚马逊多集群上运行时它会给出异常......
我的mapreduce工作代码....
job.setJarByClass(ReadActivityDriver.class);
job.setMapperClass(ReadActivityLogMapper.class);
job.setReducerClass(ReadActivityLogReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setInputFormatClass(ColumnFamilyInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
ConfigHelper.setInputRpcPort(job.getConfiguration(), pro.getProperty("port"));
ConfigHelper.setInputInitialAddress(job.getConfiguration(), pro.getProperty("server"));
ConfigHelper.setInputPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.Murmur3Partitioner");
ConfigHelper.setInputColumnFamily(job.getConfiguration(), keyspace, columnFamily);
SlicePredicate predicate = new SlicePredicate().setColumn_names(cn);
ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);
FileSystem.get(job.getConfiguration()).delete(new Path("ReadOutput"), true);
FileOutputFormat.setOutputPath(job, new Path("ReadOutput"));
job.waitForCompletion(true);
我得到的例外......
8020/home/ubuntu/hdfstmp/mapred/staging/ubuntu/.staging/job_201405080944_0010
java.lang.RuntimeException: org.apache.thrift.TApplicationException: Invalid method name: 'describe_local_ring'
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getRangeMap(AbstractColumnFamilyInputFormat.java:337)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getSplits(AbstractColumnFamilyInputFormat.java:125)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at com.cassandra.readActivity.ReadActivityDriver.run(ReadActivityDriver.java:117)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.cassandra.readActivity.ReadActivityDriver.main(ReadActivityDriver.java:33)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: org.apache.thrift.TApplicationException: Invalid method name: 'describe_local_ring'
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.cassandra.thrift.Cassandra$Client.recv_describe_local_ring(Cassandra.java:1277)
at org.apache.cassandra.thrift.Cassandra$Client.describe_local_ring(Cassandra.java:1264)
at org.apache.cassandra.hadoop.AbstractColumnFamilyInputFormat.getRangeMap(AbstractColumnFamilyInputFormat.java:329)
... 20 more
java.io.FileNotFoundException: File does not exist: /user/ubuntu/ReadOutput/part-r-00000;
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchLocatedBlocks(DFSClient.java:2006)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1975)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1967)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:735)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:165)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
at com.cassandra.readActivity.ReadActivityMySql.calculatePoint(ReadActivityMySql.java:65)
at com.cassandra.readActivity.ReadActivityDriver.main(ReadActivityDriver.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
java.io.FileNotFoundException: File does not exist: /user/ubuntu/ReadOutput/part-r-00000;
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchLocatedBlocks(DFSClient.java:2006)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1975)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1967)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:735)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:165)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
at com.cassandra.readActivity.MySqlSavePoint.setSavePoint(MySqlSavePoint.java:66)
at com.cassandra.readActivity.ReadActivityDriver.main(ReadActivityDriver.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
答案 0 :(得分:3)
它看起来像输入/输出格式jar并且您的群集没有使用相同的Cassandra版本。您需要修复jar或升级AWS Cassandra节点。
答案 1 :(得分:1)
我认为你的cassandra分区中的问题是尝试随机分区
ConfigHelper.setInputPartitioner(job.getConfiguration(),"org.apache.cassandra.dht.RandomPartitioner");
答案 2 :(得分:0)
最后我得到了答案..
使用随机分区
ConfigHelper.setInputPartitioner(job.getConfiguration(),"org.apache.cassandra.dht.RandomPartitioner");
而不是murmummer
ConfigHelper.setInputPartitioner(job.getConfiguration(),"org.apache.cassandra.dht.Murmur3Partitioner");