我正试图从猪身上连接到Cassandra。 但Cassandra安装在不同的集群中,我需要连接以便从pig远程连接到Cassandra。
我指的是以下链接exmaple
获取错误
Failed to parse: Can not retrieve schema from loader org.apache.cassandra.hadoop.pig.CqlStorage@1216d9bf
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1688)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1421)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:354)
at org.apache.pig.PigServer.executeBatch(PigServer.java:379)
at org.apache.pig.PigServer.executeBatch(PigServer.java:365)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:484)
at org.apache.pig.Main.main(Main.java:158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
我的猪脚本如下
A = LOAD 'cql://userName:password/mykeyspace/mycolumnfamily'
USING org.apache.cassandra.hadoop.pig.CqlStorage()
AS (user_id:long, fname:chararray, last_update_date:chararray, lname:chararray);
DUMP A;
请告诉我们必须提供安装Cassandra的系统的IP地址
答案 0 :(得分:0)
我在互联网上搜索的内容是http://www.datastax.com/dev/blog/cassandra-and-pig-tutorial
使用Pig查询Cassandra
通过Datastax Enterprise启动pig客户端。
除了在Google Analytics模式下启动群集之外,无需进行任何设置。
(14:52:17)[~/BlogPosts/CassPig_Libraries]dse pig
2013-08-26 14:52:27,166 [main] INFO org.apache.pig.Main - Logging error messages to: /Users/russellspitzer/BlogPosts/CassPig_Libraries/pig_1377553947163.log
2013-08-26 14:52:27,421 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: cfs://127.0.0.1/
2013-08-26 14:52:27.488 java[64588:1503] Unable to load realm info from SCDynamicStore
2013-08-26 14:52:28,348 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: 127.0.0.1:8012
grunt>
Next we construct our pig commands, starting with loading our data from Cassandra. We’ll be using the cql:// url and the CqlStorage() connector. The format of the command is basically load ‘cql://keyspace/table’. More info on CQL3 and Pig.
grunt> libdata = load 'cql://libdata/libout' USING CqlStorage();
grunt> DESCRIBE libdata;
将以下内容设置为环境变量(大写, 强调),或作为Hadoop配置变量(小写,点缀):
* PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
* PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
* PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner
例如,对于具有默认设置的本地节点,您将使用:
export PIG_INITIAL_ADDRESS=localhost
export PIG_RPC_PORT=9160
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
如果您为输入和输出使用不同的群集,则可以使用以下内容覆盖这些属性:
* PIG_INPUT_INITIAL_ADDRESS : initial address to connect to for reading
* PIG_INPUT_RPC_PORT : the port thrift is listening on for reading
* PIG_INPUT_PARTITIONER : cluster partitioner for reading
* PIG_OUTPUT_INITIAL_ADDRESS : initial address to connect to for writing
* PIG_OUTPUT_RPC_PORT : the port thrift is listening on for writing
* PIG_OUTPUT_PARTITIONER : cluster partitioner for writing
有关更多参考,请参阅以下网址
https://github.com/Stratio/stratio-cassandra/tree/master/examples/pig
希望这有助于!!! ...