Question

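I am trying to connect to Cassandra from Pig. Cassandra is installed on a different cluster, so I need to connect to it remotely from Pig.

I am following the example at the link below.

I get this error: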

Failed to parse: Can not retrieve schema from loader org.apache.cassandra.hadoop.pig.CqlStorage@1216d9bf
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1688)
    at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1421)
    at org.apache.pig.PigServer.parseAndBuild(PigServer.java:354)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:379)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:365)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:769)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:372)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:484)
    at org.apache.pig.Main.main(Main.java:158)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

My Pig script is as follows:

A = LOAD 'cql://userName:password/mykeyspace/mycolumnfamily' USING org.apache.cassandra.hadoop.pig.CqlStorage() AS (user_id:long, fname:chararray, last_update_date:chararray, lname:chararray);
DUMP A;

Please tell me where I have to provide the IP address of the system on which Cassandra is installed.

Answer 1

This is what I found by searching the internet: http://www.datastax.com/dev/blog/cassandra-and-pig-tutorial

Querying Cassandra with Pig

Start the pig client through DataStax Enterprise.

No setup is required other than starting the cluster in Analytics mode.

 (14:52:17)[~/BlogPosts/CassPig_Libraries]dse pig
 2013-08-26 14:52:27,166 [main] INFO org.apache.pig.Main - Logging error messages to: /Users/russellspitzer/BlogPosts/CassPig_Libraries/pig_1377553947163.log
 2013-08-26 14:52:27,421 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: cfs://127.0.0.1/
 2013-08-26 14:52:27.488 java[64588:1503] Unable to load realm info from SCDynamicStore
 2013-08-26 14:52:28,348 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: 127.0.0.1:8012
grunt>

Next we construct our Pig commands, starting with loading our data from Cassandra. We'll be using the cql:// URL and the CqlStorage() connector. The format of the command is basically load 'cql://keyspace/table'.


grunt> libdata = load 'cql://libdata/libout' USING CqlStorage(); 
grunt> DESCRIBE libdata;

Set the following as environment variables (uppercase, underscored), or as Hadoop configuration variables (lowercase, dotted):

 * PIG_INITIAL_ADDRESS or cassandra.thrift.address : initial address to connect to
 * PIG_RPC_PORT or cassandra.thrift.port : the port thrift is listening on
 * PIG_PARTITIONER or cassandra.partitioner.class : cluster partitioner

For example, for a local node with the default settings, you would use:

 export PIG_INITIAL_ADDRESS=localhost
 export PIG_RPC_PORT=9160
 export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner
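For a remote cluster like the one in the question, the same variables would point at a Cassandra node instead of localhost. A minimal sketch, assuming 192.0.2.10 is a placeholder for the real address of one node in the remote cluster (it is not taken from the question), that Thrift listens on the default port 9160, and that the cluster uses Murmur3Partitioner; the helper variable CASSANDRA_HOST is hypothetical:

```shell
# Placeholder address of one node in the remote Cassandra cluster (hypothetical value).
CASSANDRA_HOST=192.0.2.10

# Point Pig's Cassandra storage at the remote node instead of localhost.
export PIG_INITIAL_ADDRESS="$CASSANDRA_HOST"
# Assumes the default Thrift port; change it if the cluster uses another one.
export PIG_RPC_PORT=9160
# Must match the partitioner the remote cluster is actually configured with.
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner

# Print the target so a wrong address is easy to spot before starting Pig.
echo "Pig will contact Cassandra at $PIG_INITIAL_ADDRESS:$PIG_RPC_PORT"
```

Start Pig from the same shell afterwards so that it inherits these variables.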

If you use different clusters for input and output, you can override these properties with the following:

 * PIG_INPUT_INITIAL_ADDRESS : initial address to connect to for reading
 * PIG_INPUT_RPC_PORT : the port thrift is listening on for reading
 * PIG_INPUT_PARTITIONER : cluster partitioner for reading
 * PIG_OUTPUT_INITIAL_ADDRESS : initial address to connect to for writing
 * PIG_OUTPUT_RPC_PORT : the port thrift is listening on for writing
 * PIG_OUTPUT_PARTITIONER : cluster partitioner for writing
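As an illustration of the overrides listed above, reading from one cluster and writing to another might be set up as follows. This is only a sketch: both addresses are placeholders, and the ports and partitioners are assumed to be the defaults on both clusters.

```shell
# Cluster to read from (placeholder address, assumed default port and partitioner).
export PIG_INPUT_INITIAL_ADDRESS=192.0.2.10
export PIG_INPUT_RPC_PORT=9160
export PIG_INPUT_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner

# Cluster to write to (placeholder address, assumed default port and partitioner).
export PIG_OUTPUT_INITIAL_ADDRESS=192.0.2.20
export PIG_OUTPUT_RPC_PORT=9160
export PIG_OUTPUT_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner

# Show both endpoints so the read and write targets can be checked at a glance.
echo "read from  $PIG_INPUT_INITIAL_ADDRESS:$PIG_INPUT_RPC_PORT"
echo "write to   $PIG_OUTPUT_INITIAL_ADDRESS:$PIG_OUTPUT_RPC_PORT"
```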

For further reference, see the following URL:

https://github.com/Stratio/stratio-cassandra/tree/master/examples/pig

Hope this helps!
