Bulk importing CSV into Cassandra 2.0.3

Time: 2014-01-15 08:29:32

Tags: csv cassandra bulkloader cassandra-2.0 bulk-import

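I want to bulk load a CSV file into Cassandra 2.0.3. So far I have successfully converted the CSV into SSTables.

However, when I run sstableloader, I get the error message shown below. Does this error affect my bulk load? I ask because I cannot find the imported data in Cassandra 2.0.3.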

VirtualBox:~/apache-cassandra-2.0.3$ ./bin/sstableloader -d localhost airlines/flight/
ERROR 16:08:04,832 Unable to initialize MemoryMeter (jamm not specified as javaagent).  This means Cassandra will be unable to measure object sizes accurately and may consequently OOM.
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of airlines/flight/airlines-flight-jb-1-Data.db to [/127.0.0.1, /127.0.0.2]
progress: [/127.0.0.2 1/1 (100%)] [/127.0.0.1 1/1 (100%)] [total: 100% - 0MB/s (avg: 0MB/s)]

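1 Answer:

Answer 0: (score: 1)

I wrap my sstableloader job in a bash script and initially got exactly the same error. After some digging, I found that setting the JAVA_TOOL_OPTIONS environment variable fixed the problem for me.

Here is my script: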

#!/bin/bash

# ------------------------
# paths to the Cassandra install directory and the jamm agent jar
CASSANDRA_HOME="/usr/share/cassandra"
JAVA_AGENT="-javaagent:$CASSANDRA_HOME/lib/jamm-0.2.5.jar"
export JAVA_TOOL_OPTIONS="$JAVA_AGENT"
# ------------------------

# ------------------------
# Initialize Parameters
SSTLOADER=$(which sstableloader)
SSDATADIR="/usr/share/cassandra/scripts/sstable_load/data/<schema_name>/<column family>"

CASSNODE="192.168.2.1"

# ------------------------
log_dir=/usr/share/cassandra/scripts/sstable_load/logs
dt=$(date +'%Y%m%d_%H%M%S')
logdest="$log_dir/sstableloader_$dt.log"
# ------------------------

exec 1>"$logdest"
echo "Job Started: $(date)"
echo "Job Logged To: $logdest"
echo

# ------------------------
# Run the SSTableLoader Command
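"$SSTLOADER" -v -d "$CASSNODE" -u "<user>" -pw "<password>" "$SSDATADIR"
status=$?   # keep the loader's exit code so a failed load is not reported as success

echo
echo "Job Completed: $(date)"

exit "$status"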

Replace the entries in <> in the script with the appropriate values for your setup.

Hope this works for you.

Please upvote.
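
For a one-off run like the one in the question, the same idea can be sketched without the full wrapper. This is only an illustration under assumptions: the helper name jamm_agent_opt is hypothetical, the default CASSANDRA_HOME is guessed from the prompt in the question, and the jamm jar is looked up by a glob rather than a fixed version number.

```shell
#!/bin/bash
# Hypothetical helper: print a -javaagent option for the first jamm jar found
# under "$1/lib", or return non-zero if no such jar exists.
jamm_agent_opt() {
    local home="$1" jar
    for jar in "$home"/lib/jamm-*.jar; do
        if [ -f "$jar" ]; then
            printf -- '-javaagent:%s\n' "$jar"
            return 0
        fi
    done
    return 1
}

# Assumed install location; the question ran from ~/apache-cassandra-2.0.3.
CASSANDRA_HOME="${CASSANDRA_HOME:-$HOME/apache-cassandra-2.0.3}"

if opt=$(jamm_agent_opt "$CASSANDRA_HOME"); then
    export JAVA_TOOL_OPTIONS="$opt"
    echo "JAVA_TOOL_OPTIONS=$JAVA_TOOL_OPTIONS"
    # Then run the loader as in the question, for example:
    #   "$CASSANDRA_HOME/bin/sstableloader" -d localhost airlines/flight/
else
    echo "no jamm jar found under $CASSANDRA_HOME/lib" >&2
fi
```

When the variable is set, the JVM normally prints a "Picked up JAVA_TOOL_OPTIONS" notice at startup, which is a quick way to confirm that the agent option reached sstableloader.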