Pickling Error running COPY command: CQLShell on Windows

Asked: 2015-06-03 17:55:01

Tags: python csv cassandra cassandra-2.0 cqlsh

We are running a COPY command in CQLShell on Windows 7. At first, we ran into an "Improper COPY command" error:

COPY ourdata(data_time, data_ID, dataBlob)
FROM 'TestData.csv'
WITH HEADER = true;

Later, running the same command, we started getting this error instead:

Error starting import process:

Can't pickle <type 'thread.lock'>: it's not found as thread.lock
can only join a started process
cqlsh:testkeyspace> Traceback (most recent call last):
               File "<string>", line 1, in <module>
               File "C:\Program Files\DataStax\Community\python\lib\multiprocessing\forking.py",
                      line 373, in main
               prepare(preparation_data)
               File "C:\Program Files\DataStax Community\python\lib\multiprocessing\forking.py",
                      line 482, in prepare
                      file, path_name, etc = imp.find_module(main_name, dirs)
ImportError: No module named cqlsh

We are not sure whether the problem is with the path (no module named cqlsh) or with Python pickling objects while working with the csv file.

2 Answers:

Answer 0: (score: 3)

So I went and tested this. I created two simple tables in Cassandra 2.1.5 (BTW, which version are you using?) on both Windows and Linux. Then I tested COPY TO and COPY FROM on each.

Linux (Ubuntu 14.04.2 LTS):

Connected to Test Cluster at dockingbay94:9042.
[cqlsh 5.0.1 | Cassandra 2.1.5 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
aploetz@cqlsh> use stackoverflow2;
aploetz@cqlsh:stackoverflow2> COPY dummy3(id,time) TO '/home/aploetz/dummy3.txt' 
    WITH HEADER=true AND DELIMITER='|';

4 rows exported in 0.071 seconds.
aploetz@cqlsh:stackoverflow2> COPY dummy4(id,time) FROM '/home/aploetz/dummy3.txt' 
    WITH HEADER=true AND DELIMITER='|';

4 rows imported in 0.427 seconds.

Windows 8.1:

Connected to Window$ Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.5 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
WARNING: pyreadline dependency missing.  Install to enable tab completion.
aploetz@cqlsh> use stackoverflow;
aploetz@cqlsh:stackoverflow> COPY dummy3(id,time) TO 'e:\dummy3.txt' 
    WITH HEADER=true AND DELIMITER='|';

4 rows exported in 0.020 seconds.
aploetz@cqlsh:stackoverflow> COPY dummy4(id,time) FROM 'e:\dummy3.txt' 
    WITH HEADER=true AND DELIMITER='|';

Error starting import process:

Can't pickle <type 'thread.lock'>: it's not found as thread.lock
can only join a started process
aploetz@cqlsh:stackoverflow> Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "E:\Program Files\DataStax Community\python\lib\multiprocessing\forking.py", line 373, in main
    prepare(preparation_data)
  File "E:\Program Files\DataStax Community\python\lib\multiprocessing\forking.py", line 482, in prepare
    file, path_name, etc = imp.find_module(main_name, dirs)
ImportError: No module named cqlsh

So COPY TO (export) works fine on both, but COPY FROM (import) fails on Windows.
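For what it's worth, the first line of that error can be reproduced outside cqlsh. Windows has no fork(), so Python's multiprocessing pickles whatever is handed to a child process, and a thread lock cannot be pickled. The following is only a minimal sketch of that behavior and assumes nothing about cqlsh's internals; `make_task` and `can_pickle` are hypothetical names made up for illustration.

```python
import pickle
import threading


def make_task(path, with_lock):
    """Build a hypothetical 'import task' (not cqlsh's real data structure)."""
    task = {"path": path}
    if with_lock:
        # A thread lock is the kind of object that cannot be pickled.
        task["lock"] = threading.Lock()
    return task


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError):
        # Python 2 raises PicklingError here, Python 3 raises TypeError.
        return False


if __name__ == "__main__":
    print(can_pickle(make_task("dummy3.txt", with_lock=False)))
    print(can_pickle(make_task("dummy3.txt", with_lock=True)))
```

The task without a lock pickles fine, while the one carrying a lock does not. That fits the import process failing to start on Windows only, though it does not by itself explain the "No module named cqlsh" part of the traceback.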

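Josh McKenzie of DataStax published a blog post last December called Cassandra and Windows: Past, Present, and Future. In it, he discusses some long-standing issues that Cassandra has on Windows. Essentially, Windows NTFS prevents other processes from altering or deleting files that are in use (locked) by a different process. These issues directly affect CQLSH's ability to copy data into Cassandra.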

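There is a JIRA ticket (CASSANDRA-9670) that addresses a similar problem (running a cql script with CQLSH on Windows produces the same error message). I strongly suspect the two issues are related. In any case, Cassandra is expected to be supported on Windows as of version 3.0, which is currently "in development." I am trying a few tricks to see whether I can find a workaround on Windows, and I will report back if I do. For now, though, you may simply need to run Cassandra on Linux to get its full functionality.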

Answer 1: (score: 1)

I ran into the same problem when I was using Cassandra 2.1. After I upgraded Cassandra to 2.2, the error went away. Try upgrading your Cassandra.