Question

I know this can be done the traditional way, but if I use Cassandra DB, is there an easy, quick, and agile way to add a CSV to the database as a set of key-value pairs?

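Being able to add time-series data from a CSV file is my primary requirement. I am fine with switching to another database, such as MongoDB or Riak, if it is conveniently doable there.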

Answer 1

Edit 2 (December 2, 2017)
Please use port 9042. Cassandra access has moved to CQL, whose default port is 9042; 9160 was the default port for Thrift.

Edit 1
There is a better way to do this without any coding. Look at this answer: https://stackoverflow.com/a/18110080/298455

However, if you want to pre-process the data or do something custom, you may want to do it yourself. Here is the lengthy method:

Create the column family.

cqlsh> create keyspace mykeyspace 
with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

cqlsh> use mykeyspace;

cqlsh:mykeyspace> create table stackoverflow_question 
(id text primary key, name text, class text);

Assume your CSV looks like this:

$ cat data.csv 
id,name,class
1,hello,10
2,world,20

Write a simple Python script that reads the file and dumps it into your CF. Something like this:

# Python 2 script; pycassa speaks Thrift, hence port 9160.
import csv
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('mykeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, "stackoverflow_question")

with open('data.csv', 'rb') as csvfile:
  reader = csv.DictReader(csvfile)  # the header line supplies the column names
  for row in reader:
    print str(row)
    key = row.pop('id')  # the id column becomes the row key
    cf.insert(key, row)  # the remaining columns are stored as text

pool.dispose()

Run it:

$ python loadcsv.py 
{'class': '10', 'id': '1', 'name': 'hello'}
{'class': '20', 'id': '2', 'name': 'world'}

Check the data:

cqlsh:mykeyspace> select * from stackoverflow_question;
 id | class | name
----+-------+-------
  2 |    20 | world
  1 |    10 | hello

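See also:

a. Beware of DictReader.
b. Look at Pycassa.
c. Google for an existing CSV loader for Cassandra. I guess there are some.
d. There may be a simpler way using the CQL driver; I don't know.
e. Use appropriate data types. I just wrapped everything as text. Not good.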
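On the point about data types: the following is only a minimal sketch of converting CSV strings to typed values before inserting them over CQL on port 9042. The `COLUMN_TYPES` mapping, the helper names, and the table `stackoverflow_question_typed` (imagined with `class int` instead of `class text`) are assumptions made for illustration; `load_rows` relies on the DataStax `cassandra-driver` package and is not called here because it needs a running cluster.

```python
import csv
import io

# Hypothetical column-to-type mapping for the example data; adjust to your schema.
COLUMN_TYPES = {"id": str, "name": str, "class": int}


def typed_rows(csv_text, column_types=COLUMN_TYPES):
    """Yield one dict per CSV row with each value converted to its mapped type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield {col: column_types[col](value) for col, value in row.items()}


def load_rows(rows, hosts=("127.0.0.1",), port=9042, keyspace="mykeyspace"):
    """Insert rows over the CQL native protocol (requires cassandra-driver)."""
    # Imported lazily so the parsing helper above works without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(list(hosts), port=port)
    session = cluster.connect(keyspace)
    # Assumed table: stackoverflow_question_typed (id text primary key, name text, class int)
    insert = session.prepare(
        "INSERT INTO stackoverflow_question_typed (id, name, class) VALUES (?, ?, ?)"
    )
    for row in rows:
        session.execute(insert, (row["id"], row["name"], row["class"]))
    cluster.shutdown()


sample = "id,name,class\n1,hello,10\n2,world,20\n"
rows = list(typed_rows(sample))
print(rows)
```

With a cluster available, `load_rows(rows)` would perform the inserts. Prepared statements check the bound values against the column types, which is one reason to convert the strings first.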

HTH

I had missed the time-series requirement. Here is how you can handle time series.

Here is your data:

$ cat data.csv
id,1383799600,1383799601,1383799605,1383799621,1383799714
1,sensor-on,sensor-ready,flow-out,flow-interrupt,sensor-killAll

Create a traditional wide row. (CQL recommends against using COMPACT STORAGE, but this is just to get you going quickly.)

cqlsh:mykeyspace> create table timeseries 
(id text, timestamp text, data text, primary key (id, timestamp)) 
with compact storage;

The modified code:

# Python 2 script; pycassa speaks Thrift, hence port 9160.
import csv
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('mykeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, "timeseries")

with open('data.csv', 'rb') as csvfile:
  reader = csv.DictReader(csvfile)  # header holds 'id' plus one column per timestamp
  for row in reader:
    print str(row)
    key = row.pop('id')  # the id column becomes the row key
    # One insert per (timestamp, value) pair builds up the wide row.
    for (timestamp, data) in row.iteritems():
      cf.insert(key, {timestamp: data})

pool.dispose()

And here is your time series:

cqlsh:mykeyspace> select * from timeseries;
 id | timestamp  | data
----+------------+----------------
  1 | 1383799600 |      sensor-on
  1 | 1383799601 |   sensor-ready
  1 | 1383799605 |       flow-out
  1 | 1383799621 | flow-interrupt
  1 | 1383799714 | sensor-killAll
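The pivot from one wide CSV line to one entry per timestamp can be checked without a running cluster. The sketch below uses only the standard library; the function name `unpivot` is made up for illustration. It also converts the timestamps to integers, on the assumption that a numeric or timestamp column would be preferable to `text`, since text values sort lexicographically rather than numerically.

```python
import csv
import io


def unpivot(csv_text):
    """Turn each wide CSV row into (id, timestamp, data) triples, one per timestamp column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row in reader:
        key = row.pop("id")
        # Sort the header timestamps numerically before emitting them.
        for timestamp in sorted(row, key=int):
            triples.append((key, int(timestamp), row[timestamp]))
    return triples


sample = (
    "id,1383799600,1383799601,1383799605,1383799621,1383799714\n"
    "1,sensor-on,sensor-ready,flow-out,flow-interrupt,sensor-killAll\n"
)
events = unpivot(sample)
for event in events:
    print(event)
```

Each triple corresponds to one row of the `select` output above and could be passed to whichever insert call you use.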

Answer 2

Assume your CSV looks like

'P38-Lightning', 'Lockheed', 1937, '.7'

Connect to your database with cqlsh

and..

CREATE TABLE airplanes (
 name text PRIMARY KEY,
 manufacturer ascii,
 year int,
 mach float
);

...then

COPY airplanes (name, manufacturer, year, mach) FROM '/classpath/temp.csv';

Reference: http://www.datastax.com/docs/1.1/references/cql/COPY
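If the CSV is produced by your own program, a sketch like the following may help generate a file that COPY can read. As far as I know, cqlsh's COPY defaults to a comma delimiter and double-quote quoting, which is also what Python's `csv.writer` emits by default; treat that as an assumption and check the COPY options for your version. The second sample row and the helper name `to_copy_csv` are invented for illustration.

```python
import csv
import io

# Sample rows matching the airplanes columns; the second row is hypothetical
# and exists only to show how a field containing a comma gets quoted.
AIRPLANES = [
    ("P38-Lightning", "Lockheed", 1937, 0.7),
    ("Example Jet, Mk II", "Acme", 2000, 1.5),
]


def to_copy_csv(rows):
    """Serialize rows as comma-separated text, quoting only the fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_copy_csv(AIRPLANES)
print(csv_text, end="")
```

Writing `csv_text` to a file and pointing `COPY ... FROM` at that path would complete the round trip.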

Answer 3

Back up

./cqlsh -e"copy <keyspace>.<table> to '../data/table.csv';"

Restore from the backup

./cqlsh -e"copy <keyspace>.<table> from '../data/table.csv';"
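The two commands above can be scripted. This is only a sketch: `copy_command` is a made-up helper that builds the argument list and runs nothing, the keyspace and table names are placeholders taken from the earlier answer, and it assumes `cqlsh` is on the PATH. It adds `WITH HEADER = true` so that the export and the import agree about the first line.

```python
import shlex


def copy_command(keyspace, table, path, direction="to", cqlsh="cqlsh"):
    """Build the argv list for a cqlsh COPY TO/FROM call; nothing is executed here."""
    if direction not in ("to", "from"):
        raise ValueError("direction must be 'to' or 'from'")
    statement = (
        f"COPY {keyspace}.{table} {direction.upper()} '{path}' WITH HEADER = true;"
    )
    return [cqlsh, "-e", statement]


backup = copy_command("mykeyspace", "timeseries", "../data/table.csv", "to")
restore = copy_command("mykeyspace", "timeseries", "../data/table.csv", "from")
print(shlex.join(backup))
print(shlex.join(restore))
```

Passing either list to `subprocess.run(..., check=True)` would execute it. COPY is probably best suited to modestly sized tables; for full backups, Cassandra's snapshot tooling is likely the more appropriate route.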

How do I add a CSV to a Cassandra DB?