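I set up a demo environment for Cassandra, Apache Spark and Flume on my Mac (Mac OS X Yosemite with Oracle jdk1.7.0_55). The environment is meant to serve as a proof of concept for a new analytics platform, so I need some test data in my Cassandra database. I am using Cassandra 2.0.8.
I created some demo data in Excel and exported it as a CSV file. The structure looks like this: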
ProcessUUID;ProcessID;ProcessNumber;ProcessName;ProcessStartTime;ProcessStartTimeUUID;ProcessEndTime;ProcessEndTimeUUID;ProcessStatus;Orderer;VorgangsNummer;VehicleID;FIN;Reference;ReferenceType
0F0D1498-D149-4FCC-87C9-F12783FDF769;AbmeldungKl‰rfall;1;Abmeldung Kl‰rfall;2011-02-03 04:05+0000;;2011-02-17 04:05+0000;;Finished;SIXT;4278;A-XA 1;WAU2345CX67890876;KLA-BR4278;internal
I then created a keyspace and a column family in cqlsh using:
CREATE KEYSPACE dadcargate
WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };
use dadcargate;
CREATE COLUMNFAMILY Process (
ProcessUUID uuid, ProcessID varchar, ProcessNumber bigint, ProcessName varchar,
ProcessStartTime timestamp, ProcessStartTimeUUID timeuuid, ProcessEndTime timestamp,
ProcessEndTimeUUID timeuuid, ProcessStatus varchar, Orderer varchar,
VorgangsNummer varchar, VehicleID varchar, FIN varchar, Reference varchar,
ReferenceType varchar,
PRIMARY KEY (ProcessUUID))
WITH COMMENT='A process is like a bracket around multiple process steps';
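The column family name and all of its columns were created in lowercase. I will have to look into that someday, but it is not very relevant right now.
Now I take my CSV file, which has about 1600 entries, and want to import it into my table named process like this: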
cqlsh:dadcargate> COPY process (processuuid, processid, processnumber, processname,
processstarttime, processendtime, processstatus, orderer, vorgangsnummer, vehicleid,
fin, reference, referencetype)
FROM 'Process_BulkData.csv' WITH DELIMITER = ';' AND HEADER = TRUE;
It fails with the following error:
Record #0 (line 1) has the wrong number of fields (15 instead of 13).
0 rows imported in 0.050 seconds.
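Fair enough, since I do not have the timeUUID fields in my CSV export.
If I try the COPY command without explicit column names, like this (given that I really am missing two fields):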
cqlsh:dadcargate> COPY process from 'Process_BulkData.csv'
WITH DELIMITER = ';' AND HEADER = TRUE;
I end up with a different error:
Bad Request: Input length = 1
Aborting import at record #0 (line 1). Previously-inserted values still present.
0 rows imported in 0.009 seconds.
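Hmm. Kind of strange, but okay. Maybe the COPY command does not like the fact that two fields are missing. I still find this odd, since the missing fields are of course there (from a structural point of view), just empty.
I took another shot: I deleted the missing columns in Excel, exported the file as CSV again, and tried to import it WITHOUT a header line in my CSV BUT with explicit column names, like this: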
cqlsh:dadcargate> COPY process (processuuid, processid, processnumber, processname,
processstarttime, processendtime, processstatus, orderer, vorgangsnummer, vehicleid,
fin, reference, referencetype)
FROM 'Process_BulkData-2.csv' WITH DELIMITER = ';' AND HEADER = TRUE;
I get this error:
Bad Request: Input length = 1
Aborting import at record #0 (line 1). Previously-inserted values still present.
0 rows imported in 0.034 seconds.
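Can anyone tell me what I am doing wrong here? According to the documentation, the way I set up my commands should work for at least two of them. Or so I thought.
But no, I am obviously missing something important.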
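Answer 0 (score: 15)
The cqlsh COPY command can be touchy. However, the COPY documentation contains this line:
The number of columns in the CSV input is the same as the number of columns in the Cassandra table metadata.
Your CSV has 15 fields per line, so the column list has to name all 15, including the two that are empty. Keeping that in mind, I did manage to import your data with COPY FROM by naming the empty fields (processstarttimeuuid and processendtimeuuid, respectively):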
aploetz@cqlsh:stackoverflow> COPY process (processuuid, processid, processnumber,
processname, processstarttime, processstarttimeuuid, processendtime,
processendtimeuuid, processstatus, orderer, vorgangsnummer, vehicleid, fin, reference,
referencetype) FROM 'Process_BulkData.csv' WITH DELIMITER = ';' AND HEADER = TRUE;
1 rows imported in 0.018 seconds.
aploetz@cqlsh:stackoverflow> SELECT * FROM process ;
processuuid | fin | orderer | processendtime | processendtimeuuid | processid | processname | processnumber | processstarttime | processstarttimeuuid | processstatus | reference | referencetype | vehicleid | vorgangsnummer
--------------------------------------+-------------------+---------+---------------------------+--------------------+-------------------+--------------------+---------------+---------------------------+----------------------+---------------+------------+---------------+-----------+----------------
0f0d1498-d149-4fcc-87c9-f12783fdf769 | WAU2345CX67890876 | SIXT | 2011-02-16 22:05:00+-0600 | null | AbmeldungKl‰rfall | Abmeldung Kl‰rfall | 1 | 2011-02-02 22:05:00+-0600 | null | Finished | KLA-BR4278 | internal | A-XA 1 | 4278
(1 rows)
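If you would rather not type the column list by hand, the following is a minimal Python sketch that derives it from the CSV header row and flags any line whose field count differs from the header. It assumes the header names match the table's column names once lowercased. The function name `build_copy_statement` and the shortened three-column sample are hypothetical and serve only as illustration.

```python
import csv
import io


def build_copy_statement(csv_text, table, csv_path, delimiter=";"):
    """Build a cqlsh COPY ... FROM statement that names every CSV column.

    Hypothetical helper: assumes the CSV header names equal the table's
    column names once lowercased.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    # The table's columns are lowercase, so lowercase the header names.
    columns = [name.strip().lower() for name in header]
    # Collect line numbers of rows whose field count differs from the header.
    # Data rows start at line 2 because line 1 is the header.
    bad_rows = [
        line_no
        for line_no, row in enumerate(reader, start=2)
        if len(row) != len(columns)
    ]
    statement = (
        f"COPY {table} ({', '.join(columns)}) "
        f"FROM '{csv_path}' WITH DELIMITER = '{delimiter}' AND HEADER = TRUE;"
    )
    return statement, bad_rows


# Shortened, made-up sample: the last line is deliberately one field short.
sample = (
    "ProcessUUID;ProcessID;ProcessStartTimeUUID\n"
    "0F0D1498-D149-4FCC-87C9-F12783FDF769;AbmeldungKlaerfall;\n"
    "too;few\n"
)
statement, bad_rows = build_copy_statement(sample, "process", "Process_BulkData.csv")
print(statement)
print(bad_rows)
```

An empty trailing field (as in the second sample line) still counts as a field, which is why a row with empty timeuuid values passes the check while the short row is reported.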
Answer 1 (score: 0)
Load a CSV file into a Cassandra table
step1) Install cassandra-loader using this URL: sudo wget https://github.com/brianmhess/cassandra-loader/releases/download/v0.0.23/cassandra-loader
step2) sudo chmod +x cassandra-loader
a) The CSV file name is "pt_bms_tkt_success_record_details_new_2016_12_082017-01-0312-30-01.csv"
b) The keyspace name is "bms_test"
c) The table name is "pt_bms_tkt_success_record_details_new"
d) The columns are "trx_id ...... trx_day"
step3) The CSV file and cassandra-loader are both located in "cassandra3.7/bin/"
step4) [stp@ril-srv-sp3 bin]$ ./cassandra-loader -f pt_bms_tkt_success_record_details_new_2016_12_082017-01-0312-30-01.csv -host 192.168.1.29 -schema "bms_test.pt_bms_tkt_success_record_details_new(trx_id,码max_seq,trx_type,trx_record_type,trx_date,trx_show_date,cinema_str_id,SESSION_ID,ttype_code,ITEM_ID,item_var_sequence,trx_booking_id,venue_name,screen_by_tnum,price_group_code,area_cat_str_code,area_by_tnum,venue_capacity,amount_currentprice,venue_class,trx_booking_status_committed,booking_status,amount_paymentstatus,event_application,venue_cinema_companyname,venue_cinema_name,venue_cinema_type,venue_cinema_application,region_str_code,venue_city_name,sub_region_str_code,sub_region_str_name,EVENT_CODE,event_type不同,EVENT_NAME,event_language,event_genre,event_censor_rating,event_release_date,event_producer_code,event_item_name,event_itemvariable_name,event_quantity,amount_amount,amount_bookingfee,amount_deliveryfee,amount_additionalcharges,amount_final,amount_tax,offer_isapplied,奥菲R_TYPE,OFFER_NAME,offer_amount,payment_lastmode,payment_lastamount,payment_reference1,payment_reference2,payment_bank,customer_loginid,customer_loginstring,offer_referral,customer_mailid,customer_mobile,trans_str_sales_status_at_venue,trans_mny_trans_value_at_venue,payment_ismypayment,click_recordsource,广告活动,源,关键词,介质,venue_multiplex,venue_state,mobile_type,transaction_range,life_cyclestate_from,transactions_after_offer,is_premium_transaction,city_type,holiday_season,week_type,event_popularity,transactionrange_after_discount,showminusbooking,input_source_name,信道,TIME_STAMP,life_cyclestate_to,record_status,week_name,number_of_active_customers,event_genre1,event_genre2,event_genre3,event_genre4,event_language1,event_language2,event_language3,event_language4,event_release_date_range,showminusbooking_range,reserve1,reserve2,reserve3,reserve4,reserve5,payment_mode,payment_type,date_of_first_transaction,transaction_time_in_hours,showtime_in_hours,trx_day)"
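With a column list this long, typing the -schema argument by hand is error-prone. As a rough sketch, the Python snippet below assembles the same kind of command line from a CSV header row. It assumes the file has a comma-separated header whose names match the table columns, which the steps above do not confirm. The helper name `build_loader_command` and the file name data.csv are made up for illustration, and only the -f, -host and -schema options shown in step4 are used.

```python
import csv
import io
import shlex


def build_loader_command(header_line, keyspace, table, csv_file, host):
    """Assemble a cassandra-loader command line from a CSV header row.

    Hypothetical helper: assumes the header names equal the table's
    column names and that the header is comma-separated.
    """
    columns = next(csv.reader(io.StringIO(header_line)))
    # Schema format as used in step4: keyspace.table(col1,col2,...)
    schema = f"{keyspace}.{table}({','.join(c.strip() for c in columns)})"
    args = ["./cassandra-loader", "-f", csv_file, "-host", host, "-schema", schema]
    # shlex.quote protects the parentheses in the schema from the shell.
    return " ".join(shlex.quote(arg) for arg in args)


# Two-column header standing in for the full "trx_id ...... trx_day" list.
command = build_loader_command(
    "trx_id,trx_day",
    "bms_test",
    "pt_bms_tkt_success_record_details_new",
    "data.csv",
    "192.168.1.29",
)
print(command)
```

The printed line can be pasted into the shell from the directory that holds both cassandra-loader and the CSV file.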