Question

Imagine we have two tables in an RDBMS, INVOICE and INVOICE_LINE_ITEMS, with a one-to-many relationship between INVOICE and INVOICE_LINE_ITEMS.

INVOICE (1) --------> (*) INVOICE_LINE_ITEMS

These entities now need to be stored in Cassandra, and we can follow one of these approaches:

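A denormalized table with PRIMARY KEY (invoice_id, invoice_line_item_id); one invoice will have multiple line_item_ids.

One row per INVOICE, containing a SET<FROZEN<INVOICE_LINE_ITEMS_UDT>>.

Two tables, with the DAO code responsible for updating both tables and joining the query results.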
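
The last option (two tables joined in the DAO) has no CQL in this question, so here is a minimal sketch of what it might look like. The table names invoice_header and invoice_line, the reduced column set, and the sample values are all hypothetical and only illustrate the shape of the model.

```cql
-- Hypothetical two-table layout; names and columns are illustrative only.
CREATE TABLE IF NOT EXISTS invoice_header (
    invoice_id text,
    invoice_number text,
    status text,
    PRIMARY KEY (invoice_id)
);

CREATE TABLE IF NOT EXISTS invoice_line (
    invoice_id text,
    invoice_line_id text,
    item_name text,
    net_quantity decimal,
    PRIMARY KEY (invoice_id, invoice_line_id)
);

-- A logged batch groups the header and line writes so that
-- both are eventually applied together.
BEGIN BATCH
    INSERT INTO invoice_header (invoice_id, invoice_number, status)
    VALUES ('inv-1', 'A-1001', 'OPEN');
    INSERT INTO invoice_line (invoice_id, invoice_line_id, item_name, net_quantity)
    VALUES ('inv-1', 'line-1', 'JET-A', 1200.5);
APPLY BATCH;

-- The DAO then issues two reads and combines the results in application code.
SELECT * FROM invoice_header WHERE invoice_id = 'inv-1';
SELECT * FROM invoice_line WHERE invoice_id = 'inv-1';
```

With this layout, each line is its own row, so a single line can be updated or deleted without touching the header or the other lines.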

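The use cases are:

Users can create an invoice and keep adding, updating, and deleting lines.
Users can search by invoice or invoice_line_udt attributes and get the invoice details (using DSE Search solr_query).
The INVOICE (header) may contain 20 attributes, and each item (invoice_line) is a large UDT with roughly 30+ attributes; each collection may hold ~1000 lines.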

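Questions:

Using a frozen collection affects read and write performance because of serialization and deserialization. Given that the UDT contains 30+ fields and the collection holds up to 1000 items, is this a good approach or data model?
Because of serialization and deserialization, the collection of UDTs is replaced on every update to the record or partition. Do column updates create tombstones? Given that we have many updates to the items (the UDT collection), will this cause problems?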

Here is the CQL for approach 1 (invoice header row with a collection of UDTs):

CREATE TYPE IF NOT EXISTS comment_udt (
created_on timestamp,
user text,
comment_type text,
comment text
);

CREATE TYPE IF NOT EXISTS invoice_line_udt ( -- TO REPRESENT EACH ITEM --
invoice_line_id text,
invoice_line_number int,
parent_id text,

item_id text,
item_name text,
item_type text,

uplift_start_end_indicator text,
uplift_start_date timestamp,
uplift_end_date timestamp,
bol_number text,
ap_only text,

uom_code text,
gross_net_indicator text,
gross_quantity decimal,
net_quantity decimal,
unit_cost decimal,
extended_cost decimal,

available_quantity decimal,
total_cost_adjustment decimal,
total_quantity_adjustment decimal,
total_variance decimal,

alt_quantity decimal,
alt_quantity_uom_code text,
adj_density decimal,

location_id text,
location_name text,
origin_location_id text,
origin_location_name text,
intermediate_location_id text,
intermediate_location_name text,
dest_location_id text,
dest_location_name text,

aircraft_tail_number text,
flight_number text,
aircraft_type text,

carrier_id text,
carrier_name text,

created_on timestamp,
created_by text,
updated_on timestamp,
updated_by text,
status text,

matched_tier_name text,
matched_on text,
workflow_action text,

adj_reason text,
credit_reason text,
hold_reason text,
delete_reason text,
ap_only_reason text
);


CREATE TABLE IF NOT EXISTS invoice_by_id ( -- MAIN TABLE --
invoice_id text,
parent_id text,
segment text,
invoice_number text,
invoice_type text,
source text,
ap_only text,
invoice_date timestamp,
received_date timestamp,
due_date timestamp,

vendor_id text,
vendor_name text,
vendor_site_id text,
vendor_site_name text,

currency_code text,
local_currency_code text,
exchange_rate decimal,
exchange_rate_date timestamp,
extended_cost decimal,
early_pay_discount decimal,
payment_method text,
invoice_amount decimal,
total_tolerance decimal,
total_variance decimal,

location_id text,
location_name text,
dest_location_override text,

company_id text,
company_name text,
org_id text,

sold_to_number text,
ship_to_number text,
ref_po_number text,
sanction_indicator text,

created_on timestamp,
created_by text,
updated_on timestamp,
updated_by text,
manually_assigned text,
assigned_user text,
assigned_group text,
workflow_process_id text,
version int,
comments set<frozen<comment_udt>>, 
status text,

lines set<frozen<invoice_line_udt>>,-- COLLECTION OF UDTs --

PRIMARY KEY (invoice_id, invoice_type));
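
To make the update and tombstone questions concrete, here is a sketch of how the line operations might look against invoice_by_id. The key values ('inv-1', 'STANDARD') and the field values are made up. The comments reflect my understanding of how Cassandra stores a set<frozen<...>>: the set itself is not frozen, only its elements are, so elements can be added and removed individually, but a single field inside an element cannot be changed in place.

```cql
-- Add a line: appends one element to the set. Fields omitted from the
-- UDT literal are stored as null. No tombstone is expected here.
UPDATE invoice_by_id
SET lines = lines + { {invoice_line_id: 'line-1', invoice_line_number: 1, status: 'NEW'} }
WHERE invoice_id = 'inv-1' AND invoice_type = 'STANDARD';

-- Change one field of a line: the frozen element is immutable, so the old
-- value is removed (which writes a tombstone for that element) and the
-- modified value is added. The removed literal must match the stored
-- element exactly, field for field.
BEGIN BATCH
    UPDATE invoice_by_id
    SET lines = lines - { {invoice_line_id: 'line-1', invoice_line_number: 1, status: 'NEW'} }
    WHERE invoice_id = 'inv-1' AND invoice_type = 'STANDARD';
    UPDATE invoice_by_id
    SET lines = lines + { {invoice_line_id: 'line-1', invoice_line_number: 1, status: 'MATCHED'} }
    WHERE invoice_id = 'inv-1' AND invoice_type = 'STANDARD';
APPLY BATCH;

-- Overwrite the whole set: Cassandra first deletes the previous contents
-- (a tombstone covering the collection) and then writes the new elements.
UPDATE invoice_by_id
SET lines = { {invoice_line_id: 'line-1', invoice_line_number: 1, status: 'MATCHED'} }
WHERE invoice_id = 'inv-1' AND invoice_type = 'STANDARD';
```

If the DAO rewrites the full set on every save, the third form applies to every update, which is the case the tombstone question is really about.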

Here is the script for approach 2 (denormalized invoice and lines in one partition, but as multiple rows):

CREATE TABLE wfs_eam_ap_matching.invoice_and_lines_copy1 ( 
invoice_id uuid, 
invoice_line_id uuid, 
record_type text, 
active boolean, 
adj_density decimal, 
adj_reason text, 
aircraft_tail_number text, 
aircraft_type text, 
alt_quantity decimal, 
alt_quantity_uom_code text, 
ap_only boolean, 
ap_only_reason text, 
assignment_group text, 
available_quantity decimal, 
bol_number text, 
cancel_reason text, 
carrier_id uuid, 
carrier_name text, 
comments LIST<FROZEN<comment_udt>>, 
company_id uuid, 
company_name text, 
created_by text, 
created_on timestamp, 
credit_reason text, 
dest_location_id uuid, 
dest_location_name text, 
dest_location_override boolean, 
dom_intl_indicator text, 
due_date timestamp, 
early_pay_discount decimal, 
exchange_rate decimal, 
exchange_rate_date timestamp, 
extended_cost decimal, 
flight_number text, 
fob_point text, 
gross_net_indicator text, 
gross_quantity decimal, 
hold_reason text, 
intermediate_location_id uuid, 
intermediate_location_name text, 
invoice_currency_code text, 
invoice_date timestamp, 
invoice_line_number int, 
invoice_number text, 
invoice_type text, 
item_id uuid, 
item_name text, 
item_type text, 
local_currency_code text, 
location_id uuid, 
location_name text, 
manually_assigned boolean, 
matched_on timestamp, 
matched_pos text, 
matched_tier_name text, 
net_quantity decimal, 
org_id int, 
origin_location_id uuid, 
origin_location_name text, 
parent_id uuid, 
payment_method text, 
received_date timestamp, 
ref_po_number text, 
sanction_indicator text, 
segment text, 
ship_to_number text, 
sold_to_number text, 
solr_query text, 
source text, 
status text, 
total_tolerance decimal, 
total_variance decimal, 
unique_identifier FROZEN<TUPLE<text, text>>, 
unit_cost decimal, 
uom_code text, 
updated_by text, 
updated_on timestamp, 
uplift_end_date timestamp, 
uplift_start_date timestamp, 
uplift_start_end_indicator text, 
user_assignee text, 
vendor_id uuid, 
vendor_name text, 
vendor_site_id uuid, 
vendor_site_name text, 
version int, 
workflow_process_id text, 
PRIMARY KEY (invoice_id, invoice_line_id, record_type) 
);
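
For comparison, here is a sketch of the same use cases against invoice_and_lines_copy1, where each line is a clustering row. The UUIDs, the record_type value 'LINE', and the field values are assumptions for illustration; the question does not say which record_type values are used. The solr_query example also assumes a DSE Search index already exists on this table.

```cql
-- Add a line: one new clustering row in the invoice's partition.
INSERT INTO wfs_eam_ap_matching.invoice_and_lines_copy1
    (invoice_id, invoice_line_id, record_type, invoice_line_number, item_name, status)
VALUES (5b6962dd-3f90-4c93-8f61-eabfa4a803e2,
        a4a70900-24e1-11df-8924-001ff3591711,
        'LINE', 1, 'JET-A', 'NEW');

-- Update a line: only the named columns of that row are written.
UPDATE wfs_eam_ap_matching.invoice_and_lines_copy1
SET status = 'MATCHED', updated_by = 'some_user'
WHERE invoice_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
  AND invoice_line_id = a4a70900-24e1-11df-8924-001ff3591711
  AND record_type = 'LINE';

-- Delete a line: writes a tombstone for that one row.
DELETE FROM wfs_eam_ap_matching.invoice_and_lines_copy1
WHERE invoice_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2
  AND invoice_line_id = a4a70900-24e1-11df-8924-001ff3591711
  AND record_type = 'LINE';

-- Fetch the whole invoice (header and line rows) from a single partition.
SELECT * FROM wfs_eam_ap_matching.invoice_and_lines_copy1
WHERE invoice_id = 5b6962dd-3f90-4c93-8f61-eabfa4a803e2;

-- Search by an attribute through DSE Search.
SELECT invoice_id, invoice_line_id, status
FROM wfs_eam_ap_matching.invoice_and_lines_copy1
WHERE solr_query = 'status:MATCHED'
LIMIT 10;
```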

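Note: We are using DataStax Cassandra + DSE Search. It does not support static columns, so we are not using them. Also, to give a realistic picture I have listed the tables and UDTs with all of their many columns, which made for a very long question.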

0 answers: