I have a question regarding UUID generation.
Typically, when I'm generating a UUID I will use a random or time-based generation method.
However, I'm migrating legacy data from MySQL over to a C* (Cassandra) datastore, and I need to convert the legacy auto-incrementing integer IDs to UUIDs. Instead of creating another denormalized table with the legacy integer IDs as the primary key and all the data duplicated, I was wondering what folks thought about padding zeros onto the front of the integer ID to form a UUID. Example below.
*Something important to note: the legacy IDs' highest values will never top 1 million, so overflow isn't really an issue.
The idea would look like this:
Legacy ID: 123456 ---> UUID: 00000000-0000-0000-0000-000000123456
This would be done using some string concatenation and UUID.fromString("00000000-0000-0000-0000-000000123456").
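A minimal sketch of that idea in Java (the helper name is made up for illustration). One subtlety worth noting: UUID.fromString() interprets the digits as hex, so the mapping is still one-to-one (decimal digits are a subset of hex digits), but the internal numeric value won't equal the legacy id.

```java
import java.util.UUID;

public class LegacyIdUuid {
    // Hypothetical helper illustrating the zero-padding idea. The legacy
    // id 123456 is parsed as hex (0x123456) internally, but the string
    // form round-trips exactly, and the mapping is injective.
    static UUID fromLegacyId(int legacyId) {
        String padded = String.format("%012d", legacyId);
        return UUID.fromString("00000000-0000-0000-0000-" + padded);
    }

    public static void main(String[] args) {
        // Prints 00000000-0000-0000-0000-000000123456
        System.out.println(fromLegacyId(123456));
    }
}
```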
Does this seem like a bad pattern to anyone? I'm not a huge fan of the idea, gives me a bad taste in my mouth, but I don't have a technical reason for why haha.
As far as collisions go, the probability of one occurring is still ridiculously low, so I'm not worried about increasing collisions. I suppose it just seems like bad practice to me, that it's "too easy".
Answer 0 (score: 2)
We faced the same kind of issue before when migrating from Oracle, with ids generated by a sequence, to Cassandra with generated UUIDs.
We had to design a type to support both the old data coming from Oracle, with type long, and the new data, with type uuid.
The obvious solution is to use type blob to store the id: a blob can encode either a long or a uuid.
This solution only works for the partition key, because you query it using =. It won't work for a clustering column with operators like > or <, because those require an ordering on the value.
There was a small objection at the time: using a blob to store the id makes it opaque to the user. For example, in cqlsh, when you're doing a SELECT and you need to provide the id, how would you construct the blob?
Fortunately, the native CQL blob-conversion functions bigintAsBlob(), blobAsBigInt() and uuidAsBlob() come in very handy.
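As a rough client-side sketch of what those blob encodings contain (class and method names are made up; the layouts shown are the big-endian byte forms that bigintAsBlob() and uuidAsBlob() are expected to produce):

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class IdBlob {
    // 8-byte big-endian encoding of a legacy long id
    // (the layout bigintAsBlob() is expected to yield).
    static byte[] longToBlob(long id) {
        return ByteBuffer.allocate(8).putLong(id).array();
    }

    // 16-byte encoding of a UUID, most significant bits first
    // (the layout uuidAsBlob() is expected to yield).
    static byte[] uuidToBlob(UUID id) {
        return ByteBuffer.allocate(16)
                .putLong(id.getMostSignificantBits())
                .putLong(id.getLeastSignificantBits())
                .array();
    }
}
```

Because the two encodings have different lengths (8 vs. 16 bytes), old and new ids can never collide inside the same blob column.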
Answer 1 (score: 0)
I decided to go a different direction from doanduyhai's answer.
To keep the data consistent, we decided to fully denormalize the data and create another table in C* keyed by our legacy IDs. When objects are migrated from the legacy system to C*, they are assigned a new, randomly generated UUID, which becomes their primary ID going forward. The legacy IDs will stick around until we decide we no longer need them; at that point, we can cleanly drop the legacy-ID table and be done with them.
This solution allows us to make a cleaner break from the legacy ID system in the future, and lets us avoid using strange custom UUIDs. I'm also not a big fan of making the ID field a blob type that can hold several kinds of data, because going forward we plan to want only UUIDs in it.
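The migration scheme above can be sketched roughly as follows (all names are made up for illustration, and an in-memory map stands in for the legacy-ID lookup table that would really live in C*):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class LegacyMigration {
    // Stand-in for the extra C* table keyed by legacy ID; in the real
    // system this would be a Cassandra table, not an in-memory map.
    private final Map<Integer, UUID> legacyIdToUuid = new HashMap<>();

    // Assign each legacy row a fresh random UUID exactly once; the
    // legacy id survives only in this lookup until the table is dropped.
    public UUID migrate(int legacyId) {
        return legacyIdToUuid.computeIfAbsent(legacyId, id -> UUID.randomUUID());
    }
}
```

Dropping the lookup table later then removes every trace of the legacy ids without touching the primary data.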