I have two DataFrames, as follows:
Dataframe 1
timestamp_read base
1508025600009 A
1508025600088 G
1508025600156 C
1508025600200 T
1508025600257 T
1508025600307 C
1508025600403 G
1508025600476 G
1508025600550 D
1508025600596 G
1508025600606 D
1508025600658 G
Dataframe 2
timestamp_read base
1508025600009 A
1508025600101 G
1508025600104 C
1508025600174 T
1508025600233 T
1508025600233 T Additional T
1508025600238 C
1508025600266 G
1508025600268 G Missing D
1508025600285 G
1508025600393 D
1508025600455 G
1508025600460 A Additional A
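timestamp_read is a Unix epoch time (the 13-digit values appear to be in milliseconds). DataFrames 1 and 2 should be identical, but they are not, because they come from diagnostics run on two separate machines, so there is some degree of latency between them. Sometimes a result is missed on one machine but not on the other, and vice versa. What is the best way to merge these two DataFrames while accounting for this latency difference? I suspect the solution may involve massively parallel signature sequencing, but I'd love to hear suggestions.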
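Because the offset between the two machines is not constant, matching rows on timestamps alone is fragile. One option is to ignore the timestamps at first and globally align the two `base` sequences, the way DNA strings are aligned. Note that this is a swapped-in technique (standard Needleman-Wunsch global alignment), not massively parallel signature sequencing. Below is a minimal pure-Python sketch; the function name `align` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# (index in first sequence, index in second sequence); None marks a gap
Pair = Tuple[Optional[int], Optional[int]]


def align(a: Sequence[str], b: Sequence[str],
          match: int = 1, mismatch: int = -1, gap: int = -1) -> List[Pair]:
    """Needleman-Wunsch global alignment of two base sequences."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # a[i-1] has no partner in b
                              score[i][j - 1] + gap)   # b[j-1] has no partner in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))   # present in a, missing from b
            i -= 1
        else:
            pairs.append((None, j - 1))   # extra item in b
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    df1_bases = "AGCTTCGGDGDG"   # DataFrame 1 'base' column
    df2_bases = "AGCTTTCGGGDGA"  # DataFrame 2 'base' column
    for i, j in align(df1_bases, df2_bases):
        left = df1_bases[i] if i is not None else "-"
        right = df2_bases[j] if j is not None else "-"
        print(i, j, left, right)
```

With pandas, the inputs would be `df1["base"].tolist()` and `df2["base"].tolist()`. On the sample data, the D at DataFrame 1 index 8 (1508025600550) comes back as `(8, None)`, i.e. missing from DataFrame 2, while the extra T and the trailing A show up as `(None, j)` pairs. The table is O(n·m) in time and memory, which is fine for short runs; long recordings would need a banded or chunked variant.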
Desired output:
timestamp_read base
1508025600009 A
1508025600101 G
1508025600104 C
1508025600174 T
1508025600233 T
1508025600233 T Additional T
1508025600238 C
1508025600266 G
1508025600268 G Missing D
1508025600272 D Synthetic timestamp, based on distance from other points in the original time series (optional)
1508025600285 G
1508025600393 D
1508025600455 G
1508025600460 A Additional A
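Given an alignment of the two `base` columns, the desired output can be produced by keeping every DataFrame 2 row and inserting the rows that only DataFrame 1 has. The sketch below assumes the alignment is a list of `(i, j)` index pairs with `None` marking a gap (written out by hand here for the sample data). It places each synthetic timestamp between its DataFrame 2 neighbours in proportion to where the row sat between its DataFrame 1 neighbours, which is one possible reading of "based on distance from other points". The function `merge_aligned` and the `(timestamp, base)` row layout are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, str]                       # (timestamp_read in epoch ms, base)
Pair = Tuple[Optional[int], Optional[int]]  # (df1 index, df2 index); None = gap


def merge_aligned(df1: Sequence[Row], df2: Sequence[Row],
                  pairs: Sequence[Pair]) -> List[Row]:
    """Keep every df2 row and insert rows that only df1 has, with a synthetic
    timestamp scaled between the nearest aligned neighbours."""
    anchors = [k for k, (i, j) in enumerate(pairs) if i is not None and j is not None]
    merged: List[Row] = []
    for k, (i, j) in enumerate(pairs):
        if j is not None:
            merged.append(df2[j])          # matched row or df2-only extra
            continue
        ts1, base = df1[i]
        prev = max((a for a in anchors if a < k), default=None)
        nxt = min((a for a in anchors if a > k), default=None)
        if prev is not None and nxt is not None:
            p1, p2 = df1[pairs[prev][0]][0], df2[pairs[prev][1]][0]
            n1, n2 = df1[pairs[nxt][0]][0], df2[pairs[nxt][1]][0]
            span = n1 - p1
            # Scale only the small offsets, then add to the integer base,
            # so the 13-digit timestamps stay exact.
            ts = p2 + (round((ts1 - p1) * (n2 - p2) / span) if span else 0)
        elif prev is not None:             # no later anchor: reuse last offset
            ts = ts1 - df1[pairs[prev][0]][0] + df2[pairs[prev][1]][0]
        elif nxt is not None:              # no earlier anchor: reuse next offset
            ts = ts1 - df1[pairs[nxt][0]][0] + df2[pairs[nxt][1]][0]
        else:                              # nothing aligned: keep df1's time
            ts = ts1
        merged.append((ts, base))
    return merged


T0 = 1508025600000
DF1 = [(T0 + ms, b) for ms, b in zip(
    [9, 88, 156, 200, 257, 307, 403, 476, 550, 596, 606, 658], "AGCTTCGGDGDG")]
DF2 = [(T0 + ms, b) for ms, b in zip(
    [9, 101, 104, 174, 233, 233, 238, 266, 268, 285, 393, 455, 460], "AGCTTTCGGGDGA")]
# Alignment of the two 'base' columns, written out by hand for this sample.
PAIRS = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (None, 5), (5, 6), (6, 7),
         (7, 8), (8, None), (9, 9), (10, 10), (11, 11), (None, 12)]

for ts, base in merge_aligned(DF1, DF2, PAIRS):
    print(ts, base)
```

For the missing D this rule gives 1508025600278 rather than the illustrative 1508025600272 above; the exact value depends on the interpolation rule chosen. With pandas, rows can be taken from `list(df.itertuples(index=False))` and the result turned back into a frame with `pd.DataFrame(merged, columns=["timestamp_read", "base"])`.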