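I'm collecting several DataFrames and writing them to a MySQL database. Sometimes the old and the new df overlap (contain the same rows), and I want to remove those duplicates. For example: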
Old df:
timestamp volume price
id
211007692 1520969598625 0.181410 9044.9
211007688 1520969598364 0.100000 9045.0
211007687 1520969598340 0.050110 9045.0
211007673 1520969598122 0.005090 9046.1
211007667 1520969597783 0.083778 9046.1
211007666 1520969597782 0.010000 9046.1
211007665 1520969597781 0.010000 9046.1
211007664 1520969597780 0.010415 9046.1
211007663 1520969597779 0.012977 9046.1
New df:
timestamp volume price
id
211007709 1520969599391 0.061845 9043.6
211007708 1520969599370 0.181066 9043.6
211007705 1520969599222 0.132000 9043.5
211007700 1520969599006 1.000000 9044.5
211007694 1520969598710 0.100000 9043.5
211007692 1520969598625 0.181410 9044.9
211007688 1520969598364 0.100000 9045.0
211007687 1520969598340 0.050110 9045.0
Is there an elegant way to filter out all the duplicates?
Answer 0 (score: 0)
Use duplicated:
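import pandas as pd
s = pd.concat([old, new], keys=['old', 'new'])  # stack both frames; outer index level records the source
s[s.reset_index(level=1).duplicated(keep=False).values]  # rows (id + values) that appear in both frames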
Out[492]:
timestamp volume price
id
old 211007692 1520969598625 0.18141 9044.9
211007688 1520969598364 0.10000 9045.0
211007687 1520969598340 0.05011 9045.0
new 211007692 1520969598625 0.18141 9044.9
211007688 1520969598364 0.10000 9045.0
211007687 1520969598340 0.05011 9045.0
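The output above only shows which rows overlap. Since the question asks how to drop them, here is a minimal self-contained sketch, assuming the goal is to keep only the rows of `new` that are not already in `old` before writing to MySQL. The frames are rebuilt from a subset of the rows in the question, and `dup_mask` and `new_only` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample data: a subset of the rows shown in the question.
old = pd.DataFrame(
    {
        "timestamp": [1520969598625, 1520969598364, 1520969598340, 1520969598122],
        "volume": [0.181410, 0.100000, 0.050110, 0.005090],
        "price": [9044.9, 9045.0, 9045.0, 9046.1],
    },
    index=pd.Index([211007692, 211007688, 211007687, 211007673], name="id"),
)
new = pd.DataFrame(
    {
        "timestamp": [1520969599391, 1520969598625, 1520969598364, 1520969598340],
        "volume": [0.061845, 0.181410, 0.100000, 0.050110],
        "price": [9043.6, 9044.9, 9045.0, 9045.0],
    },
    index=pd.Index([211007709, 211007692, 211007688, 211007687], name="id"),
)

# Stack both frames; the outer index level ("old"/"new") records the source frame.
s = pd.concat([old, new], keys=["old", "new"])

# Mark every row whose id and column values occur more than once across both frames.
dup_mask = s.reset_index(level=1).duplicated(keep=False).values

# Drop all marked rows, then keep only what came from `new` (the "new" level is dropped).
new_only = s[~dup_mask].xs("new", level=0)
print(new_only)
```

Note that `keep=False` also drops rows that are duplicated within `new` itself, even if they never appeared in `old`. If `id` alone uniquely identifies a row (an assumption not stated in the question), a simpler filter is `new[~new.index.isin(old.index)]`.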