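I have two DataFrames (call them bdf and cdf) and want to verify that their contents are equal, so I compared them with
pd.testing.assert_frame_equal(bdf, cdf, check_dtype=False, check_like=True, check_exact=True)
However, the function reports a difference in a column where I did not expect one: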
DataFrame.iloc[:, 70] are different
DataFrame.iloc[:, 70] values are different (100.0 %)
[left]: [201801300040150000014217, 201801300040150000014217, 201801300040150000013737, 201801290040150000019605, 201801300040150000076982, 201801300040150000136588, 201801300040150000242399, 201801300040150000293800, 201801300040150000293801, 201801290040150000128792, 201801300040150000367067, 201801300040150000367770, 201801300040150000369255, 201801260040150000097789, 0, 0, 201801290040150000145140, 0, 201801290040150000145184, 201801290040150000145190, 201801290040150000145198, 201801290040150000145206, 201801290040150000145214, 201801290040150000145222, 0, 0, 201801290040150000145245, 201801290040150000145254, 201801290040150000145263, 201801290040150000145271, 201801290040150000145278, 201801290040150000145286, 201801290040150000145297, 201801290040150000145309, 201801290040150000145318, 201801290040150000145327, 201801290040150000149263, 201801290040150000149264, 201801300040150000433569, 201801290040150000156348, 201801290040150000161046, 201801290040150000161050, 201801290040150000165445, 0, 201801290040150000165456, 201801290040150000165472, 0, 0, 201801290040150000165496, 0, 0, 201801290040150000165520, 0, 0, 0, 201801290040150000165556, 0, 201801260040150000129418]
[right]: [201801300040150000014217, 201801300040150000014217, 201801300040150000013737, 201801290040150000019605, 201801300040150000076982, 201801300040150000136588, 201801300040150000242399, 201801300040150000293800, 201801300040150000293801, 201801290040150000128792, 201801300040150000367067, 201801300040150000367770, 201801300040150000369255, 201801260040150000097789, 0, 0, 201801290040150000145140, 0, 201801290040150000145184, 201801290040150000145190, 201801290040150000145198, 201801290040150000145206, 201801290040150000145214, 201801290040150000145222, 0, 0, 201801290040150000145245, 201801290040150000145254, 201801290040150000145263, 201801290040150000145271, 201801290040150000145278, 201801290040150000145286, 201801290040150000145297, 201801290040150000145309, 201801290040150000145318, 201801290040150000145327, 201801290040150000149263, 201801290040150000149264, 201801300040150000433569, 201801290040150000156348, 201801290040150000161046, 201801290040150000161050, 201801290040150000165445, 0, 201801290040150000165456, 201801290040150000165472, 0, 0, 201801290040150000165496, 0, 0, 201801290040150000165520, 0, 0, 0, 201801290040150000165556, 0, 201801260040150000129418]
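With check_like=True the left frame is reordered to match the right one before comparing, so as far as I can tell from the pandas source the position 70 refers to cdf's column order, not bdf's (newer pandas versions also print the column name in this message). A hedged sketch of a hypothetical helper, differing_columns, that names every column whose values differ. It assumes both frames share the same column labels and the same row order. The small frames are made-up stand-ins for bdf and cdf.

```python
import pandas as pd


def differing_columns(left, right):
    """Hypothetical helper: names of columns whose values differ between two frames.

    Assumes both frames have the same column labels and the same row index/order.
    """
    # Reorder right's columns to match left (similar in spirit to check_like=True).
    right = right.reindex(columns=left.columns)
    # Series.equals also requires matching dtypes and treats NaNs in the same place as equal.
    return [col for col in left.columns if not left[col].equals(right[col])]


# Made-up data: "refid" holds ints in one frame and strings in the other.
bdf = pd.DataFrame({"refid": [201801300040150000014217, 0], "other": ["x", "y"]}, dtype=object)
cdf = pd.DataFrame({"other": ["x", "y"], "refid": ["201801300040150000014217", "0"]}, dtype=object)

print(differing_columns(bdf, cdf))
```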
Visually, the two lists do not look any different. When I print the first value and the dtype of each column:
print("bdf: {}, type {}".format(bdf['refid'][0], bdf['refid'].dtype))
print("cdf: {}, type {}".format(cdf['refid'][0], cdf['refid'].dtype))
I get:
bdf: 201801300040150000014217, type object
cdf: 201801300040150000014217, type object
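A dtype of object only says that the column holds arbitrary Python objects, so two columns can both report object while holding different element types (for instance int in one and str in the other). These 24-digit IDs also exceed the int64 range, which is one way a numeric column ends up as object. A hedged diagnostic sketch, using made-up frames in place of bdf and cdf, that counts the Python type of every element in the column:

```python
import pandas as pd

# Made-up stand-ins: one column holds Python ints, the other holds strings.
bdf = pd.DataFrame({"refid": [201801300040150000014217, 0]}, dtype=object)
cdf = pd.DataFrame({"refid": ["201801300040150000014217", "0"]}, dtype=object)

for name, df in (("bdf", bdf), ("cdf", cdf)):
    col = df["refid"]
    # Both columns report the same dtype ...
    print(name, col.dtype)
    # ... but the Python types of the individual elements can differ.
    print(col.map(type).value_counts())
```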
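So why does assert_frame_equal() report the column as different when both the values and the dtypes appear to be the same? For what it's worth, both tables have more than 200 columns, all of them with dtype object, yet none of the other columns produces a comparison error.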
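One possible explanation, which is an assumption the question does not confirm: the refid values are stored as different Python types in the two frames (e.g. int in one, str in the other). They would then print identically but never compare equal, which would also fit the 100.0 % mismatch, since even 0 and "0" differ. A minimal sketch that reproduces that situation with made-up data, then normalises the element type with astype(str) before comparing again:

```python
import pandas as pd

# Made-up frames: the same digits, stored as int in one and as str in the other.
bdf = pd.DataFrame({"refid": [201801300040150000014217, 0]}, dtype=object)
cdf = pd.DataFrame({"refid": ["201801300040150000014217", "0"]}, dtype=object)

try:
    pd.testing.assert_frame_equal(bdf, cdf, check_dtype=False, check_like=True, check_exact=True)
    mismatch_reported = False
except AssertionError as err:
    # Both columns are dtype object, so the failure comes from the values themselves.
    mismatch_reported = True
    print(err)

# Converting both sides to the same element type (str here) makes the frames compare equal.
pd.testing.assert_frame_equal(bdf.astype(str), cdf.astype(str), check_like=True, check_exact=True)
print("equal after astype(str)")
```

Converting to str is only one choice; converting both sides to int would work equally well if every value is a valid integer.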