Pandas DataFrame equality - index numbering

Time: 2015-10-19 20:41:53

Tags: python python-2.7 pandas

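Does the index numbering matter when testing DataFrame equality? I have 2 identical DataFrames with exactly the same data and columns. The only difference is that the index number of each row is different, and the equals method returns False. How can I get around this? Here are my DataFrames: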

   A   B
0  87  54
1  87  75
2  87  22
3  87  69

     A   B
418  87  69
107  87  54
108  87  75
250  87  22

1 answer:

Answer 0: (score: 1)

You can use np.array_equal to check the values, but the ordering matters, so in your example you first have to sort by the index.

In [11]: df1
Out[11]:
    A   B
0  87  54
1  87  75
2  87  22
3  87  69

In [12]: df2
Out[12]:
      A   B
418  87  69
107  87  54
108  87  75
250  87  22

In [13]: df3 = df2.sort_index()

In [14]: df3
Out[14]:
      A   B
107  87  54
108  87  75
250  87  22
418  87  69

In [15]: np.array_equal(df1, df3)
Out[15]: True

Note: you can't compare df1 and df2 directly with ==, because they have different indexes:

In [21]: df1 == df2
ValueError: Can only compare identically-labeled DataFrame object

You can reset the index, but be aware that exceptions can be raised for that reason:

In [22]: df3.reset_index(drop=True)
Out[22]:
    A   B
0  87  54
1  87  75
2  87  22
3  87  69

In [23]: np.all(df1 == df3.reset_index(drop=True))
Out[23]: True

Another option is to put a try and except block around assert_frame_equal:

In [24]: pd.testing.assert_frame_equal(df1, df3.reset_index(drop=True))

as in the related answer.

As Jeff pointed out, you can use .equals, which does this.
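
For reference, the checks discussed in this answer can be put together in one standalone script. This is only a sketch: the two frames are rebuilt by hand from the tables in the question, the variable names raw_equal, values_equal and frames_equal are made up for illustration, and it assumes a pandas version that provides sort_index and pd.testing.assert_frame_equal.

```python
import numpy as np
import pandas as pd

# The two frames from the question: same rows, different index labels and order.
df1 = pd.DataFrame({"A": [87, 87, 87, 87], "B": [54, 75, 22, 69]})
df2 = pd.DataFrame(
    {"A": [87, 87, 87, 87], "B": [69, 54, 75, 22]},
    index=[418, 107, 108, 250],
)

# Sort by index so the rows line up, then discard the old labels.
df3 = df2.sort_index().reset_index(drop=True)

# equals() also compares the index labels, so the untouched frames differ.
raw_equal = df1.equals(df2)

# Value-only comparison on the underlying arrays.
values_equal = np.array_equal(df1.values, df3.values)

# Once the labels match, equals() succeeds and assert_frame_equal() stays silent.
frames_equal = df1.equals(df3)
pd.testing.assert_frame_equal(df1, df3)  # raises AssertionError on a mismatch

print(raw_equal, values_equal, frames_equal)
```

Whether sorting by index is the right way to line the rows up depends on the data; it happens to work here because the index order of df2 matches the row order of df1.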