Question

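I am trying to write a test with assert_frame_equal that compares the DataFrame returned by a function with a DataFrame read from a CSV file. The CSV file was created from the DataFrame returned by that same function: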

results = my_fun()
results.to_csv("test.csv", mode="w", sep="\t", index=False)

So I assumed the two would be identical. In the test, I have the following code:

import pandas as pd
from pandas.testing import assert_frame_equal

results = my_fun()
test_df = pd.read_csv("test.csv", sep="\t", header="infer", index_col=False, encoding="utf-8")
assert_frame_equal(results.reset_index(drop=True), test_df.reset_index(drop=True), check_column_type=False, check_dtype=False)

What I get is the following exception:

E   AssertionError: DataFrame.iloc[:, 0] (column name="document_id") are different
E
E   DataFrame.iloc[:, 0] (column name="document_id") values are different (100.0 %)
E   [left]:  [1, 1, 1, 2, 2, 2, 2, 2]
E   [right]: [1, 1, 1, 2, 2, 2, 2, 2]

I'm scratching my head. What is the actual difference here? If I print results["document_id"] and test_df["document_id"], I get:

0    1
1    1
2    1
3    2
4    2
5    2
6    2
7    2
Name: document_id, dtype: object <class 'pandas.core.series.Series'>
0    1
1    1
2    1
3    2
4    2
5    2
6    2
7    2
Name: document_id, dtype: int64 <class 'pandas.core.series.Series'>
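
The two printouts above differ only in their dtype (object versus int64). A minimal sketch for looking one level deeper, at the Python type of each element, follows. The frames and the helper `describe_column` are hypothetical stand-ins, and the idea that the object column holds strings is an assumption, not something the question confirms:

```python
import pandas as pd


def describe_column(series: pd.Series) -> dict:
    """Return the column dtype and the Python type names of its elements."""
    return {
        "dtype": str(series.dtype),
        "element_types": sorted({type(value).__name__ for value in series}),
    }


# Hypothetical stand-ins for the two frames in the question.
# Assumption: the function returns the ids as strings in an object column.
results = pd.DataFrame({"document_id": pd.Series(["1", "1", "2"], dtype=object)})
test_df = pd.DataFrame({"document_id": pd.Series([1, 1, 2], dtype="int64")})

left_info = describe_column(results["document_id"])
right_info = describe_column(test_df["document_id"])
print(left_info)
print(right_info)
```

If the element types differ, the values print identically but do not compare equal, and relaxing the dtype check with check_dtype=False would not be expected to change that.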

Answer 1

What happens if you compare the two columns in some other way?

Update, question 2: what does the following return?

results['document_id'] == test_df['document_id']
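
A minimal sketch of what that element-wise comparison might show, using an in-memory buffer instead of test.csv. It assumes the function returns document_id as strings (one plausible reason for the object dtype); the sample data is made up for illustration:

```python
import io

import pandas as pd
from pandas.testing import assert_frame_equal

# Assumption: the original frame stores the ids as strings (object dtype).
results = pd.DataFrame({"document_id": pd.Series(["1", "1", "2"], dtype=object)})

# Round-trip through tab-separated CSV, as in the question.
buffer = io.StringIO()
results.to_csv(buffer, sep="\t", index=False)
buffer.seek(0)
test_df = pd.read_csv(buffer, sep="\t")  # the parser infers int64 here

# Element-wise comparison: the string "1" is not equal to the integer 1.
matches = results["document_id"] == test_df["document_id"]
print(matches.tolist())

# One possible fix: cast the column to the dtype read back from the CSV.
normalized = results.astype({"document_id": test_df["document_id"].dtype})
assert_frame_equal(normalized, test_df)
```

Under that assumption every entry of the comparison is False, which would match the "100.0 %" in the error message. Casting one side (or passing a dtype to read_csv) makes the frames comparable; which side to cast depends on what type the ids are meant to have.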
