Question

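I have a large data set coming from Oracle tables and want to compare it using Python. Can anyone suggest an optimized (faster) way to do this in Python, using the cx_Oracle module?

I tried storing the output of the two SQL queries in two different dataframes and looping over every cell to compare them, but it takes a very long time to complete.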

for row in range(dfrow):
    for col in range(dfcol):
        value_old = sorted_t.iloc[row, col]
        value_new = sorted_p.iloc[row, col]
        if value_old != value_new:
            # record the change as "old -> new" in the diff dataframe
            dfdiff.iloc[row, col] = "{} -> {}".format(value_old, value_new)

I am hoping there is a faster way to do the comparison.

Answer 1

You are populating the T and P dataframes with queries similar to this:

select id, a, b, c
from p;

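It sounds like T and P contain many rows and that, most of the time, the rows match. You want to skip over the matching rows quickly. So don't even drag them into memory; let Oracle filter out the matches: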

select t.id, t.a, t.b, t.c,
       p.id, p.a, p.b, p.c
from   t
join   p on t.id = p.id
where  t.a != p.a
       or t.b != p.b
       or t.c != p.c;

Use whatever format you like to display those filtered values.
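As a rough sketch of how the filtered rows might be turned into the "old -> new" format from the question: the function below runs the filtering query through any DB-API cursor and formats only the columns that differ. The names `DIFF_QUERY`, `COLUMNS`, `fetch_diffs` and `demo` are made up for illustration, and the demo uses an in-memory `sqlite3` database purely as a stand-in so the example runs without an Oracle server.

```python
import sqlite3  # stand-in for cx_Oracle so this sketch runs without a database server

# Same idea as the query above: the database returns only the mismatching rows.
# No trailing semicolon, since cx_Oracle rejects one in cursor.execute().
DIFF_QUERY = """
select t.id, t.a, t.b, t.c,
       p.a, p.b, p.c
from   t
join   p on t.id = p.id
where  t.a != p.a
       or t.b != p.b
       or t.c != p.c
"""

COLUMNS = ("a", "b", "c")  # hypothetical column names, taken from the example tables


def fetch_diffs(cursor):
    """Run the filtering query and return {id: {column: 'old -> new'}}."""
    cursor.execute(DIFF_QUERY)
    diffs = {}
    for row in cursor:
        row_id = row[0]
        old_values = row[1:4]  # t.a, t.b, t.c
        new_values = row[4:7]  # p.a, p.b, p.c
        diffs[row_id] = {
            col: "{} -> {}".format(old, new)
            for col, old, new in zip(COLUMNS, old_values, new_values)
            if old != new
        }
    return diffs


def demo():
    """Build two tiny tables in memory and return their differences."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("create table t (id integer primary key, a, b, c)")
    cur.execute("create table p (id integer primary key, a, b, c)")
    cur.executemany("insert into t values (?, ?, ?, ?)",
                    [(1, "x", 10, 1.5), (2, "y", 20, 2.5)])
    cur.executemany("insert into p values (?, ?, ?, ?)",
                    [(1, "x", 10, 1.5), (2, "y", 21, 2.5)])
    result = fetch_diffs(cur)
    conn.close()
    return result
```

With cx_Oracle, the cursor would instead come from something like `cx_Oracle.connect(user=..., password=..., dsn=...).cursor()`. Note that `!=` never evaluates to true when either side is NULL, so rows that differ only by a NULL would need an extra condition in the `where` clause.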

Optimized way to compare Oracle data between two tables using Python
