Iterating over two database result sets while comparing them in Python

Time: 2016-05-16 00:26:11

Tags: database python-2.7 loops compare iteration

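The code below is meant to compare two tables in the same or in different databases. I cannot get the result I need, which is the set of mismatched records.

My problem: I cannot print only the unmatched records in table1 and table2, because I find it difficult to iterate row by row. Currently, even matching records are printed as not matching.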

import psycopg2
conn_string = "host='localhost' dbname='dvdrental' user='postgres' password='jai'"
db1 = psycopg2.connect(conn_string)
db2= psycopg2.connect(conn_string)
cursor1=db1.cursor()
cursor2=db2.cursor()
cursor1.execute("select * from public.actor order by 1")
results1 = cursor1.fetchall()
cursor2.execute("select * from public.actor order by 1")
results2 = cursor2.fetchall()
count1 =  len(results1)
count2 =  len(results2)
# print count1
# print count2
# print results1
# print results2
# print results1[0]
# print results2[0]
for i in range(0,count1):
    for j in range(0,count2):
        if (results1[i] == results2[j]):
            print "found"
        else:
            print "not found",results1

3 Answers:

Answer 0 (score: 0)

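The proper way to find matching values between two tables is to use a join in SQL. However, I take this to be more of a Python exercise than a SQL one.

Your main problem is that you have nested loops. Consider what your loops are effectively doing: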

list_a = ['a','b','c']
list_b = ['a','b','c']

for a_item in list_a:
   for b_item in list_b:
      print a_item,b_item

Output:

   a  a
   a  b
   a  c
   b  a
   b  b
   b  c
   c  a
   c  b
   c  c
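
Applied to the code in the question, this means every row of results1 is compared with every row of results2, so "not found" is printed for each pair that differs, even when a match exists elsewhere in the list. One way to report each row only once is to decide after the inner search has finished. A minimal sketch, using made-up sample rows in place of the real fetchall() results (the helper name rows_missing_from is hypothetical):

```python
# Hypothetical sample rows standing in for cursor.fetchall() results.
results1 = [(1, 'Penelope'), (2, 'Nick'), (3, 'Ed')]
results2 = [(1, 'Penelope'), (3, 'Ed'), (4, 'Jennifer')]

def rows_missing_from(rows, other_rows):
    """Return the rows of `rows` that have no equal row in `other_rows`."""
    missing = []
    for row in rows:
        for other in other_rows:
            if row == other:
                break            # match found, stop searching
        else:
            missing.append(row)  # inner loop finished without a break
    return missing

not_in_2 = rows_missing_from(results1, results2)
not_in_1 = rows_missing_from(results2, results1)
print(not_in_2)
print(not_in_1)
```

The else branch belongs to the inner for loop and runs only when no break occurred, so each row is classified exactly once.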

If you expect both iterables to have the same length, with elements in the same positions, you could do something more like this:

for index in range(0,len(list_a)):
   if list_a[index] == list_b[index]:
      print 'Found'
      continue
   print 'Not Found'
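
Note that the index-based loop raises an IndexError if list_b is shorter than list_a, and ignores any extra items if it is longer. A sketch that tolerates different lengths, assuming Python 3's itertools.zip_longest (named izip_longest in Python 2) and made-up sample data; the helper compare_by_position is hypothetical:

```python
from itertools import zip_longest

# Hypothetical sample data; list_b is longer and differs at index 1.
list_a = ['a', 'b', 'c']
list_b = ['a', 'x', 'c', 'd']

_MISSING = object()  # sentinel marking a position that one list does not have

def compare_by_position(first, second):
    """Return (index, status) pairs for a position-by-position comparison."""
    report = []
    pairs = zip_longest(first, second, fillvalue=_MISSING)
    for index, (a_item, b_item) in enumerate(pairs):
        status = 'Found' if a_item == b_item else 'Not Found'
        report.append((index, status))
    return report

report = compare_by_position(list_a, list_b)
for index, status in report:
    print(index, status)
```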

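You could (and should) make a few other changes to your code, but they are not directly related to the question you asked. If you would like that feedback, let me know.

Using SQL

Disclaimer: you have not shared the columns in each table with me, but this should (hopefully) give you the idea: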

import psycopg2
import psycopg2.extras

conn = psycopg2.connect("host='localhost' dbname='dvdrental' user='postgres' password='jai'")

sql = """
SELECT
   a.Col1 AS Col1_a,
   b.Col1 AS Col1_b,
   CASE
      WHEN a.Col1 IS NOT NULL AND b.Col1 IS NOT NULL THEN 'Match'
      ELSE 'No Match'
   END AS Result
FROM
   public.actor_1 a
FULL OUTER JOIN
   public.actor_2 b
   ON (b.Col1 = a.Col1);
"""

cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
cur.execute(sql)
results = cur.fetchall()
cur.close()
conn.close()

for row in results:
   # PostgreSQL folds unquoted aliases to lowercase, so the keys are lowercase.
   print row['col1_a'], row['col1_b'], row['result']
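
The Match / No Match classification produced by the FULL OUTER JOIN can also be sketched in plain Python by keying each result set on its first column. The sample rows and the helper name full_outer_compare are illustrative assumptions, and the sketch assumes the key values are unique, non-NULL and sortable:

```python
# Hypothetical rows; the first element of each tuple plays the role of Col1.
rows_a = [(1, 'Penelope'), (2, 'Nick'), (3, 'Ed')]
rows_b = [(1, 'Penelope'), (3, 'Ed'), (4, 'Jennifer')]

def full_outer_compare(rows_a, rows_b):
    """Mimic the join: one (key_a, key_b, result) tuple per distinct key."""
    keys_a = {row[0] for row in rows_a}
    keys_b = {row[0] for row in rows_b}
    report = []
    for key in sorted(keys_a | keys_b):
        in_a, in_b = key in keys_a, key in keys_b
        result = 'Match' if in_a and in_b else 'No Match'
        # None plays the role of the NULL an outer join fills in.
        report.append((key if in_a else None, key if in_b else None, result))
    return report

report = full_outer_compare(rows_a, rows_b)
for line in report:
    print(line)
```

Like the SQL above, this compares only the key column; rows that share a key but differ in other columns are still reported as a match.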

Answer 1 (score: 0)

I tried something like the code below. Please let me know your feedback.

import psycopg2
conn_string = "host='localhost' dbname='dvdrental' user='postgres' password='jai'"
db1 = psycopg2.connect(conn_string)
db2= psycopg2.connect(conn_string)
cursor1=db1.cursor()
cursor2=db2.cursor()
cursor1.execute("select * from public.actor except select * from public.actor_1")
results1 = cursor1.fetchall()
cursor2.execute("select * from public.actor_1 except select * from public.actor")
results2 = cursor2.fetchall()
count1 =  len(results1)
count2 =  len(results2)
# print count1
# print count2
# print results1
# print results2
print results1
print results2
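
To see what the two EXCEPT queries return without a PostgreSQL server, the same kind of statements can be run against an in-memory SQLite database. This is only a sketch: sqlite3 stands in for psycopg2, and the two-column tables and their rows are made up for illustration.

```python
import sqlite3

# In-memory database with two hypothetical, simplified tables.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("create table actor (actor_id integer, first_name text)")
cur.execute("create table actor_1 (actor_id integer, first_name text)")
cur.executemany("insert into actor values (?, ?)",
                [(1, 'Penelope'), (2, 'Nick'), (3, 'Ed')])
cur.executemany("insert into actor_1 values (?, ?)",
                [(1, 'Penelope'), (3, 'Ed'), (4, 'Jennifer')])

# Rows present in actor but absent from actor_1, and the reverse.
cur.execute("select * from actor except select * from actor_1")
only_in_actor = cur.fetchall()
cur.execute("select * from actor_1 except select * from actor")
only_in_actor_1 = cur.fetchall()
conn.close()

print(only_in_actor)
print(only_in_actor_1)
```

When both tables live in the same database, one connection and one cursor are enough, as here. EXCEPT compares whole rows and removes duplicates from its result.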

Answer 2 (score: 0)

I tried the following code, with an extension. Please let me know your feedback.

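import psycopg2
import numpy as np
import pandas as pd
conn_string = "host='localhost' dbname='dvdrental' user='postgres' password='jai'"
db1 = psycopg2.connect(conn_string)
db2= psycopg2.connect(conn_string)
cursor1=db1.cursor()
cursor2=db2.cursor()
cursor1.execute("select * from public.actor except select * from public.actor_1")
results1 = cursor1.fetchall()
cursor2.execute("select * from public.actor_1 except select * from public.actor")
results2 = cursor2.fetchall()
#count1 =  len(results1)
#count2 =  len(results2)
# df1 was never defined in the original snippet; building it from the two
# result lists (one tuple per cell) is an assumption about the intent.
df1 = pd.concat([pd.Series(results1, name='results1'),
                 pd.Series(results2, name='results2')], axis=1)
# Compare position by position; a position missing on one side is NaN and never equal.
df1['not_match_matched'] = np.where(df1['results1'] == df1['results2'], 'True', 'False')
# Keep only the positions that did not match.
final = df1[df1['not_match_matched'] == 'False']
print(final)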

Please share your feedback.