Question

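I am using PostgreSQL with Python, and rows are added to the SQL database periodically. Currently, the Python program does not know whether new data has been added (I use psycopg2 to read the rows, but it reads until the last row and then stops). How can I make my Python program keep checking for new data? Or can I have PostgreSQL trigger Python when a new row is added?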

This is what I currently have:

import psycopg2


def get_data():
    try:
        connect = psycopg2.connect(database="yardqueue", user="postgres", password="abcd", host="localhost", port="5432")
    except psycopg2.Error:
        print("Could not open database")
        return
    cur = connect.cursor()
    cur.execute("SELECT id,position FROM container")
    rows = cur.fetchall()
    for row in rows:
        print("ID = ", row[0])
        print("Position = ", row[1])

As you can see, when I run it, it stops once the variable 'row' reaches the last row.

Edit: Is there a way to keep my Python code running for a specified amount of time? If so, I could have it loop over the database until I kill it.

Answer 1

If you want to check for new records, we can write the following (assuming that nothing is deleted from the container table):

from time import sleep

import psycopg2

IDLE_INTERVAL_IN_SECONDS = 2


def get_data():
    try:
        connect = psycopg2.connect(database="yardqueue", user="postgres",
                                   password="abcd", host="localhost",
                                   port="5432")
    except psycopg2.Error:
        print("Could not open database")
        # TODO: maybe we should raise new exception?
        # or leave default exception?
        return
    cur = connect.cursor()
    previous_rows_count = 0
    while True:
        cur.execute("SELECT id, position FROM container")
        rows_count = cur.rowcount
        # print the whole table again whenever it has grown
        if rows_count > previous_rows_count:
            rows = cur.fetchall()
            for row in rows:
                print("ID = ", row[0])
                print("Position = ", row[1])
            previous_rows_count = rows_count
        sleep(IDLE_INTERVAL_IN_SECONDS)

If we only want to process new records, we can add ordering by id and an offset:

from time import sleep

import psycopg2

IDLE_INTERVAL_IN_SECONDS = 2


def get_data():
    try:
        connect = psycopg2.connect(database="yardqueue", user="postgres",
                                   password="abcd", host="localhost",
                                   port="5432")
    except psycopg2.Error:
        # TODO: maybe we should raise new exception?
        # or leave default exception?
        print("Could not open database")
        return
    cur = connect.cursor()
    rows_count = 0
    while True:
        cur.execute("SELECT id, position FROM container "
                    # sorting records by id to get new records data
                    # assuming that "id" column values are increasing for new records
                    "ORDER BY id "
                    # skipping records that we have already processed
                    "OFFSET %s",
                    (rows_count,))
        new_rows_count = cur.rowcount
        if new_rows_count > 0:
            rows = cur.fetchall()
            for row in rows:
                print("ID = ", row[0])
                print("Position = ", row[1])
            # accumulate the offset so processed rows are skipped next time
            rows_count += new_rows_count
        sleep(IDLE_INTERVAL_IN_SECONDS)
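
As for the follow-up about running for a specified amount of time, the `while True` loop can be bounded by a deadline. Below is a minimal sketch, not a definitive implementation. It assumes a hypothetical `poll_once` callable that performs one query-and-print pass (for example, the body of the loop above); `run_for`, `clock` and `sleep_fn` are illustrative names and not part of psycopg2.

```python
import time


def run_for(duration_seconds, poll_once, interval_seconds=2,
            clock=time.monotonic, sleep_fn=time.sleep):
    """Call poll_once() repeatedly until duration_seconds have elapsed.

    poll_once is a hypothetical callable doing one pass over the database.
    Returns the number of passes performed.
    """
    deadline = clock() + duration_seconds
    passes = 0
    while clock() < deadline:
        poll_once()
        passes += 1
        # Never sleep past the deadline.
        remaining = deadline - clock()
        if remaining <= 0:
            break
        sleep_fn(min(interval_seconds, remaining))
    return passes
```

The clock and sleep function are parameters only so that the loop can be exercised without waiting; in normal use the defaults are enough, e.g. `run_for(60, poll_once)`.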

Answer 2

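Unfortunately, a database has no notion of insertion order, so you as the designer must provide an explicit order. If you do not, the order of the rows you fetch (with a new cursor) may change at any time.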

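One possible approach here is to add a serial field to the table. PostgreSQL implements a serial field through a sequence, which guarantees that each newly inserted row gets a sequence number greater than all currently existing ones. But: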

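there may be holes if a transaction takes a sequence number and then aborts
if multiple concurrent transactions insert into the table, the order of the serial values is the order of the insert commands, not the order of the commit commands. This means that race conditions can lead to a wrong order. However, it is fine if the database has only one writer.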
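
With a single writer, the serial column can serve as a position marker: remember the largest `id` seen so far and ask only for rows beyond it. The following is a minimal sketch under stated assumptions: the `container` table from the question with `id` as the serial column, and a psycopg2-style cursor that uses `%s` placeholders. `fetch_new_rows` is a hypothetical helper name. As noted above, with concurrent writers a row with a smaller `id` can commit later and would be missed by this approach.

```python
def fetch_new_rows(cursor, last_seen_id):
    """Return (rows, new_last_seen_id) for rows with id > last_seen_id.

    cursor is assumed to be a DB-API cursor using %s placeholders,
    such as one returned by psycopg2's connection.cursor().
    """
    cursor.execute(
        "SELECT id, position FROM container WHERE id > %s ORDER BY id",
        (last_seen_id,),
    )
    rows = cursor.fetchall()
    if rows:
        # Rows are ordered by id, so the last one carries the largest id.
        last_seen_id = rows[-1][0]
    return rows, last_seen_id
```

Calling it in a loop, starting with `last_seen_id = 0` (assuming ids are positive) and sleeping between calls, yields each row once. Holes in the sequence are harmless here because only the comparison matters.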

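Another approach is to use an insertion-date field: either the inserting application manages it explicitly, or you use a trigger to set it transparently. PostgreSQL timestamps have microsecond precision. This means that many rows can have the same insertion-date value if they are inserted at the same time. Your Python script should read the time before opening a cursor and fetch all rows with an insertion time greater than the time of its last run. But here again, you have to watch out for race conditions...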
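
The timestamp variant can be sketched in a similar way. This assumes a hypothetical `inserted_at` timestamp column on `container` (set by the application or by a trigger), a psycopg2-style cursor, and client and server clocks that agree; `fetch_inserted_since` is an illustrative name. The cutoff is read before the query runs, and the sketch additionally bounds the query by that cutoff so that consecutive passes cover adjacent, non-overlapping windows. It does not address rows whose transaction commits after the cutoff was taken, which is the race condition mentioned above.

```python
from datetime import datetime, timezone


def fetch_inserted_since(cursor, last_run, now=None):
    """Return (rows, cutoff) for rows inserted in the window (last_run, cutoff].

    inserted_at is a hypothetical timestamp column; cursor is assumed to be
    a DB-API cursor using %s placeholders, such as psycopg2's.
    """
    # Read the time before querying, so the next pass starts exactly here.
    cutoff = now if now is not None else datetime.now(timezone.utc)
    cursor.execute(
        "SELECT id, position, inserted_at FROM container "
        "WHERE inserted_at > %s AND inserted_at <= %s "
        "ORDER BY inserted_at, id",
        (last_run, cutoff),
    )
    return cursor.fetchall(), cutoff
```

The returned `cutoff` is meant to be passed back as `last_run` on the next call. Ordering by `inserted_at, id` gives rows that share a timestamp a stable order.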

Real-time Python application for a database
