Question

I have the following code:

def scrapeFacebookPageFeedStatus(access_token):
    query = "SELECT page_id FROM falken"
    result_list = c.execute(query)

    for single_row in result_list:
        str_single_row = str(single_row)
        str_norm_single_row = str_normalize(str_single_row)
        print(str_norm_single_row)

When I run the code above, it prints every single_row value in result_list.

But when I pass single_row on to another function, as below:

def scrapeFacebookPageFeedStatus(access_token):
    query = "SELECT page_id FROM falken"
    result_list = c.execute(query)

    for single_row in result_list:
        str_single_row = str(single_row)
        str_norm_single_row = str_normalize(str_single_row)
        print(str_norm_single_row)

        statuses = getFacebookPageFeedData(str_norm_single_row, access_token, 100)

        for status in statuses['data']:
            query = "INSERT INTO falken_posts VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
            c.execute(query,(processFacebookPageFeedStatus(status, access_token)))
            conn.commit()

only the first single_row value is passed to the function, and then the loop stops.

The getFacebookPageFeedData function

def getFacebookPageFeedData(page_id, access_token, num_statuses):

    # build the Graph API request URL
    base = "https://graph.facebook.com/v2.6"
    node = "/%s/posts" % page_id
    fields = "/?fields=message,link,created_time,type,name,id," + \
            "comments.limit(0).summary(true),shares,reactions" + \
            ".limit(0).summary(true)"
    parameters = "&limit=%s&access_token=%s" % (num_statuses, access_token)
    url = base + node + fields + parameters

    # retrieve data
    data = json.loads(request_until_succeed(url))

    return data

It retrieves the data for a page's posts from the Facebook Graph API.

The processFacebookPageFeedStatus function

def processFacebookPageFeedStatus(status, access_token):

    status_id = status['id']
    status_message = '' if 'message' not in status.keys() else \
        unicode_normalize(status['message'])
    link_name = '' if 'name' not in status.keys() else \
        unicode_normalize(status['name'])
    status_type = status['type']
    status_link = '' if 'link' not in status.keys() else \
        unicode_normalize(status['link'])

    status_published = datetime.datetime.strptime(
        status['created_time'],'%Y-%m-%dT%H:%M:%S+0000')
    status_published = status_published + \
        datetime.timedelta(hours=-5) # EST
    status_published = status_published.strftime(
        '%Y-%m-%d %H:%M:%S')

    num_reactions = 0 if 'reactions' not in status else \
        status['reactions']['summary']['total_count']
    num_comments = 0 if 'comments' not in status else \
        status['comments']['summary']['total_count']
    num_shares = 0 if 'shares' not in status else status['shares']['count']

    reactions = getReactionsForStatus(status_id, access_token) if \
        status_published > '2016-02-24 00:00:00' else {}

    num_likes = 0 if 'like' not in reactions else \
        reactions['like']['summary']['total_count']

    num_likes = num_reactions if status_published < '2016-02-24 00:00:00' \
        else num_likes

It extracts the required data from the status dictionary and stores it in variables so that it can be inserted into the database.

Answer 1

sqlite's cursor.execute() returns the cursor itself. So after this line:

result_list = c.execute(query)

result_list is actually just an alias for c.
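
The aliasing can be checked directly. The following is a minimal sketch using an in-memory database; the one-column schema for falken is an assumption made only for illustration, since the question does not show the real table definition.

```python
import sqlite3

# In-memory database so the sketch has no side effects.
conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Assumed schema: the real falken table is not shown in the question.
c.execute("CREATE TABLE falken (page_id TEXT)")

# execute() hands back the very cursor it was called on.
result_list = c.execute("SELECT page_id FROM falken")
same_object = result_list is c
print(same_object)
```

Because both names refer to the same object, anything that changes the state of c also changes what result_list will yield.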

Now you start iterating over c:

for single_row in result_list:
    # code here

Then you call c.execute() again:

    query = "INSERT INTO falken_posts VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
    c.execute(query,(processFacebookPageFeedStatus(status, access_token)))

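This discards c's previous result set and replaces it with the result of the new query. Since that query doesn't select anything, c becomes an empty iterator, and the loop stops there.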
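
The effect can be reproduced without any Facebook code. This is a small sketch under assumed conditions: both tables are reduced to a single column and filled with three made-up page ids, which is not the schema from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Assumed one-column tables and sample ids, for illustration only.
c.execute("CREATE TABLE falken (page_id TEXT)")
c.execute("CREATE TABLE falken_posts (page_id TEXT)")
c.executemany("INSERT INTO falken VALUES (?)", [("a",), ("b",), ("c",)])

seen = []
for (page_id,) in c.execute("SELECT page_id FROM falken"):
    seen.append(page_id)
    # Reusing the cursor we are iterating over replaces its pending
    # SELECT results with the (empty) result of the INSERT.
    c.execute("INSERT INTO falken_posts VALUES (?)", (page_id,))

# Three rows exist, but the loop body ran fewer times than that.
print(len(seen))
```

Although falken holds three rows, the loop body should run only once, which matches the behaviour described in the question.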

The cure is simple: use a second cursor for the insert queries so that you don't overwrite c's result set:

# create a second cursor for insert statements
writer = conn.cursor()
# no need to recreate this same string anew for each iteration, 
# we can as well define it here once for all
insert_query = "INSERT INTO falken_posts VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"

# no need for result_list - just iterate over `c` 
c.execute(query)
for single_row in c:
    # code here
    writer.execute(insert_query,(processFacebookPageFeedStatus(status, access_token)))

As a side note, if performance is an issue, you may also want to commit once after the whole loop rather than after each insert statement.
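
Put together, the two suggestions might look like the sketch below. It is not the asker's real code: copy_pages is a hypothetical stand-in for scrapeFacebookPageFeedStatus, the tables are assumed to have a single column, and the Graph API call is replaced by inserting the page id itself.

```python
import sqlite3


def copy_pages(conn):
    """Hypothetical stand-in for the scraping loop: read with one cursor, write with another."""
    reader = conn.cursor()
    writer = conn.cursor()  # separate cursor, so the SELECT results survive
    insert_query = "INSERT INTO falken_posts VALUES (?)"

    processed = []
    reader.execute("SELECT page_id FROM falken")
    for (page_id,) in reader:
        # The real code would fetch and process statuses here.
        writer.execute(insert_query, (page_id,))
        processed.append(page_id)

    # One commit after the whole loop instead of one per insert.
    conn.commit()
    return processed


conn = sqlite3.connect(":memory:")
# Assumed one-column tables and sample ids, for illustration only.
conn.execute("CREATE TABLE falken (page_id TEXT)")
conn.execute("CREATE TABLE falken_posts (page_id TEXT)")
conn.executemany("INSERT INTO falken VALUES (?)", [("a",), ("b",), ("c",)])
conn.commit()

processed = copy_pages(conn)
print(len(processed))
```

With the writes moved to their own cursor, every row of falken is visited. Committing once at the end trades per-row durability for fewer transactions, so if the loop fails midway, none of that run's inserts are saved.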

Loop breaks when executing a function
