As a preamble: I want to build a Twitter scraper that can notify me of new tweets faster than TweetDeck and faster than the streaming API can deliver them. The problem is that when I request the page of the account I want to monitor, the program's output does not change when that account posts a new tweet. Currently my code makes multiple asynchronous requests to https://twitter.com/username and returns the top two tweets (including the pinned tweet, if there is one). How can I adjust the requests so that the output picks up new tweets while the program is running?
I'm still trying to get my head around the aiohttp library, so I haven't been able to experiment much, and the changes I have tried haven't worked.
import requests
import re
import time
import aiohttp
import asyncio
from bs4 import BeautifulSoup as bs


async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()


# takes a username and the number of tweets to print
async def get_recent(username, n):
    base_link = 'https://twitter.com'
    url = base_link + '/' + username
    async with aiohttp.ClientSession() as session:
        data_text = await fetch(session, url)
        # data = requests.get(url)
        recent_tweets = []
        html = bs(data_text, 'html.parser')
        # get timeline
        timeline = html.select('#timeline li.stream-item')
        # DEBUG: writes the exact HTML we're working with to a file, nicely
        # formatted. Uncomment the next two lines to do so.
        # with open('html.html', 'w', encoding='utf-8') as f_out:
        #     f_out.write(html.prettify())
        for tweet in timeline[:n]:
            # PARSE STUFF [deleted for clarity]
            # output to a list of dictionaries
            recent_tweets.append({"id": tweet_id, "text": tweet_text,
                                  "link_to_tweet": tweet_link,
                                  "links": in_tweet_links,
                                  "link_to_pic": pic_link})
        print(recent_tweets)
Then, in my main function:
loop = asyncio.get_event_loop()
all_groups = asyncio.gather(*[get_recent('username', 2) for _ in range(20)])
results = loop.run_until_complete(all_groups)
As I understand it, this should make 20 requests and give me the top 2 tweets from the corresponding timeline each time. But if I post a tweet while the program is running, the output does not reflect the new tweet until I stop the program and run it again.
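For reference, this is the direction I've been trying to go, though I'm not sure it's the right way to use asyncio for this: a loop that re-requests the timeline every few seconds and only prints when the newest tweet changes. The poll_user wrapper, the 5-second interval, and having get_recent return the list instead of printing it are all my own guesses, not anything taken from the aiohttp documentation.

async def poll_user(username, n, interval=5):
    # Hypothetical sketch: assumes get_recent is changed to return
    # recent_tweets instead of printing it.
    last_seen_id = None
    while True:
        tweets = await get_recent(username, n)
        if tweets and tweets[0]["id"] != last_seen_id:
            last_seen_id = tweets[0]["id"]
            print(tweets)  # only print when the newest tweet has changed
        await asyncio.sleep(interval)  # wait before polling again

loop = asyncio.get_event_loop()
loop.run_until_complete(poll_user('username', 2))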