Python asyncio with difflib slows down crawling

Time: 2016-11-11 08:21:38

Tags: python python-3.x python-3.5 python-asyncio difflib

I have a script that downloads several URLs asynchronously and then keeps monitoring them for changes using difflib.

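The script is below; the lines in question are the difflib.unified_diff call and the for loop that prints its output. When I run it with those lines commented out: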

import asyncio
import difflib
import aiohttp

urls = ['http://www.nytimes.com/',
        'http://www.time.com/',
        'http://www.economist.com/']

async def get_url(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            old = await resp.text()
            print('Initial -',url)
        while True:
            async with session.get(url) as resp1:
                new = await resp.text()
            print('Got -',url)
            diff = difflib.unified_diff(old, new)

            for line in diff:
                print(line)
            old = new

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    ops = []
    for url in urls:
        ops.append(get_url(url))
    loop.run_until_complete(asyncio.wait(ops))

the script runs as expected, retrieving each URL about 3 times per second.

When those lines are uncommented, the script slows to a crawl, far slower than fetching the URLs one after another.

I don't know why this happens. Is it related to difflib returning a generator?

1 answer:

Answer 0: (score: 0)

First, there is a bug in your code: instead of new = await resp.text() it should be new = await resp1.text()

unified_diff works on lists of strings, not on strings directly. You can use splitlines() to quickly split a string into lines:

diff = difflib.unified_diff(old.splitlines(), new.splitlines())

(At the moment, every single character of the long string is treated as a line!)
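
To make the difference concrete, here is a small sketch that diffs the same two texts both ways. The sample strings old and new are made up for illustration and stand in for two downloads of a page; lineterm="" is passed only because splitlines() strips the newlines.

```python
import difflib

# Hypothetical page contents standing in for two successive downloads.
old = "headline one\nstory text\nfooter\n"
new = "headline two\nstory text\nfooter\n"

# Strings passed directly: difflib iterates over them character by character,
# so every character becomes its own "line" in the diff.
char_diff = list(difflib.unified_diff(old, new))

# Lists of lines: the comparison works on whole lines, which is what
# unified_diff is designed for and gives it far fewer items to compare.
line_diff = list(
    difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
)

print(len(old), "characters vs", len(old.splitlines()), "lines")
for line in line_diff:
    print(line)
```

On a real page with tens of thousands of characters, the character-level comparison has vastly more elements to match than the line-level one, and since that work is CPU-bound it is likely to block the event loop while it runs.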