Generating flat files asynchronously with asyncio and aiofiles does not work as expected

Asked: 2018-09-16 15:08:32

Tags: python-asyncio aiofiles

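I am learning asynchronous programming in Python and understand that asyncio is the current package to use, so I tried writing a simple script that generates some files asynchronously. (I have already generated files in parallel using threads, and that works perfectly.) Since most of the time spent generating the files goes to write I/O, here is my script.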

Versions

(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>pip freeze
aiofiles==0.4.0
asn1crypto==0.24.0
certifi==2018.8.24
cffi==1.11.5
chardet==3.0.4
cryptography==2.3.1
idna==2.7
pycparser==2.18
pyOpenSSL==18.0.0
PySocks==1.6.8
requests==2.19.1
six==1.11.0
urllib3==1.23
win-inet-pton==1.0.1
wincertstore==0.2

(py37) C:\Users\Hong\Desktop>python --version
Python 3.7.0

(py37) C:\Users\Hong\Desktop>

Async version

import os
import asyncio
import aiofiles
import time
import datetime
import urllib

async def produce_content(c):
    return c*1000

async def create_file(file_name):
    tmp_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'tmp.txt')
    try:
        async with aiofiles.open(tmp_file, mode='r') as rf:
            print('--> read start', datetime.datetime.now(), file_name)
            content = await rf.read()
            print('--> read end', datetime.datetime.now(), file_name)
            content = await produce_content(content)
            async with aiofiles.open(file_name, mode='w') as wf:
                print('--> write start', datetime.datetime.now(), file_name)
                await wf.write(content)
                await wf.flush()
                print('--> write end', datetime.datetime.now(), file_name)
    except Exception as e:
        print(e)
        raise e
    #return file_name

id = 0
async def my_action(file_name):
    global id
    id += 1
    local_id = id
    print('start to run %s'%local_id, datetime.datetime.now())
    await create_file(file_name)
    print('end to run %s'%local_id, datetime.datetime.now())

def run():
    files = [
        os.path.join(os.path.dirname(os.path.abspath(__file__)), 'f%s.txt'%i) for i in range(0,3)
    ]

    start_ts = datetime.datetime.now()
    print('start', start_ts)

    loop = asyncio.get_event_loop()
    tasks = [asyncio.ensure_future(my_action(f)) for f in files]
    try:
        loop.run_until_complete(asyncio.wait(tasks))
    finally:
        loop.close()

    end_ts = datetime.datetime.now()
    print('end', end_ts)
    print('time elapse', end_ts-start_ts)


if __name__=='__main__':
    run()

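In my example, tmp.txt is a 240K file that I use as a base to create target files 1000 times larger. To compare the time cost of the async and sync versions, here is the sync version: replace the body of create_file with the following (it uses regular file methods instead of aiofiles).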

async def create_file(file_name):
    tmp_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'tmp.txt')
    try:
        with open(tmp_file, mode='r') as rf:
            print('--> read start', datetime.datetime.now(), file_name)
            content = rf.read()
            print('--> read end', datetime.datetime.now(), file_name)
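            # produce_content is still an async def, so it must be awaited here;
            # without await, content would be a coroutine object, not a string.
            content = await produce_content(content)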
            with open(file_name, mode='w') as wf:
                print('--> write start', datetime.datetime.now(), file_name)
                wf.write(content)
                print('--> write end', datetime.datetime.now(), file_name)
    except Exception as e:
        print(e)
        raise e

Results

(py37) C:\Users\Hong\Desktop>python non_async.py
start 2018-09-16 22:33:12.929901
start to run 1 2018-09-16 22:33:12.929901
--> read start 2018-09-16 22:33:12.929901 C:\Users\Hong\Desktop\f0.txt
--> read end 2018-09-16 22:33:12.945520 C:\Users\Hong\Desktop\f0.txt
--> write start 2018-09-16 22:33:13.531200 C:\Users\Hong\Desktop\f0.txt
--> write end 2018-09-16 22:33:19.701563 C:\Users\Hong\Desktop\f0.txt
end to run 1 2018-09-16 22:33:19.831177
start to run 2 2018-09-16 22:33:19.831177
--> read start 2018-09-16 22:33:19.831177 C:\Users\Hong\Desktop\f1.txt
--> read end 2018-09-16 22:33:19.846803 C:\Users\Hong\Desktop\f1.txt
--> write start 2018-09-16 22:33:20.483649 C:\Users\Hong\Desktop\f1.txt
--> write end 2018-09-16 22:33:26.917791 C:\Users\Hong\Desktop\f1.txt
end to run 2 2018-09-16 22:33:27.073904
start to run 3 2018-09-16 22:33:27.073904
--> read start 2018-09-16 22:33:27.075903 C:\Users\Hong\Desktop\f2.txt
--> read end 2018-09-16 22:33:27.085896 C:\Users\Hong\Desktop\f2.txt
--> write start 2018-09-16 22:33:27.807891 C:\Users\Hong\Desktop\f2.txt
--> write end 2018-09-16 22:33:34.627992 C:\Users\Hong\Desktop\f2.txt
end to run 3 2018-09-16 22:33:34.746507
end 2018-09-16 22:33:34.762129
time elapse 0:00:21.832228

(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>python async.py
start 2018-09-16 22:33:50.945612
start to run 1 2018-09-16 22:33:50.948609
start to run 2 2018-09-16 22:33:50.953824
start to run 3 2018-09-16 22:33:50.953824
--> read start 2018-09-16 22:33:50.953824 C:\Users\Hong\Desktop\f0.txt
--> read start 2018-09-16 22:33:50.953824 C:\Users\Hong\Desktop\f1.txt
--> read start 2018-09-16 22:33:50.969449 C:\Users\Hong\Desktop\f2.txt
--> read end 2018-09-16 22:33:50.985078 C:\Users\Hong\Desktop\f0.txt
--> read end 2018-09-16 22:33:51.525238 C:\Users\Hong\Desktop\f1.txt
--> read end 2018-09-16 22:33:52.057857 C:\Users\Hong\Desktop\f2.txt
--> write start 2018-09-16 22:33:52.643887 C:\Users\Hong\Desktop\f0.txt
--> write start 2018-09-16 22:33:57.036816 C:\Users\Hong\Desktop\f1.txt
--> write start 2018-09-16 22:34:01.509756 C:\Users\Hong\Desktop\f2.txt
--> write end 2018-09-16 22:34:05.952100 C:\Users\Hong\Desktop\f0.txt
--> write end 2018-09-16 22:34:05.952100 C:\Users\Hong\Desktop\f1.txt
end to run 1 2018-09-16 22:34:06.105765
end to run 2 2018-09-16 22:34:06.206030
--> write end 2018-09-16 22:34:07.393667 C:\Users\Hong\Desktop\f2.txt
end to run 3 2018-09-16 22:34:07.525176
end 2018-09-16 22:34:07.525176
time elapse 0:00:16.579564

(py37) C:\Users\Hong\Desktop>

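The async version runs faster than the sync version (16.6s versus 21.8s), but I expected it to be much faster than that. Looking at the log, we can see that the reads of the tmp file finish very close together: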

--> read end 2018-09-16 22:33:50.985078 C:\Users\Hong\Desktop\f0.txt
--> read end 2018-09-16 22:33:51.525238 C:\Users\Hong\Desktop\f1.txt
--> read end 2018-09-16 22:33:52.057857 C:\Users\Hong\Desktop\f2.txt

But the "write start" times are not close together:

--> write start 2018-09-16 22:33:52.643887 C:\Users\Hong\Desktop\f0.txt
--> write start 2018-09-16 22:33:57.036816 C:\Users\Hong\Desktop\f1.txt
--> write start 2018-09-16 22:34:01.509756 C:\Users\Hong\Desktop\f2.txt
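
To put numbers on this, the delay between "read end" and "write start" can be computed for each file from the timestamps in the async.py log. This is only a small arithmetic sketch: the values are copied from the log above, and the helper name `parse` is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the async.py log above (time of day only).
read_end = {
    "f0.txt": "22:33:50.985078",
    "f1.txt": "22:33:51.525238",
    "f2.txt": "22:33:52.057857",
}
write_start = {
    "f0.txt": "22:33:52.643887",
    "f1.txt": "22:33:57.036816",
    "f2.txt": "22:34:01.509756",
}

def parse(ts):
    # Hypothetical helper: parse an HH:MM:SS.ffffff string into a datetime.
    return datetime.strptime(ts, "%H:%M:%S.%f")

# Seconds elapsed between "read end" and "write start" for each file.
gaps = {
    name: (parse(write_start[name]) - parse(read_end[name])).total_seconds()
    for name in read_end
}

for name, gap in gaps.items():
    print(f"{name}: {gap:.3f} s between read end and write start")
```

The gaps come out at roughly 1.7 s, 5.5 s and 9.5 s for f0.txt, f1.txt and f2.txt, so each later task waits about 4 s longer than the previous one before its write begins.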

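I expected each task's "write start" to be very close to its "read end", because produce_content should take very little time. So why are the "write start" times of the tasks so far apart?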
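
One thing that may be worth checking (a guess at a contributing factor, not a confirmed diagnosis): produce_content contains no await, so awaiting it does not hand control back to the event loop, and the string multiplication runs to completion on the loop's thread. The sketch below illustrates that behaviour and shows one way to time the call. The `worker` function, the `events` list and the tiny 1000-character base string are stand-ins invented for this illustration; they are not part of the script above.

```python
import asyncio
import time

events = []

async def produce_content(c):
    # Same body as in the question: there is no await inside, so this runs
    # synchronously on the event loop thread until it returns.
    return c * 1000

async def worker(name, base):
    events.append(f"{name} before produce_content")
    started = time.perf_counter()
    content = await produce_content(base)  # does not yield to other tasks
    spent = time.perf_counter() - started
    events.append(f"{name} after produce_content")
    return len(content), spent

async def main():
    # Hypothetical small stand-in for the 240K tmp.txt content.
    base = "x" * 1000
    return await asyncio.gather(worker("task1", base), worker("task2", base))

results = asyncio.run(main())

for line in events:
    print(line)
for length, spent in results:
    print(f"{length} characters, {spent:.6f} s spent inside produce_content")
```

Here task1 finishes both of its steps before task2 records anything, because nothing between the two steps suspends the coroutine. Timing the real call the same way, with the actual 240K content, would show how much of each "read end" to "write start" gap is spent inside produce_content and how much is spent elsewhere.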

Thanks

Hong

0 answers:

No answers yet