Question

Consider the following program:

#!/usr/bin/env pypy

import json
import cStringIO
import sys

def main():
    BUFSIZE = 10240
    f = sys.stdin
    decoder = json.JSONDecoder()
    io = cStringIO.StringIO()

    do_continue = True
    while True:
        read = f.read(BUFSIZE)
        if len(read) < BUFSIZE:
            do_continue = False
        io.write(read)
        try:
            data, offset = decoder.raw_decode(io.getvalue())
            print(data)
            rest = io.getvalue()[offset:]
            if rest.startswith('\n'):
                rest = rest[1:]
            decoder = json.JSONDecoder()
            io = cStringIO.StringIO()
            io.write(rest)
        except ValueError, e:
            #print(e)
            #print(repr(io.getvalue()))
            continue
        if not do_continue:
            break

if __name__ == '__main__':
    main()

Here is a test case:

$ yes '{}' | pv | pypy parser-test.py >/dev/null

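As you can see, the script gets slower the more input you feed it. This also happens on CPython. I tried to profile the script with mprof and cProfile, but I couldn't find out why this happens. Does anyone have a clue?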
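One way to check where the extra work comes from is to replay the loop's logic on an in-memory stream and record how many characters are still buffered after each decoded object. The sketch below is a Python 3 port under stated assumptions (the script above is Python 2): it replaces cStringIO with plain string concatenation, uses a hypothetical helper name `buffer_sizes`, and caps the number of passes so it always terminates.

```python
import io
import json

BUFSIZE = 10240  # same chunk size as the question's script


def buffer_sizes(stream, bufsize=BUFSIZE, max_iters=20):
    """Replay the question's read/decode loop (hypothetical helper) and
    return, for each decoded object, how many characters remain buffered."""
    decoder = json.JSONDecoder()
    buf = ""
    sizes = []
    for _ in range(max_iters):
        chunk = stream.read(bufsize)  # one chunk per pass, as in the original
        buf += chunk
        try:
            _, offset = decoder.raw_decode(buf)  # decodes ONE value only
        except ValueError:
            if not chunk:  # input exhausted and nothing decodable left
                break
            continue
        rest = buf[offset:]
        if rest.startswith("\n"):
            rest = rest[1:]  # drop the separator, like the original
        buf = rest
        sizes.append(len(buf))
    return sizes


sizes = buffer_sizes(io.StringIO("{}\n" * 100000))
print(sizes[:3])  # [10237, 20474, 30711]
```

With `{}\n` input (3 characters per object) and a 10240-character chunk, each pass reads 10240 characters but removes only one object, so the buffer grows by 10237 characters per pass. Every `getvalue()` and slice then copies a longer string than the previous one, which would make the cost per item rise with the amount of input already read.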

Answer 1

Apparently it's the string operations that slow it down. Instead of:

        data, offset = decoder.raw_decode(io.getvalue())
        print(data)
        rest = io.getvalue()[offset:]
        if rest.startswith('\n'):
            rest = rest[1:]

it's better to do this:

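        io.seek(0)  # read from the start of the buffer, not from the end
        data, offset = decoder.raw_decode(io.read())
        print(data)
        rest = io.getvalue()[offset:]
        if rest.startswith('\n'):
            rest = rest[1:]  # drop the separator before rewriting the buffer
        io.seek(0)
        io.truncate()  # empty the existing buffer instead of allocating a new one
        io.write(rest)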
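Taking the same idea further (copy the buffer as little as possible), a hedged Python 3 sketch can decode every complete value already in the buffer after each read, advancing an index instead of re-slicing per object, and keep only the unparsed tail. Decoding all values per read goes beyond what this answer describes, and `iter_json_objects` is a hypothetical name. It relies on the optional `idx` argument of `json.JSONDecoder.raw_decode`, which CPython's `json.decoder` accepts even though the documentation only shows `raw_decode(s)`.

```python
import json


def iter_json_objects(chunks):
    """Yield every complete JSON value from an iterable of text chunks.

    Caveat: a top-level *number* split across two chunks would be decoded
    too early; objects, arrays and strings are handled correctly.
    """
    decoder = json.JSONDecoder()
    buf = ""
    for chunk in chunks:
        buf += chunk
        pos = 0
        while True:
            # Skip whitespace, e.g. the newline between '{}' values.
            while pos < len(buf) and buf[pos].isspace():
                pos += 1
            if pos == len(buf):
                break
            try:
                obj, pos = decoder.raw_decode(buf, pos)  # pos is absolute
            except ValueError:
                break  # incomplete value at the end: wait for more input
            yield obj
        buf = buf[pos:]  # one copy per chunk, not one per object


print(list(iter_json_objects(["{}\n{", "}\n[1,", "2]\n"])))  # [{}, {}, [1, 2]]
```

Each chunk is now scanned once and the leftover tail stays small, so the work per object should not depend on how much input came before it.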

Answer 2

You may want to close the StringIO at the end of each iteration (once you are done writing to it).

io.close()

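A StringIO's memory buffer is freed as soon as it is closed; otherwise it stays open and keeps its memory. This could explain why each additional piece of input slows your script down.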
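Applied to the question's loop, this suggestion amounts to closing the old buffer at the point where the script creates a new one. A minimal Python 3 sketch, with `replace_buffer` as a hypothetical helper name and `io.StringIO` standing in for `cStringIO`:

```python
import io


def replace_buffer(old, rest):
    """Hypothetical helper: move the unparsed `rest` into a fresh StringIO
    and close the old one so its buffer is discarded right away."""
    new = io.StringIO()
    new.write(rest)
    old.close()  # io.StringIO discards its text buffer on close()
    return new


old = io.StringIO()
old.write('{}\n{"a": 1}')
new = replace_buffer(old, '{"a": 1}')
print(old.closed, new.getvalue())  # True {"a": 1}
```

Closing makes the release explicit rather than leaving it until the old object becomes unreachable and is collected.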

Why does this script slow down per item with an increasing amount of input?
