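Wrapping an io.BufferedIOBase so that it becomes seekable

Time: 2019-11-04 23:23:45

Tags: python-3.x http pygame urllib

I was trying to answer a question about streaming audio from an HTTP server and then playing it with PyGame. My code is mostly complete, but I hit an error when the PyGame music functions tried to seek() on the urllib.HTTPResponse object.

According to the urllib docs, the urllib.HTTPResponse object (since v3.5) is an io.BufferedIOBase. I expected this to make the stream seek()able, but it does not.

Is there a way to wrap the io.BufferedIOBase so that it is smart enough to buffer enough data to handle the seek operation?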

import pygame
import urllib.request
import io

# Window size
WINDOW_WIDTH  = 400
WINDOW_HEIGHT = 400
# background colour
SKY_BLUE      = (161, 255, 254)

### Begin the streaming of a file
### Return the urllib HTTPResponse, a file-like object
def openURL( url ):
    result = None

    try:
        http_response = urllib.request.urlopen( url )
        print( "streamHTTP() - Fetching URL [%s]" % ( http_response.geturl() ) )
        print( "streamHTTP() - Response Status [%d] / [%s]" % ( http_response.status, http_response.reason ) )
        result = http_response
    except:
        print( "streamHTTP() - Error Fetching URL [%s]" % ( url ) )

    return result


### MAIN
pygame.init()
window  = pygame.display.set_mode( ( WINDOW_WIDTH, WINDOW_HEIGHT ) )
pygame.display.set_caption("Music Streamer")


clock = pygame.time.Clock()
done = False
while not done:

    # Handle user-input
    for event in pygame.event.get():
        if ( event.type == pygame.QUIT ):
            done = True
    # Keys
    keys = pygame.key.get_pressed()
    if ( keys[pygame.K_UP] ):
        if ( pygame.mixer.music.get_busy() ):
            print("busy")
        else:
            print("play")
            remote_music = openURL( 'http://127.0.0.1/example.wav' )
            if ( remote_music != None and remote_music.status == 200 ):
                pygame.mixer.music.load( io.BufferedReader( remote_music ) )
                pygame.mixer.music.play()

    # Re-draw the screen
    window.fill( SKY_BLUE )

    # Update the window, but not more than 60fps
    pygame.display.flip()
    clock.tick_busy_loop( 60 )

pygame.quit()

When this code runs and Up is pressed, it fails with the following error:

streamHTTP() - Fetching URL [http://127.0.0.1/example.wav]
streamHTTP() - Response Status [200] / [OK]
io.UnsupportedOperation: seek
io.UnsupportedOperation: File or stream is not seekable.
io.UnsupportedOperation: seek
io.UnsupportedOperation: File or stream is not seekable.
Traceback (most recent call last):
  File "./sound_stream.py", line 57, in <module>
    pygame.mixer.music.load( io.BufferedReader( remote_music ) )
pygame.error: Unknown WAVE format

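I also tried re-opening the io stream, and various other re-implementations of the same sort of thing.

2 answers:

Answer 0 (score: 5)

Seeking seeking

"According to the urllib docs, the urllib.HTTPResponse object (since v3.5) is an io.BufferedIOBase. I expected this to make the stream seek()able, but it does not."

That is correct. The io.BufferedIOBase interface does not guarantee that the I/O object is seekable. For HTTPResponse objects, IOBase.seekable() returns False: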

>>> import urllib.request
>>> response = urllib.request.urlopen("http://httpbin.org/get")
>>> response
<http.client.HTTPResponse object at 0x110870ca0>
>>> response.seekable()
False

This is because the BufferedIOBase implementation offered by HTTPResponse wraps a socket object, and sockets are not seekable either.
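The same behaviour can be reproduced without any HTTP server. The following is a minimal sketch, assuming a local socket pair (socket.socketpair()) as a stand-in for the HTTP connection; the variable names are only illustrative.

```python
import io
import socket

# A connected pair of sockets stands in for an HTTP connection here.
left, right = socket.socketpair()
right.sendall(b"RIFF")

# makefile("rb") returns a buffered reader on top of the raw socket.
stream = left.makefile("rb")
first_bytes = stream.read(4)   # reading works as usual
can_seek = stream.seekable()   # the socket cannot be rewound

try:
    stream.seek(0)
    seek_error = None
except io.UnsupportedOperation as exc:
    seek_error = exc           # same exception type as in the question

stream.close()
left.close()
right.close()
```

Reading succeeds, but the buffered reader refuses to seek because the raw socket underneath reports that it is not seekable.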

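You can't wrap a BufferedIOBase object in a BufferedReader object and add seeking support that way. The Buffered* wrapper objects can only wrap RawIOBase types, and they rely on the wrapped object to provide seeking support. You would have to emulate seeking at the raw I/O level, see below.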
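To illustrate that BufferedReader only passes on whatever seeking ability the raw object has, here is a small sketch. ToyRaw is a hypothetical class written for this example (it serves bytes from memory), not something from the standard library.

```python
import io


class ToyRaw(io.RawIOBase):
    """Hypothetical raw stream over a bytes object; seeking is optional."""

    def __init__(self, data, can_seek):
        super().__init__()
        self._data = data
        self._pos = 0
        self._can_seek = can_seek

    def readable(self):
        return True

    def seekable(self):
        return self._can_seek

    def readinto(self, buffer):
        # copy as much as fits into the caller's buffer
        chunk = self._data[self._pos:self._pos + len(buffer)]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)

    def seek(self, offset, whence=io.SEEK_SET):
        if not self._can_seek:
            raise io.UnsupportedOperation("seek")
        bases = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: len(self._data)}
        self._pos = bases[whence] + offset
        return self._pos

    def tell(self):
        return self._pos


# Seekable raw object: the buffered wrapper can seek.
rewindable = io.BufferedReader(ToyRaw(b"RIFF....WAVE", can_seek=True))
rewindable.read(4)
rewindable.seek(8)
tail = rewindable.read(4)

# Non-seekable raw object: wrapping it does not add seeking.
one_way = io.BufferedReader(ToyRaw(b"RIFF....WAVE", can_seek=False))
try:
    one_way.seek(0)
    wrapped_seek_failed = False
except io.UnsupportedOperation:
    wrapped_seek_failed = True
```

The wrapper is identical in both cases; only the raw object underneath decides whether seek() is available.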

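You can still provide the same functionality at a higher level, but take into account that seeking on remote data is a lot more involved; it is not a matter of changing a single OS variable that represents a file position on disk. For larger remote files, seeking without backing the whole file on local disk could be as sophisticated as using HTTP range requests plus local (in-memory or on-disk) buffers to balance sound playback performance against local data storage. Doing this correctly for a wide range of use cases can be a lot of effort, so it is certainly not part of the Python standard library.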
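For readers unfamiliar with HTTP range requests, the sketch below shows the two headers involved: the Range header the client sends and the Content-Range header a server answers with. The helper names build_range_request and parse_content_range are hypothetical, and no request is actually sent.

```python
import re
from urllib.request import Request


def build_range_request(url, start, end=None):
    """Return a Request asking for bytes start..end (inclusive) of url."""
    last = "" if end is None else str(end)
    return Request(url, headers={"Range": f"bytes={start}-{last}"})


_CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value):
    """Parse e.g. 'bytes 100-199/1000' into (start, end, total or None)."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        return None
    start, end, total = match.groups()
    return int(start), int(end), None if total == "*" else int(total)


request = build_range_request("http://127.0.0.1/example.wav", 100, 199)
range_header = request.get_header("Range")
# Passing `request` to urllib.request.urlopen() would send it; a server that
# honours the range replies with status 206 and a Content-Range header.
served = parse_content_range("bytes 100-199/1000")
```

A seek on a remote file then amounts to opening a new request whose Range starts at the target position, which is why each seek can cost a network round trip.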

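If your sound files are small

If your HTTP-sourced sound files are small enough (a few MB at most), just read the whole response into an in-memory io.BytesIO() file object. I really don't think it is worth making this more complicated than that, because the moment you have enough data to make that worth pursuing, your files are large enough to take up too much memory!

So this would be more than enough if your sound files are small (no more than a few MB):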

from io import BytesIO
import urllib.error
import urllib.request

def open_url(url):
    try:
        http_response = urllib.request.urlopen(url)
        print(f"streamHTTP() - Fetching URL [{http_response.geturl()}]")
        print(f"streamHTTP() - Response Status [{http_response.status}] / [{http_response.reason}]")
    except urllib.error.URLError:
        print(f"streamHTTP() - Error Fetching URL [{url}]")
        return

    if http_response.status != 200:
        print(f"streamHTTP() - Error Fetching URL [{url}]")
        return

    return BytesIO(http_response.read())

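This doesn't require writing a wrapper object, and because BytesIO is a native implementation, once the data is fully copied over, access to the data is faster than any Python-code wrapper could ever give you.

Note that this returns a BytesIO file object, so you no longer need to test for the response status: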

remote_music = open_url('http://127.0.0.1/example.wav')
if remote_music is not None:
    pygame.mixer.music.load(remote_music)
    pygame.mixer.music.play()
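Since this approach only makes sense for small files, it can help to enforce the limit while reading. The following is a minimal sketch; read_into_memory and the 8 MB cap are assumptions made for this example, and any readable binary stream (such as the HTTP response) can be passed in.

```python
from io import BytesIO

MAX_IN_MEMORY = 8 * 1024 * 1024  # assumed cap of 8 MB; tune for your game


def read_into_memory(source, limit=MAX_IN_MEMORY, chunk_size=64 * 1024):
    """Copy a readable binary stream into a BytesIO, or return None if too big."""
    buffer = BytesIO()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        buffer.write(chunk)
        if buffer.tell() > limit:
            return None  # caller should fall back to an on-disk approach
    buffer.seek(0)  # rewind so the consumer starts at the first byte
    return buffer


# In-memory stand-ins for an HTTP response body.
small = read_into_memory(BytesIO(b"abc" * 10))
too_big = read_into_memory(BytesIO(b"x" * 100), limit=10, chunk_size=4)
```

In open_url() this would replace the plain BytesIO(http_response.read()) call, with a None result signalling that the file is too large to keep in memory.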

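If they are more than a few MB

Once you go beyond a few megabytes, you could try pre-loading the data into a local file object. You can make this more sophisticated by using a thread to have shutil.copyfileobj() copy most of the data into that file in the background, and give the file to PyGame after loading just an initial amount of data.

By using an actual file object, you can actually help performance here, as PyGame will try to minimize interjecting itself between the SDL mixer and the file data. If there is an actual file on disk with a file number (the OS-level identifier for a stream, something that the SDL mixer library can make use of), then PyGame will operate directly on that and so minimize blocking the GIL (which in turn will help the Python portions of your game perform better!). And if you pass in a filename (just a string), then PyGame gets out of the way entirely and leaves all file operations to the SDL library.

Here is such an implementation; on normal Python interpreter exit it should clean up the downloaded files automatically. It returns a filename for PyGame to work on, and the rest of the download is completed in a thread after the initial few KB have been buffered. It avoids loading the same URL more than once, and I've made it thread-safe: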

import shutil
import urllib.error
import urllib.request
from tempfile import NamedTemporaryFile
from threading import Lock, Thread

INITIAL_BUFFER = 1024 * 8  # 8kb initial file read to start URL-backed files
_url_files_lock = Lock()
# stores open NamedTemporaryFile objects, keeping them 'alive'
# removing entries from here causes the file data to be deleted.
_url_files = {}


def open_url(url):
    with _url_files_lock:
        if url in _url_files:
            return _url_files[url].name

    try:
        http_response = urllib.request.urlopen(url)
        print(f"streamHTTP() - Fetching URL [{http_response.geturl()}]")
        print(f"streamHTTP() - Response Status [{http_response.status}] / [{http_response.reason}]")
    except urllib.error.URLError:
        print(f"streamHTTP() - Error Fetching URL [{url}]")
        return

    if http_response.status != 200:
        print(f"streamHTTP() - Error Fetching URL [{url}]")
        return

    fileobj = NamedTemporaryFile()

    content_length = http_response.getheader("Content-Length")
    if content_length is not None:
        try:
            content_length = int(content_length)
        except ValueError:
            content_length = None
        if content_length:
            # create sparse file of full length
            fileobj.seek(content_length - 1)
            fileobj.write(b"\0")
            fileobj.seek(0)

    fileobj.write(http_response.read(INITIAL_BUFFER))
    # flush so the initial data is on disk before the filename is handed out
    fileobj.flush()
    with _url_files_lock:
        if url in _url_files:
            # another thread raced us to this point, we lost, return their
            # result after cleaning up here
            fileobj.close()
            http_response.close()
            return _url_files[url].name

        # store the file object for this URL; this keeps the file
        # open and so readable if you have the filename.
        _url_files[url] = fileobj

    def copy_response_remainder():
        # copies file data from response to disk, for all data past INITIAL_BUFFER
        with http_response:
            shutil.copyfileobj(http_response, fileobj)
        # flush the tail; the file object stays open, so nothing else would
        fileobj.flush()

    t = Thread(daemon=True, target=copy_response_remainder)
    t.start()

    return fileobj.name

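Like the BytesIO() solution, the above returns either None or a value ready to be passed to pygame.mixer.music.load().

The above probably won't work if you try to immediately set an advanced playing position in the sound file, as the later data may not yet have been copied into the file. It's a trade-off.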
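One way to soften that trade-off is to track how much data has actually been written and wait for it before jumping ahead. The sketch below is not part of the implementation above; DownloadProgress and copy_with_progress are hypothetical names, and in-memory streams stand in for the HTTP response and the temporary file.

```python
import io
import threading


class DownloadProgress:
    """Hypothetical helper: tracks how many bytes have reached the local file."""

    def __init__(self):
        self._cond = threading.Condition()
        self.bytes_written = 0
        self.finished = False

    def advance(self, count):
        with self._cond:
            self.bytes_written += count
            self._cond.notify_all()

    def finish(self):
        with self._cond:
            self.finished = True
            self._cond.notify_all()

    def wait_for(self, position, timeout=None):
        """Block until `position` bytes are available or the copy has ended."""
        with self._cond:
            self._cond.wait_for(
                lambda: self.finished or self.bytes_written >= position, timeout
            )
            return self.bytes_written >= position


def copy_with_progress(source, target, progress, chunk_size=16 * 1024):
    """Like shutil.copyfileobj(), but reports progress after every chunk."""
    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            target.write(chunk)
            target.flush()
            progress.advance(len(chunk))
    finally:
        progress.finish()


# In-memory stand-ins for the HTTP response and the temporary file.
source, target = io.BytesIO(b"\0" * 50_000), io.BytesIO()
progress = DownloadProgress()
worker = threading.Thread(
    target=copy_with_progress, args=(source, target, progress), daemon=True
)
worker.start()
ready = progress.wait_for(40_000, timeout=5)  # is byte 40000 on disk yet?
worker.join(timeout=5)
```

A game would call wait_for() with a short timeout before setting a late playback position, and fall back to starting from the beginning if it returns False.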

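Seeking and finding third-party libraries

If you need full seeking support on remote URLs, don't want to use on-disk space for them, and don't want to worry about their size, you don't need to re-invent the HTTP-as-seekable-file wheel here. You could use an existing project that offers the same functionality. I found two that offer io.BufferedIOBase-based implementations: smart_open and httpio.

Both use HTTP Range requests to implement seeking support. Just use httpio.open(URL) or smart_open.open(URL) and pass the result directly to pygame.mixer.music.load(); if the URL can't be opened, you can catch that by handling the IOError exception: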

from smart_open import open as url_open  # or from httpio import open

try:
    remote_music = url_open('http://127.0.0.1/example.wav')
except IOError:
    pass
else:
    pygame.mixer.music.load(remote_music)
    pygame.mixer.music.play()

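smart_open uses an in-memory buffer to satisfy reads of a fixed size, but creates a new HTTP Range request for every seek call that changes the current file position, so performance may vary. Since the SDL mixer executes a few seeks on audio files to determine their type, I expect this to be a little slower.

httpio can buffer blocks of data and so might handle seeks better, but from a brief glance at the source code, when you actually set a buffer size the cached blocks are never evicted from memory again, so you would eventually end up with the whole file in memory.

Implementing seeking ourselves, via io.RawIOBase

And finally, because I was not able to find an efficient HTTP-Range-backed I/O implementation, I wrote my own. The following implements the io.RawIOBase interface, specifically so you can wrap the object in an io.BufferedReader() and so delegate caching to a buffer that will be managed correctly when seeking: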

import io
from copy import deepcopy
from functools import wraps
from typing import cast, overload, Callable, Optional, Tuple, TypeVar, Union
from urllib.request import urlopen, Request

T = TypeVar("T")

@overload
def _check_closed(_f: T) -> T: ...
@overload
def _check_closed(*, connect: bool, default: Union[bytes, int]) -> Callable[[T], T]: ...

def _check_closed(
    _f: Optional[T] = None,
    *,
    connect: bool = False,
    default: Optional[Union[bytes, int]] = None,
) -> Union[T, Callable[[T], T]]:
    def decorator(f: T) -> T:
        @wraps(cast(Callable, f))
        def wrapper(self, *args, **kwargs):
            if self.closed:
                raise ValueError("I/O operation on closed file.")
            if connect and (self._fp is None or self._fp.closed):
                self._connect()
                if self._fp is None:
                    # outside the seekable range, exit early
                    return default
            try:
                return f(self, *args, **kwargs)
            except Exception:
                self.close()
                raise
            finally:
                if (
                    self._fp is not None
                    and self._range_end
                    and self._pos >= self._range_end
                ):
                    self._fp.close()
                    del self._fp

        return cast(T, wrapper)

    if _f is not None:
        return decorator(_f)

    return decorator

def _parse_content_range(
    content_range: str
) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Parse a Content-Range header into a (start, end, length) tuple"""
    units, *range_spec = content_range.split(None, 1)
    if units != "bytes" or not range_spec:
        return (None, None, None)
    start_end, _, size = range_spec[0].partition("/")
    try:
        length: Optional[int] = int(size)
    except ValueError:
        length = None
    start_val, has_start_end, end_val = start_end.partition("-")
    start = end = None
    if has_start_end:
        try:
            start, end = int(start_val), int(end_val)
        except ValueError:
            pass
    return (start, end, length)

class HTTPRawIO(io.RawIOBase):
    """Wrap a HTTP socket to handle seeking via HTTP Range"""

    url: str
    closed: bool = False
    _pos: int = 0
    _size: Optional[int] = None
    _range_end: Optional[int] = None
    _fp: Optional[io.RawIOBase] = None

    def __init__(self, url_or_request: Union[Request, str]) -> None:
        if isinstance(url_or_request, str):
            self._request = Request(url_or_request)
        else:
            # copy request objects to avoid sharing state
            self._request = deepcopy(url_or_request)
        self.url = self._request.full_url
        self._connect(initial=True)

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def close(self) -> None:
        if self.closed:
            return
        if self._fp:
            self._fp.close()
            del self._fp
        self.closed = True

    @_check_closed
    def tell(self) -> int:
        return self._pos

    def _connect(self, initial: bool = False) -> None:
        if self._fp is not None:
            self._fp.close()
        if self._size is not None and self._pos >= self._size:
            # can't read past the end
            return
        request = self._request
        request.add_unredirected_header("Range", f"bytes={self._pos}-")
        response = urlopen(request)

        self.url = response.geturl()  # could have been redirected
        if response.status not in (200, 206):
            raise OSError(
                f"Failed to open {self.url}: "
                f"{response.status} ({response.reason})"
            )

        if initial:
            # verify that the server supports range requests. Capture the
            # content length if available
            if response.getheader("Accept-Ranges") != "bytes":
                raise OSError(
                    f"Resource doesn't support range requests: {self.url}"
                )
            try:
                length = int(response.getheader("Content-Length", ""))
                if length >= 0:
                    self._size = length
            except ValueError:
                pass

        # validate the range we are being served
        start, end, length = _parse_content_range(
            response.getheader("Content-Range", "")
        )
        if self._size is None:
            self._size = length
        if (start is not None and start != self._pos) or (
            length is not None and length != self._size
        ):
            # non-sensical range response
            raise OSError(
                f"Resource at {self.url} served invalid range: pos is "
                f"{self._pos}, range {start}-{end}/{length}"
            )
        if self._size and end is not None and end + 1 < self._size:
            # incomplete range, not reaching all the way to the end
            self._range_end = end
        else:
            self._range_end = None

        fp = cast(io.BufferedIOBase, response.fp)  # typeshed doesn't name fp
        self._fp = fp.detach()  # assume responsibility for the raw socket IO

    @_check_closed
    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        relative_to = {
            io.SEEK_SET: 0,
            io.SEEK_CUR: self._pos,
            io.SEEK_END: self._size,
        }.get(whence)
        if relative_to is None:
            if whence == io.SEEK_END:
                raise IOError(
                    f"Can't seek from end on unsized resource {self.url}"
                )
            raise ValueError(f"whence value {whence} unsupported")
        if -offset > relative_to:  # can't seek to a point before the start
            raise OSError(22, "Invalid argument")

        self._pos = relative_to + offset
        # there is no point in optimising an existing connection
        # by reading from it if seeking forward below some threshold.
        # Use a BufferedReader to avoid seeking by small amounts or by 0
        if self._fp:
            self._fp.close()
            del self._fp
        return self._pos

    # all read* methods delegate to the SocketIO object (itself a RawIO
    # implementation).

    @_check_closed(connect=True, default=b"")
    def read(self, size: int = -1) -> Optional[bytes]:
        assert self._fp is not None  # show type checkers we already checked
        res = self._fp.read(size)
        if res is not None:
            self._pos += len(res)
        return res

    @_check_closed(connect=True, default=b"")
    def readall(self) -> bytes:
        assert self._fp is not None  # show type checkers we already checked
        res = self._fp.readall()
        self._pos += len(res)
        return res

    @_check_closed(connect=True, default=0)
    def readinto(self, buffer: bytearray) -> Optional[int]:
        assert self._fp is not None  # show type checkers we already checked
        n = self._fp.readinto(buffer)
        self._pos += n or 0
        return n

Remember that this is a RawIOBase object, which you really want to wrap in a BufferedReader(). Doing so in open_url() looks like this:

def open_url(url, *args, **kwargs):
    return io.BufferedReader(HTTPRawIO(url), *args, **kwargs)

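This gives you fully buffered I/O with full seeking support over a remote URL, and the BufferedReader implementation will minimise resetting the HTTP connection when seeking. I've found that when using this with the PyGame mixer, only a single HTTP connection is made, as all the test seeks fall within the default 8KB buffer.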
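The claim that small seeks never reach the raw object can be checked locally. This is a sketch assuming CPython's io.BufferedReader; CountingRaw is a hypothetical in-memory raw stream that counts calls, standing in for HTTPRawIO so that no server is needed.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical raw stream that counts how often it is read or repositioned."""

    def __init__(self, data):
        super().__init__()
        self._data = data
        self._pos = 0
        self.read_calls = 0
        self.seek_calls = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def readinto(self, buffer):
        self.read_calls += 1
        chunk = self._data[self._pos:self._pos + len(buffer)]
        buffer[:len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)

    def seek(self, offset, whence=io.SEEK_SET):
        # for HTTPRawIO every call here would mean a new HTTP connection
        self.seek_calls += 1
        bases = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: len(self._data)}
        self._pos = bases[whence] + offset
        return self._pos

    def tell(self):
        return self._pos


raw = CountingRaw(bytes(range(64)))
reader = io.BufferedReader(raw)  # default buffer size, larger than the data

header = reader.read(4)   # one raw read fills the buffer
reader.seek(0)            # target is inside the buffer: no raw seek needed
again = reader.read(12)   # served from the buffer
```

Format sniffing of the kind the mixer does (read a header, rewind, read again) therefore stays inside the buffer, which matches the single connection observed above.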

Answer 1 (score: 4)

If you are fine with using the requests module (which supports streaming) instead of urllib, you could use a wrapper like this:

from io import BytesIO, SEEK_SET, SEEK_END

class ResponseStream(object):
    """Seekable, lazily filled view over an iterator of byte chunks."""

    def __init__(self, request_iterator):
        self._bytes = BytesIO()
        self._iterator = request_iterator

    def _load_all(self):
        self._bytes.seek(0, SEEK_END)
        for chunk in self._iterator:
            self._bytes.write(chunk)

    def _load_until(self, goal_position):
        current_position = self._bytes.seek(0, SEEK_END)
        while current_position < goal_position:
            try:
                # write() returns the number of bytes written, not the new position
                current_position += self._bytes.write(next(self._iterator))
            except StopIteration:
                break

    def tell(self):
        return self._bytes.tell()

    def read(self, size=None):
        left_off_at = self._bytes.tell()
        if size is None or size < 0:
            self._load_all()
        else:
            goal_position = left_off_at + size
            self._load_until(goal_position)

        self._bytes.seek(left_off_at)
        return self._bytes.read(size)

    def seek(self, position, whence=SEEK_SET):
        if whence == SEEK_END:
            # the end is only known once everything has been downloaded
            self._load_all()
        return self._bytes.seek(position, whence)

Then I guess you can do something like this:

import pygame
import requests
from threading import Thread

WINDOW_WIDTH  = 400
WINDOW_HEIGHT = 400
SKY_BLUE      = (161, 255, 254)
URL           = 'http://localhost:8000/example.wav'

pygame.init()
window  = pygame.display.set_mode( ( WINDOW_WIDTH, WINDOW_HEIGHT ) )
pygame.display.set_caption("Music Streamer")
clock = pygame.time.Clock()
done = False
font = pygame.font.SysFont(None, 32)
state = 0

def play_music():
    global state  # rebind the module-level flag instead of creating a local
    response = requests.get(URL, stream=True)
    if (response.status_code == 200):
        stream = ResponseStream(response.iter_content(64))
        pygame.mixer.music.load(stream)
        pygame.mixer.music.play()
    else:
        state = 0

while not done:

    for event in pygame.event.get():
        if ( event.type == pygame.QUIT ):
            done = True

        if event.type == pygame.KEYDOWN and state == 0:
            Thread(target=play_music).start()
            state = 1

    window.fill( SKY_BLUE )
    window.blit(font.render(str(pygame.time.get_ticks()), True, (0,0,0)), (32, 32))
    pygame.display.flip()
    clock.tick_busy_loop( 60 )

pygame.quit()

This uses a Thread to start the streaming.

I'm not sure this works 100%, but give it a try.
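To see the lazy buffering idea behind such a wrapper without a server or PyGame, here is a simplified, self-contained sketch. LazyBuffer and fake_chunks are hypothetical stand-ins written for this example; with requests, the chunk iterator would be something like response.iter_content(64).

```python
from io import BytesIO, SEEK_END


class LazyBuffer:
    """Hypothetical, simplified buffer filled on demand from byte chunks."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = BytesIO()

    def _fill_to(self, goal):
        # pull chunks until `goal` bytes are buffered or the source runs dry
        position = self._buffer.tell()
        size = self._buffer.seek(0, SEEK_END)
        while size < goal:
            chunk = next(self._chunks, None)
            if chunk is None:
                break
            size += self._buffer.write(chunk)
        self._buffer.seek(position)

    def read(self, size):
        self._fill_to(self._buffer.tell() + size)
        return self._buffer.read(size)

    def seek(self, position):
        self._fill_to(position)
        return self._buffer.seek(position)


consumed = []  # records which chunks were actually pulled from the source


def fake_chunks():
    for chunk in (b"RIFF", b"$\x00\x00\x00", b"WAVE", b"fmt "):
        consumed.append(chunk)
        yield chunk


stream = LazyBuffer(fake_chunks())
magic = stream.read(4)   # pulls only the first chunk
stream.seek(0)           # rewinding is free: the data is already buffered
head = stream.read(8)    # pulls one more chunk, leaving the rest unread
```

Everything read so far stays in memory, so backward seeks are cheap, but memory use grows with the amount of audio that has been consumed.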