I'm trying to parse an HTTP request line (e.g. 'GET / HTTP/1.1\r\n'), which is easy with socket.makefile().readline() (BaseHTTPRequestHandler uses it), like this:
print sock.makefile().readline()
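Unfortunately, as the documentation says, when using makefile() the socket must be in blocking mode (it can't have a timeout). How can I implement a readline()-like function that does the same thing without the makefile() file-object interface, and without reading more than it needs to (since that would discard data I need afterwards)?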
A very inefficient example:
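request_line = ""
while not request_line.endswith('\n'):
    ch = sock.recv(1)  # one byte per system call, hence the inefficiency
    if not ch:  # peer closed the connection; stop instead of looping forever
        break
    request_line += ch
print request_line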
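The byte-at-a-time loop never over-reads, but it costs one system call per byte. A cheaper variant, sketched here under the assumption of a plain (non-TLS) stream socket on a platform that supports the MSG_PEEK flag, peeks at whatever is queued and then recv()s only up to and including the newline, so everything after it stays in the socket. recv_line_peek and max_len are hypothetical names. Each call is an ordinary recv(), so a socket timeout can be set, but if the timeout fires partway through a line, the bytes already consumed for that line are lost with the exception.

```python
import socket


def recv_line_peek(sock: socket.socket, max_len: int = 65536) -> bytes:
    """Read one line from sock without consuming anything after the newline.

    Hypothetical helper. Returns the line including its trailing newline, or a
    shorter result without one if the peer closes the connection first or
    max_len bytes arrive without a newline.
    """
    line = bytearray()
    while len(line) < max_len:
        # Look at the queued data without removing it from the socket.
        peeked = sock.recv(min(4096, max_len - len(line)), socket.MSG_PEEK)
        if not peeked:
            break  # peer closed the connection
        idx = peeked.find(b"\n")
        wanted = idx + 1 if idx != -1 else len(peeked)
        # Consume only the bytes being kept; anything after the newline stays queued.
        line += sock.recv(wanted)
        if line.endswith(b"\n"):
            break
    return bytes(line)
```

The trade-off is two system calls per chunk (peek, then consume) instead of one per byte.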
Answer 0 (score: 2)
How about:
import StringIO
buff = StringIO.StringIO()  # StringIO takes initial contents, not a size, so start empty
while True:
    data = sock.recv(2048)  # Pull up to 2048 bytes of whatever is available
    if not data: break      # Peer closed the connection
    buff.write(data)        # Append that segment to the buffer
    if '\n' in data: break  # If that segment had '\n', stop reading
# Get the buffer data, split it over newlines, print the first line
print buff.getvalue().splitlines()[0]
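Instead of one recv() call per byte and building a string one character at a time, this pulls as much data as is available in each call and appends it to a buffer. Note that anything received after the first newline is left in buff rather than returned, so keep buff around and read the rest of the request from it instead of from the socket.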
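For Python 3, where recv() returns bytes, here is a sketch of the same loop that also hands back whatever arrived after the newline, so nothing is discarded. recv_first_line and bufsize are hypothetical names, and a bytearray stands in for the StringIO buffer:

```python
import socket
from typing import Tuple


def recv_first_line(sock: socket.socket, bufsize: int = 2048) -> Tuple[bytes, bytes]:
    """Return (first_line, leftover); leftover holds the bytes received after it.

    Hypothetical helper: first_line keeps its trailing newline; if the peer
    closes before any newline arrives, first_line is everything received and
    leftover is empty.
    """
    buff = bytearray()
    while True:
        data = sock.recv(bufsize)  # pull whatever is available, up to bufsize
        if not data:
            break  # peer closed the connection
        buff += data
        if b"\n" in data:  # earlier chunks had no newline, so only check the new one
            break
    line, sep, rest = buff.partition(b"\n")
    return bytes(line + sep), bytes(rest)
```

The caller then parses the rest of the request from leftover first and only goes back to the socket once leftover is used up.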
Answer 1 (score: 2)
Here is a (buffered) line reader that does not use asyncio. It can be used as a "synchronous", socket-based replacement for asyncio.StreamReader.
import socket
from asyncio import IncompleteReadError  # only import the exception class


class SocketStreamReader:
    def __init__(self, sock: socket.socket):
        self._sock = sock
        # Bytes already received from the socket but not yet returned to the caller
        self._recv_buffer = bytearray()

    def read(self, num_bytes: int = -1) -> bytes:
        raise NotImplementedError

    def readexactly(self, num_bytes: int) -> bytes:
        buf = bytearray(num_bytes)
        pos = 0
        while pos < num_bytes:
            n = self._recv_into(memoryview(buf)[pos:])
            if n == 0:
                raise IncompleteReadError(bytes(buf[:pos]), num_bytes)
            pos += n
        return bytes(buf)

    def readline(self) -> bytes:
        return self.readuntil(b"\n")

    def readuntil(self, separator: bytes = b"\n") -> bytes:
        if len(separator) != 1:
            raise ValueError("Only separators of length 1 are supported.")

        chunk = bytearray(4096)
        start = 0
        # Start from whatever was left over by the previous call
        buf = bytearray(len(self._recv_buffer))
        bytes_read = self._recv_into(memoryview(buf))
        assert bytes_read == len(buf)

        while True:
            idx = buf.find(separator, start)
            if idx != -1:
                break

            # Only the newly received bytes need to be searched next time
            start = len(buf)
            bytes_read = self._recv_into(memoryview(chunk))
            if bytes_read == 0:
                # Peer closed the connection before the separator arrived
                raise IncompleteReadError(bytes(buf), None)
            buf += memoryview(chunk)[:bytes_read]

        result = bytes(buf[: idx + 1])
        # Keep everything after the separator for the next read
        self._recv_buffer = b"".join(
            (memoryview(buf)[idx + 1 :], self._recv_buffer)
        )
        return result

    def _recv_into(self, view: memoryview) -> int:
        # Serve from the leftover buffer first, then read the rest from the socket
        bytes_read = min(len(view), len(self._recv_buffer))
        view[:bytes_read] = self._recv_buffer[:bytes_read]
        self._recv_buffer = self._recv_buffer[bytes_read:]
        if bytes_read == len(view):
            return bytes_read
        bytes_read += self._sock.recv_into(view[bytes_read:])
        return bytes_read
Usage:
reader = SocketStreamReader(sock)
line = reader.readline()
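Since the reader's readline() returns one line including its newline, the request-line and header parsing the question is after can be written against any readline callable. A minimal sketch, assuming a well-formed request; parse_request_head is a hypothetical name, and header bytes are decoded as latin-1 with names lowercased for lookup:

```python
from typing import Callable, Dict, Tuple


def parse_request_head(readline: Callable[[], bytes]) -> Tuple[str, str, str, Dict[str, str]]:
    """Parse the request line and headers; stops after the blank line.

    Hypothetical helper: assumes 'METHOD TARGET VERSION' and 'Name: value'
    lines; repeated header names simply overwrite earlier ones.
    """
    request_line = readline().decode("latin-1").rstrip("\r\n")
    method, target, version = request_line.split(" ", 2)
    headers: Dict[str, str] = {}
    while True:
        line = readline().decode("latin-1").rstrip("\r\n")
        if not line:  # blank line ends the headers (or b"" at end of stream)
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers
```

With the reader above, this would be called as parse_request_head(reader.readline), followed by reader.readexactly(int(headers["content-length"])) when the request has a body; anything not consumed stays in the reader for the next call.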
Answer 2 (score: 0)
Here is my solution, written in Python 3. In this example I use io.BytesIO.read() instead of socket.recv(), but the idea is the same:
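CHUNK_SIZE = 16  # you can set it larger or smaller
buffer = bytearray()
while True:
    chunk = stream.read(CHUNK_SIZE)
    buffer.extend(chunk)
    if b'\n' in chunk or not chunk:
        break
newline = buffer.find(b'\n')
# If the stream ended before any newline, the whole buffer is the first line
firstline = buffer[:newline] if newline != -1 else buffer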
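However, the rest of the message is now partly in the buffer and partly still waiting in the socket. You can either keep writing into the buffer and reading from it so that the whole request ends up in one piece (which should be fine unless you are parsing very large requests), or wrap it in a generator and read it part by part: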
def reader(buffer, stream):
    yield buffer[buffer.find(b'\n') + 1:]
    while True:
        chunk = stream.read(2048)
        if not chunk: break
        yield chunk
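The first option mentioned above, keeping all received data in one buffer and reading lines out of it, can be sketched as a small class that does no I/O itself. LineBuffer and its feed/pop_line/remainder methods are hypothetical names for illustration; the caller supplies chunks from stream.read() or sock.recv(), so a socket timeout between chunks does not lose anything already buffered.

```python
class LineBuffer:
    """Hypothetical helper: accumulate chunks and split off complete lines."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> None:
        """Append a chunk that the caller read from a socket or stream."""
        self._buf += chunk

    def pop_line(self) -> bytes:
        """Return the next complete line with its newline, or empty bytes if none yet."""
        idx = self._buf.find(b"\n")
        if idx == -1:
            return b""
        line = bytes(self._buf[: idx + 1])
        del self._buf[: idx + 1]  # drop the consumed line, keep the rest
        return line

    def remainder(self) -> bytes:
        """Return and clear whatever has not been consumed as a line."""
        rest = bytes(self._buf)
        self._buf.clear()
        return rest
```

For example, feed it chunks from stream.read(16) whenever pop_line() comes back empty, stop once it returns b"\r\n" (the blank line that ends the HTTP headers), and take remainder() as the start of the body.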