Question

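I'm trying to extract information from a log file that is published online, and I read through the output to find it. The only information I actually need is at the end of the file. These files are very large, and storing the whole socket output in a variable and reading through it consumes a lot of memory. Is there a way to read the socket from bottom to top?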

What I have now:

socket = urllib.urlopen(urlString)
OUTPUT = socket.read()
socket.close()
OUTPUT = OUTPUT.split("\n")
for line in OUTPUT:
    if "xxxx" in line:
        print line

I'm using Python 2.7. I'd really like to read only the last 30 or so lines of the socket's output.

Answer 1

What you need for this use case is an HTTP Range request. Here is a tutorial I found:

https://docs.microsoft.com/en-us/azure/active-directory/active-directory-saas-custom-apps

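I should clarify: the advantage of using a HEAD request to get the size and then making a Range request is that you don't have to transfer the whole content. You mentioned that your files are quite large, so this would be the best solution :)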

Edit: added the code below...

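Here is a (simplified) version of the demo from that blog post, translated into Python. Note that this will not work with all HTTP servers! More comments inline: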

"""
illustration of how to 'tail' a file using http. this will not work on all
webservers! if you need an http server to test with you can try the
rangehttpserver module:

$ pip install requests
$ pip install rangehttpserver
$ python -m RangeHTTPServer
"""
import requests

TAIL_SIZE = 1024

url = 'http://localhost:8000/lorem-ipsum.txt'
response = requests.head(url)

# not all servers return content-length in head, for some reason
assert 'content-length' in response.headers, 'Content length unknown- out of luck!'

# check the resource length and construct a request header for that range
# (byte ranges are inclusive, so the last byte sits at full_length - 1)
full_length = int(response.headers['content-length'])
assert full_length > TAIL_SIZE
headers = {
  'range': 'bytes={}-{}'.format(full_length - TAIL_SIZE, full_length - 1)
}

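# Make a get request, with the range header
response = requests.get(url, headers=headers)
# 206 Partial Content means the server actually honoured the range
assert response.status_code == 206, 'Server ignored the range header'
assert 'accept-ranges' in response.headers, 'Accept-ranges response header missing'
assert response.headers['accept-ranges'] == 'bytes'
# compare raw bytes (response.content); response.text counts decoded characters
assert len(response.content) == TAIL_SIZE

# Without the range header you get the entire file
response = requests.get(url)
assert len(response.content) == full_length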
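
To get from "the last N bytes" to "the last 30 lines" that the question asks for, something like the following might work. It is only a sketch under a few assumptions: it uses the standard library instead of requests, it sends a suffix range (`bytes=-N`, which asks for the final N bytes without a HEAD request first), and the names `fetch_tail`, `last_lines`, `tail_lines` and the 4096-byte default are made up for illustration. As above, it depends on the server supporting range requests.

```python
try:  # Python 3
    from urllib.request import Request, urlopen
except ImportError:  # Python 2.7
    from urllib2 import Request, urlopen


def suffix_range_header(num_bytes):
    """Build a Range header asking for the final num_bytes bytes."""
    return {'Range': 'bytes=-{}'.format(num_bytes)}


def last_lines(chunk, count, complete_file=False):
    """Return up to `count` trailing lines from a chunk of bytes.

    Unless the chunk is the complete file, its first line was probably
    cut in the middle by the byte range, so it is discarded.
    """
    lines = chunk.splitlines()
    if not complete_file and len(lines) > 1:
        lines = lines[1:]
    return lines[-count:] if count > 0 else []


def fetch_tail(url, num_bytes):
    """Request the final num_bytes of url; returns (data, was_partial).

    If the server ignores the Range header it answers 200 and sends the
    whole file, which is exactly the memory cost we wanted to avoid.
    """
    response = urlopen(Request(url, headers=suffix_range_header(num_bytes)))
    try:
        was_partial = response.getcode() == 206
        data = response.read()
    finally:
        response.close()
    return data, was_partial


def tail_lines(url, count=30, num_bytes=4096):
    """Fetch roughly the last `count` lines of the resource at url."""
    data, was_partial = fetch_tail(url, num_bytes)
    return last_lines(data, count, complete_file=not was_partial)
```

Usage would be along the lines of `tail_lines(urlString, 30)`, followed by the same `"xxxx" in line` filter as in the question. If fewer than 30 lines come back, the chunk was too small for the log's line length, and `num_bytes` can be raised.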
