Question

Using the poster.encode module, this works when I post an entire file to Solr:

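import urllib2
from poster.encode import multipart_encode
from poster.streaminghttp import register_openers

register_openers()  # lets urllib2 stream poster's generator as the request body

f = open(filePath, 'rb')
datagen, headers = multipart_encode({'file': f})

# use wt=json because it's more convenient to navigate
request = urllib2.Request(SOLR_BASE_URL + 'update/extract?extractOnly=true&extractFormat=text&indent=true&wt=json', datagen, headers)   # assumes SOLR_BASE_URL ends in '/'
extracted = urllib2.urlopen(request).read()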

However, for some files I only want to send the first n bytes of the file. I thought this would work:

f = open(filePath, 'rb')    
mp = MultipartParam('file', fileobj=f, filesize=f)
datagen, headers = multipart_encode({'file': mp})

# use wt=json because it's more convenient to navigate
request = urllib2.Request(SOLR_BASE_URL + 'update/extract?extractOnly=true&extractFormat=text&indent=true&wt=json', datagen, headers)   # assumes SOLR_BASE_URL ends in '/'
extracted = urllib2.urlopen(request).read()

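...but the request times out (and, oddly, I then have to restart Apache before requests to my web2py application work again). When I leave out the filesize parameter, I get an 'http 400 content missing' error from urlopen(). Am I just using MultipartParam incorrectly?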

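(The point of all this is that I'm using Solr to extract text content and metadata from files. For video and audio files, I only want to send the first 100-300k or so, since the relevant data is presumably all in the file headers.)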

Answer 1

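The reason you're having trouble is that MIME encoding introduces sentinels (boundary markers) into the POST body. If you don't specify the file size, you would have to use chunked transfer encoding so that the web server knows when to stop reading the file. But that leads to the other problem: if you stop sending a MIME-encoded POST to the server mid-stream, it will just sit there waiting for the chunk to finish. When it comes down to message segment size, chunked transfer encoding and multipart MIME encoding are both strict.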
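To make the framing issue concrete, here is a rough sketch of what a one-file multipart/form-data body looks like on the wire. It is hand-rolled for illustration and is not poster's actual implementation; the function name frame_file_part, the boundary string, and the file name clip.mp4 are all made up.

```python
def frame_file_part(field, filename, payload, boundary):
    """Wrap payload (bytes) as a one-part multipart/form-data body."""
    header = b''.join([
        b'--', boundary, b'\r\n',
        b'Content-Disposition: form-data; name="', field,
        b'"; filename="', filename, b'"\r\n',
        b'Content-Type: application/octet-stream\r\n',
        b'\r\n',
    ])
    # The closing sentinel is the boundary followed by a trailing "--".
    trailer = b''.join([b'\r\n--', boundary, b'--\r\n'])
    return header + payload + trailer


payload = b'\x00' * 1000  # stand-in for the first 1000 bytes of a file
body = frame_file_part(b'file', b'clip.mp4', payload, b'hypothetical-boundary')

# Content-Length has to count the framing as well as the payload. If the
# client declares this length but stops sending early, the server keeps
# waiting for the remaining bytes and the closing sentinel.
content_length = len(body)
overhead = content_length - len(payload)
```

The declared length and the bytes actually sent have to agree, which is why cutting the stream short partway through a file tends to end in a timeout rather than a clean error.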

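If you only want to send 100-300k of data, then read only that much from the file. That way every POST you send to the server ends at exactly the byte you want, with the terminator the web server expects.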
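A minimal sketch of that approach, assuming Python 2 with poster installed: read the head of the file into memory, then hand poster an in-memory file object together with its exact size. The helper names read_head and extract_from_head, the solr_base_url parameter, and the 300 KB default are assumptions made for illustration, and the Solr call itself has not been tested here.

```python
import io
import os


def read_head(path, max_bytes):
    """Return at most the first max_bytes bytes of the file at path."""
    with open(path, 'rb') as f:
        return f.read(max_bytes)


def extract_from_head(path, solr_base_url, max_bytes=300 * 1024):
    """POST only the head of a file to Solr's extract handler."""
    # Imported lazily because urllib2 and poster exist only on Python 2.
    import urllib2
    from poster.encode import MultipartParam, multipart_encode
    from poster.streaminghttp import register_openers

    register_openers()  # lets urllib2 stream poster's generator
    head = read_head(path, max_bytes)
    # filesize is a byte count and must match what fileobj will yield.
    param = MultipartParam('file', fileobj=io.BytesIO(head),
                           filesize=len(head),
                           filename=os.path.basename(path))
    datagen, headers = multipart_encode([param])
    # Assumes solr_base_url ends in '/'.
    url = (solr_base_url +
           'update/extract?extractOnly=true&extractFormat=text&wt=json')
    request = urllib2.Request(url, datagen, headers)
    return urllib2.urlopen(request).read()
```

Because the size passed to MultipartParam is the length of the bytes actually read, the Content-Length header and the body should agree, so the server is not left waiting for data that never arrives.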

Posting only part of a file using Python poster.encode
