Question

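I have an entire HTTP response as a single string, but I only want to extract the body.

I don't want to use an external library or reimplement header parsing.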

Content-Type: text/xml
Content-Length: 129

<?xml version='1.0'?>
<methodResponse>
<params>
<param>
<value><boolean>0</boolean></value>
</param>
</params>
</methodResponse>

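Update: in case it isn't obvious, I get the data from a source other than a URL, so any approach that requires a URL is useless to me.

I will still be reading the data from a stream object, as in data = stream.read(), so a solution that works on a stream is also acceptable.

Second update: yes, this is an XML-RPC response, but I'm using a different transport, so I can't use httplib to parse it, mainly because httplib is broken and won't accept a string or a stream to parse.

Third update: depending on the server, the double line break can be either \r\n\r\n or \n\n.

So, to be clear: the input is an HTTP response that is supposed to contain an XML-RPC response, and the output must be the XML-RPC response itself. It doesn't have to parse the XML, but it must correctly extract the XML from the HTTP response.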

Answer 1

Based on Michal's solution, but with an essential fix (skip past the separator itself):

def strip_http_headers(http_reply):
    p = http_reply.find('\r\n\r\n')
    if p >= 0:
        return http_reply[p+4:]
    return http_reply
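The third update in the question says the separator can also be '\n\n'. A minimal sketch extending the same find()-based idea, assuming '\r\n\r\n' and '\n\n' are the only two separators; the name strip_http_headers_any is hypothetical:

```python
def strip_http_headers_any(http_reply):
    # Hypothetical variant of strip_http_headers: accept either '\r\n\r\n'
    # or '\n\n' as the header/body separator and use whichever comes first.
    best = None  # (position, length) of the earliest separator found so far
    for sep in ('\r\n\r\n', '\n\n'):
        p = http_reply.find(sep)
        if p >= 0 and (best is None or p < best[0]):
            best = (p, len(sep))
    if best is None:
        # No separator at all: return the input unchanged, like the original
        return http_reply
    p, length = best
    return http_reply[p + length:]


print(repr(strip_http_headers_any('Content-Type: text/xml\n\n<methodResponse/>')))
# '<methodResponse/>'
```

Taking the earliest match matters because the body itself may contain blank lines of either style.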

Answer 2

In an HTTP response, the headers are separated from the body by two CRLF sequences. So you can use the str.find() method like this:

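def get_body(http_reply):  # function name is illustrative
    p = http_reply.find('\r\n\r\n')
    if p >= 0:
        # skip past the 4-character separator so it is not part of the body
        return http_reply[p + 4:]
    return http_reply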
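Since the question says the data comes from something like stream.read(), the same separator can also be found line by line on the stream: the headers end at the first empty line. A minimal sketch, assuming a binary file-like object with readline() and read() (for example io.BytesIO or a socket wrapped with makefile('rb')); the name read_http_body is hypothetical:

```python
import io


def read_http_body(stream):
    # Hypothetical helper: consume header lines from a binary stream until
    # the first empty line, then return everything that remains as the body.
    while True:
        line = stream.readline()
        if not line:
            # EOF before any blank line: no body found
            return b''
        if line in (b'\r\n', b'\n'):
            # An empty line (CRLF or bare LF) ends the header section
            return stream.read()


raw = (b'Content-Type: text/xml\r\n'
       b'Content-Length: 129\r\n'
       b'\r\n'
       b"<?xml version='1.0'?>\r\n"
       b'<methodResponse/>\r\n')
print(read_http_body(io.BytesIO(raw)))
```

Unlike the find() version, this returns b'' rather than the whole input when no blank line is present, and it handles both the \r\n\r\n and \n\n cases from the question's third update.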

Answer 3

Short and sweet:

body = response.split('\r\n\r\n', 1)[-1]

(It uses the two-argument form of split(), and [-1] selects the last element of the resulting list.)
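A short illustration of why the second argument (maxsplit=1) matters: the body may itself contain a blank line, and only the first separator should be used. The sample strings are made up for the demonstration:

```python
response = 'Content-Type: text/xml\r\n\r\nline1\r\n\r\nline2'

# With maxsplit=1 only the first separator is used, so a blank line inside
# the body does not cut the body short.
body = response.split('\r\n\r\n', 1)[-1]
print(repr(body))  # 'line1\r\n\r\nline2'

# Without maxsplit, [-1] returns only the text after the LAST separator.
print(repr(response.split('\r\n\r\n')[-1]))  # 'line2'

# If there is no separator at all, [-1] returns the whole input unchanged.
print(repr('no headers here'.split('\r\n\r\n', 1)[-1]))  # 'no headers here'
```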

Answer 4

resp = ('Content-Type: text/xml\r\n'
        'Content-Length: 129\r\n'
        '\r\n'
        "<?xml version='1.0'?>\r\n"
        '<methodResponse>\r\n'
        '<params>\r\n'
        '<param>\r\n'
        '<value><boolean>0</boolean></value>\r\n'
        '</param>\r\n'
        '</params>\r\n'
        '</methodResponse>\r\n')

# partition() splits at the first '\r\n\r\n'; [2] is everything after it
print(resp.partition('\r\n\r\n')[2])

Result

<?xml version='1.0'?>
<methodResponse>
<params>
<param>
<value><boolean>0</boolean></value>
</param>
</params>
</methodResponse>

On my display, the '\r' character at the end of each line shows up as a square.

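The advantage of partition() is that it always returns a 3-tuple.
So if the sequence '\r\n\r\n' does not occur in the text,
resp.partition('\r\n\r\n')[2] is "", whereas split('\r\n\r\n')[1] raises an IndexError and split('\r\n\r\n')[-1] returns the whole text.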
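A small demonstration of those three behaviors on a made-up string with no header/body separator. It also shows that the middle element of the tuple can be checked to tell "no separator" apart from "separator present but empty body":

```python
text_without_separator = 'Content-Type: text/xml\r\nContent-Length: 0'

# partition() always returns a 3-tuple; the last two items are '' when the
# separator is absent.
head, sep, body = text_without_separator.partition('\r\n\r\n')
if not sep:
    print('no header/body separator found')

# split()[-1] silently returns the whole text instead.
print(repr(text_without_separator.split('\r\n\r\n')[-1]))

# split()[1] raises IndexError because the list has only one element.
try:
    text_without_separator.split('\r\n\r\n')[1]
except IndexError as exc:
    print('IndexError:', exc)  # IndexError: list index out of range
```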

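EDIT

If the double line break can vary, a regular expression is the practical way to cope with the variation, but you first need to know which variants are possible in order to write the pattern.

Assuming that only "\n\n", "\r\n\n", "\n\r\n" and "\r\n\r\n" are possible, one solution is to capture the body with a regex based on the following pattern: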

import re

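# Match the first blank line (\n\n, \r\n\n, \n\r\n or \r\n\r\n) and capture
# everything after it; if there is none, \Z matches at the end and group(1) is None
regx = re.compile(r'(?:(?:\r?\n){2}|\Z)(.+)?', re.DOTALL)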

for ss in (('Content-Type: text/xml\r\n'
            'Content-Length: 129\r\n'
            "<?xml version='1.0'?>\n"
            '\n'
            'body1\r\n'
            '<params>\r\n'
            '<param>\r\n'
            '</code>') ,
           ('Content-Type: text/xml\r\n'
            'Content-Length: 129\r\n'
            "<?xml version='1.0'?>\r\n"
            '\n'
            'body2\r\n'
            '<params>\r\n'
            '<param>\r\n'
            '</code>') ,
           ('Content-Type: text/xml\r\n'
            'Content-Length: 129\r\n'
            "<?xml version='1.0'?>\n"
            '\r\n'
            'body3\r\n'
            '<params>\r\n'
            '<param>\r\n'
            '</code>') ,
           ('Content-Type: text/xml\r\n'
            'Content-Length: 129\r\n'
            "<?xml version='1.0'?>\r\n"
            '\r\n'
            'body4\r\n'
            '<params>\r\n'
            '<param>\r\n'
            '</code>') ,
           ('Content-Type: text/xml\r\n'
            'Content-Length: 129\r\r'
            "<?xml version='1.0'?>\r\r"
            '\r\n'
            'body4\r\n'
            '<params>\r\n'
            '<param>\r\n'
            '</code>') ,):
    print('splitting on sequence  :  %r\n%r\n'
          % (re.search(r'(?:\r?\n)+(?=body)', ss).group(),
             regx.search(ss).group(1)))

Result

splitting on sequence  :  '\n\n'
'body1\r\n<params>\r\n<param>\r\n</code>'

splitting on sequence  :  '\r\n\n'
'body2\r\n<params>\r\n<param>\r\n</code>'

splitting on sequence  :  '\n\r\n'
'body3\r\n<params>\r\n<param>\r\n</code>'

splitting on sequence  :  '\r\n\r\n'
'body4\r\n<params>\r\n<param>\r\n</code>'

splitting on sequence  :  '\r\n'
None
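A variant of the same idea, sketched with re.split() and maxsplit=1 instead of search() plus \Z. It accepts the same four separators (the pattern \r?\n\r?\n is equivalent to (?:\r?\n){2}) and, as a design choice, returns '' rather than None when there is no separator, mirroring partition()[2]. The name extract_body is hypothetical:

```python
import re

# Same four separators as above: \n\n, \r\n\n, \n\r\n and \r\n\r\n.
SEPARATOR = re.compile(r'\r?\n\r?\n')


def extract_body(response):
    # Split only at the first separator; the body may contain blank lines.
    parts = SEPARATOR.split(response, maxsplit=1)
    # A single part means no separator was found: return '' like partition()[2].
    return parts[1] if len(parts) == 2 else ''


print(repr(extract_body('Content-Type: text/xml\r\n\nbody2\r\n<params>')))
```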

Answer 5

You can use the standard library's urllib2 (urllib.request in Python 3):

from urllib2 import urlopen
data = urlopen('http://url.here/').read()

If you want to parse the XML:

from urllib2 import urlopen
from xml.dom.minidom import parse

xml = parse(urlopen('http://url.here'))

Answer 6

In addition to what Tito said, there is also the requests package:

>>> import requests
>>> r = requests.get("http://yoururl")
>>> r
<Response [200]>
>>> r.content
...

Then parse it with minidom or whatever tool you prefer.
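A sketch of that parsing step applied to the body from the question's sample (headers already stripped), using only the standard library. It shows minidom as suggested and, since the question says the body is XML-RPC, also xmlrpc.client.loads (xmlrpclib.loads in Python 2), which decodes an XML-RPC payload directly:

```python
import xmlrpc.client
from xml.dom.minidom import parseString

# Body from the question's sample response, with the headers already removed.
body = ("<?xml version='1.0'?>\r\n"
        '<methodResponse>\r\n'
        '<params>\r\n'
        '<param>\r\n'
        '<value><boolean>0</boolean></value>\r\n'
        '</param>\r\n'
        '</params>\r\n'
        '</methodResponse>\r\n')

# Generic DOM parsing: XML-RPC encodes a boolean as the text '0' or '1'.
doc = parseString(body)
boolean_node = doc.getElementsByTagName('boolean')[0]
value = boolean_node.firstChild.data == '1'
print(value)  # False

# XML-RPC-aware decoding: loads() returns (params, method_name); for a
# methodResponse the method name is None.
params, method_name = xmlrpc.client.loads(body)
print(params, method_name)  # (False,) None
```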

How do I get the body of an HTTP response in Python from a string containing the entire response?
