A readlines()-like method for a Session object (Python)

Time: 2013-09-21 00:12:33

Tags: python python-requests

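I want to pick out a few lines of information from some web pages. What I'd like to do (or think I'd like to do) is open the page, iterate over its lines, check each line for keywords, and find the information I want.

The pages require a session.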

import requests

def getpage():
    home = 'website'
    extension1 = '/input/page'
    extension2 = '/output/page'
    indexnumber = '11100'

    sess = requests.Session()
    # Request the input page first, then post the form data on the same session.
    getter = sess.get(home + extension1)
    payload = {'foo': 'bar', 'indexnumber': indexnumber}
    getter = sess.post(home + extension2, data=payload)

    return sess

As I said in the title, I need something like a readlines() method for the result of .get():

a.get(somePage).readlines()  # Could I put this...
# ...or this?
a.get(somePage).text.readlines()
# I don't think I want the following, for performance reasons; correct me if I am wrong.
F = open(someNewFile, mode='w')
F.write(a.get(somePage).text)
F.close()
F = open(thatFileIJustMade).readlines()  # All that just to turn it into a file on which I can use readlines?

Thanks

When I try

a.get(somePage).readlines()

I get

AttributeError: Response Object Doesn't have attribute readlines

3 Answers:

Answer 0: (score: 7)

There are several ways to do this, but the most Requests-like way is to use a streaming request together with Response.iter_lines():

import requests

r = requests.get(somePage, stream=True)

for line in r.iter_lines(1024):
    # Do stuff on this line (each line is a bytes object by default).
    print(line)
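Since the question is about scanning lines for keywords, here is a minimal sketch of how the streaming approach might be combined with a keyword filter. The function name `matching_lines` and the `fallback_encoding` parameter are hypothetical, not part of Requests; the function only assumes an object that behaves like a streamed `requests.Response`, that is, one with an `iter_lines()` method and an `encoding` attribute.

```python
def matching_lines(response, keywords, fallback_encoding="utf-8"):
    """Yield decoded lines of a streamed response that contain any keyword.

    `response` is assumed to behave like a requests.Response created with
    stream=True: it must offer iter_lines() and an `encoding` attribute.
    """
    # Assumption: fall back to UTF-8 when the server declares no encoding.
    encoding = response.encoding or fallback_encoding
    for raw in response.iter_lines():  # iter_lines() is called exactly once
        if not raw:
            continue  # skip empty lines
        # iter_lines() yields bytes by default, so decode before matching.
        line = raw.decode(encoding, errors="replace") if isinstance(raw, bytes) else raw
        if any(keyword in line for keyword in keywords):
            yield line
```

With the session from the question, this could presumably be called as `matching_lines(sess.get(url, stream=True), ["keyword"])`, where `url` stands for whichever page is being scanned.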

Answer 1: (score: 2)

Beyond @Lukasa's excellent and entirely correct approach, you can also do this:

import io

import requests

r = requests.get(some_page)
file_like_obj = io.StringIO(r.text)
lines = file_like_obj.readlines()

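Note that r.text is definitely the right attribute to use on the Response object here, because io.StringIO requires unicode on Python 2 and a native string (which is unicode by default) on Python 3.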
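To illustrate the point about text versus bytes, the following sketch contrasts the two file-like wrappers from the standard `io` module. The helper names `lines_from_text` and `lines_from_bytes` are made up for this example; the inputs stand in for `r.text` and `r.content` respectively.

```python
import io


def lines_from_text(text):
    """Wrap decoded text (such as r.text) in a file-like object and read its lines."""
    return io.StringIO(text).readlines()


def lines_from_bytes(data):
    """Wrap raw bytes (such as r.content) in a file-like object and read its lines."""
    return io.BytesIO(data).readlines()


print(lines_from_text("first\nsecond\n"))    # ['first\n', 'second\n']
print(lines_from_bytes(b"first\nsecond\n"))  # [b'first\n', b'second\n']
```

As with a real file, readlines() keeps the trailing newline on each line. On Python 3, passing bytes to io.StringIO raises a TypeError, which is why the text attribute pairs with StringIO and the raw content with BytesIO.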

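Answer 2: (score: 0)

From the documentation, keep in mind:

Warning

iter_lines() is not reentrant safe. Calling this method multiple times causes some of the received data to be lost. If you need to call it from multiple places, use the resulting iterator object instead: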

lines = r.iter_lines()
# Save the first line for later or just skip it

first_line = next(lines)

for line in lines:
    print(line)
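One possible way to package the pattern above is a small helper that calls iter_lines() once and hands back both the first line and the same iterator for the remainder. The name `first_line_and_rest` is hypothetical, and the sketch assumes only that `response` offers an `iter_lines()` method like a `requests.Response`. Using `next(lines, None)` avoids a StopIteration when the body is empty.

```python
def first_line_and_rest(response):
    """Return (first_line, iterator_over_remaining_lines).

    iter_lines() is called exactly once; the same iterator is reused so that
    no received data is lost between the first line and the rest.
    """
    lines = response.iter_lines()
    first_line = next(lines, None)  # None if the response has no lines at all
    return first_line, lines
```

The caller can then loop over the returned iterator with an ordinary for loop instead of calling iter_lines() a second time.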

For simplicity, I use the following approach:

import requests

r = requests.get(somePage).text
r_lines = r.split("\n")

for line in r_lines:
    # line logic goes here
    pass
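A variation on the split-based approach above, offered as a sketch rather than a correction: `str.splitlines()` also handles `\r\n` line endings and does not leave a trailing empty string, which may matter for pages served with Windows-style newlines. The function name `find_lines` and the sample text are invented for illustration; `text` stands in for the `.text` of a response.

```python
def find_lines(text, keywords):
    """Return the lines of `text` that contain at least one keyword."""
    return [
        line
        for line in text.splitlines()
        if any(keyword in line for keyword in keywords)
    ]


sample = "name: foo\r\nprice: 10\r\n"
print(sample.split("\n"))             # ['name: foo\r', 'price: 10\r', '']
print(sample.splitlines())            # ['name: foo', 'price: 10']
print(find_lines(sample, ["price"]))  # ['price: 10']
```

Both variants load the whole body into memory first, so for very large pages the streaming approach from the first answer is likely the better fit.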