Web crawler in Python (multiple websites)

Time: 2016-07-29 03:23:08

Tags: python html web-crawler

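I am using requests and bs4. Inside the loop, when I print each soup, only the last one is correct; the other soups differ from the HTML source of the page. Please help. Thanks.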

import requests
import bs4

# files is defined elsewhere in the asker's script (not shown)
for eachLine in files:
    addr = 'http://neuromorpho.org/neuron_info.jsp?neuron_name=' + eachLine
    print(addr)
    st = []
    st1 = []
    r2 = requests.get(addr)
    soup2 = bs4.BeautifulSoup(r2.text, "lxml")
    print(soup2)

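1 answer:

Answer 0 (score: 0)

The response object returned by requests.get has a content attribute that holds the full body of the page as raw bytes. You can pass it to BS4 for parsing.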

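import requests
import bs4

for eachLine in files:
    addr = 'http://neuromorpho.org/neuron_info.jsp?neuron_name=' + eachLine
    r2 = requests.get(addr)
    content = r2.content  # raw bytes of the response body
    soup2 = bs4.BeautifulSoup(content, "lxml")  # name the parser explicitly, as in the question
    print(soup2)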
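
A possible explanation for why only the last soup looks right, offered as a guess because the question does not show how files is created: if files is an open text file, every line except possibly the last ends with a newline character, and that newline ends up inside the URL. The sketch below (Python 3) strips each line before building the address and checks the HTTP status before parsing. The names BASE_URL, build_urls, fetch_soup and crawl are hypothetical helpers introduced for illustration, not part of the original post.

```python
from urllib.parse import quote

BASE_URL = "http://neuromorpho.org/neuron_info.jsp?neuron_name="


def build_urls(lines):
    """Yield one URL per non-empty line, with surrounding whitespace removed."""
    for line in lines:
        name = line.strip()  # drops the trailing "\n" that file iteration keeps
        if name:  # skip blank lines
            yield BASE_URL + quote(name)


def fetch_soup(url):
    """Download one page and parse it; raises on an HTTP error status."""
    # Third-party imports live here so build_urls works without them installed.
    import bs4
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return bs4.BeautifulSoup(response.content, "lxml")


def crawl(path):
    """Print the URL and page title for every name in the file at path."""
    # Assumption: the file holds one neuron name per line.
    with open(path) as names:
        for url in build_urls(names):
            print(url)
            print(fetch_soup(url).title)
```

Calling crawl("neuron_names.txt"), where the file name is only an example, would fetch one page per listed name. If the pages still differ from what the browser shows after this change, the cause lies elsewhere and this sketch will not help.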