Parsing elements from HTML

Time: 2017-08-07 18:55:58

Tags: python html xml parsing

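I want to create a CSV file of animal hospitals by state. I think the HTML I am selecting is incorrect. I want to iterate over the elements selected by the correct tags and parse out the state, name, address, and phone number.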

from lxml import html
import requests

link = "https://vcahospitals.com/find-a-hospital/location-directory"
response = requests.get(link, allow_redirects=False)  # get page data from server, block redirects
sourceCode = response.content  # raw bytes of the response body (use response.text for a str)
htmlElem = html.document_fromstring(sourceCode)  # parse into an lxml HTML element tree

print(sourceCode)

[Example page HTML. I have tried selecting all div elements by class.] [1]

I would think this grabs all the state hospitals, but it only prints out one state's worth.

1 answer:

Answer 0 (score: 1)

You need to indent the print statement in your code so that it runs inside the for loop.

for el in state_hospitals:
    text = el.text_content()

    # print is indented, so it runs inside the for block, once per element.
    print(text)
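
To see why the indentation matters, here is a minimal sketch that uses plain strings in place of the lxml elements. The list `state_texts` and both function names are made up for illustration; in the question, each string would come from `el.text_content()`. The results are collected into a list instead of printed directly, so the difference is easy to compare.

```python
# Hypothetical stand-ins for the text of each selected element;
# in the question these values would come from el.text_content().
state_texts = ["state A hospitals", "state B hospitals", "state C hospitals"]


def output_outside_loop(texts):
    """Mimics the bug: the output step runs once, after the loop has finished."""
    shown = []
    text = None
    for item in texts:
        text = item
    # Not indented under the for: only the last value of text is recorded.
    if text is not None:
        shown.append(text)
    return shown


def output_inside_loop(texts):
    """Mimics the fix: the output step is part of the loop body."""
    shown = []
    for item in texts:
        text = item
        # Indented under the for: runs once per element.
        shown.append(text)
    return shown


print(output_outside_loop(state_texts))
print(output_inside_loop(state_texts))
```

The first function ends up with only the last element's text, which matches the "only one state's worth" symptom, while the second produces one entry per element.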