Question

I would like to do the following automatically:

Go to this link: https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Monthly-Enrollment-by-Contract-Plan-State-County-DL.xml
Click the link at the bottom of the page that ends with the current year and month (e.g. http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Monthly-Enrollment-by-Contract-Plan-State-County-Items/Monthly-Enrollment-by-CPSC-2016-04.html)
On the next page, download the zip file from the top link under "Downloads": Monthly Enrollment by CPSC - April 2016 [ZIP, 20MB]

So far I have the following to get the current year and month, but I need help with the rest...

from datetime import datetime
import calendar
Day = datetime.now().day
Month = datetime.now().month
Year = datetime.now().year
m=calendar.month_name[Month]

Answer 1

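You need an XML parser to extract the link from the XML feed, and an HTML parser to extract the link to the zip file. We'll use lxml.etree and lxml.html for these, respectively. Working implementation: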

from datetime import datetime
from urllib.request import urlretrieve
from urllib.parse import urljoin

import requests
from lxml import etree
from lxml import html


date_part = datetime.now().strftime("%Y-%m")
with requests.Session() as session:
    # get the XML feed and extract the link
    response = session.get("https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MCRAdvPartDEnrolData/Monthly-Enrollment-by-Contract-Plan-State-County-DL.xml")
    root = etree.fromstring(response.content)
    link = root.xpath("//item/link[contains(., '-%s.html')]/text()" % date_part)[0]

    # follow the link and extract the link to the zip file
    response = session.get(link)
    root = html.fromstring(response.content)
    zip_link = root.xpath("//a[@type='application/zip']/@href")[0]
    link = urljoin(link, zip_link)

    # download zip
    urlretrieve(link, filename="my.zip")
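
One thing to watch in the code above: the first XPath result is taken with [0], so if the feed does not yet list a page for the current month, the script stops with an IndexError. Below is a minimal sketch of a more forgiving link selection, assuming (this is a guess, not something the feed documents) that the newest page may lag the calendar by a month or so. The helper names previous_month and pick_monthly_link and the max_lookback parameter are hypothetical, and the standard library's xml.etree.ElementTree is swapped in for lxml so the snippet has no third-party dependency.

```python
import xml.etree.ElementTree as ET
from datetime import date


def previous_month(year, month):
    """Return (year, month) for the month before the given one."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def pick_monthly_link(feed_xml, today, max_lookback=3):
    """Return the first <link> ending in -YYYY-MM.html.

    Tries the month of `today` first, then up to `max_lookback`
    earlier months (an assumed tolerance). Returns None if no
    link matches any of those months.
    """
    root = ET.fromstring(feed_xml)
    # Collect the text of every un-namespaced <link> element in the feed.
    links = [el.text.strip() for el in root.iter("link") if el.text]

    year, month = today.year, today.month
    for _ in range(max_lookback + 1):
        suffix = "-%04d-%02d.html" % (year, month)
        for link in links:
            if link.endswith(suffix):
                return link
        year, month = previous_month(year, month)
    return None
```

In the script above, this could replace the first XPath lookup with something like link = pick_monthly_link(response.content, datetime.now().date()), followed by a check for None before requesting the page.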

Python requests: go to link and download
