Combining two Python scripts for a web search

Posted: 2019-05-17 18:52:38

Tags: python python-3.x

I'm trying to download files from a website, and because search results are capped (at 300), I need to search for each item individually. I have a CSV file with the full list, and I've written some basic code to return the ID # column.

With some help, I have another script that loops through each search result and downloads the file. What I'm trying to do now is combine the two, so that it searches for each individual ID # and downloads the file.

I know my loops are messed up here; I can't even work out whether I'm looping in the right order.

import requests, json, csv

faciltiyList = []
with open('Facility List.csv', 'r') as f:
    csv_reader = csv.reader(f, delimiter=',')
    for searchterm in csv_reader:
        faciltiyList.append(searchterm[0])

        url = "https://siera.oshpd.ca.gov/FindFacility.aspx"
        r = requests.get(url+"?term="+str(searchterm))
        searchresults = json.loads(r.content.decode('utf-8'))
        for report in searchresults:
            rpt_id = report['RPT_ID']
            reporturl = f"https://siera.oshpd.ca.gov/DownloadPublicFile.aspx?archrptsegid={rpt_id}&reporttype=58&exportformatid=8&versionid=1&pageid=1"
            r = requests.get(reporturl)
            a = r.headers['Content-Disposition']
            filename = a[a.find("filename=")+9:len(a)]
            file = open(filename, "wb")
            file.write(r.content)
            r.close()

My original code is here:

import requests, json

searchterm="ALAMEDA (COUNTY)"

url="https://siera.oshpd.ca.gov/FindFacility.aspx"
r=requests.get(url+"?term="+searchterm)
searchresults=json.loads(r.content.decode('utf-8'))
for report in searchresults:
    rpt_id=report['RPT_ID']
    reporturl=f"https://siera.oshpd.ca.gov/DownloadPublicFile.aspx?archrptsegid={rpt_id}&reporttype=58&exportformatid=8&versionid=1&pageid=1"
    r=requests.get(reporturl)
    a=r.headers['Content-Disposition']
    filename=a[a.find("filename=")+9:len(a)]
    file = open(filename, "wb")
    file.write(r.content)
    r.close()

The search term "ALAMEDA (COUNTY)" produces 300+ results, so instead of "ALAMEDA (COUNTY)" I'm trying to run through a list of names (in this case, ID #s), so that each search returns just one result, and then the loop moves on to the next item in the list.

1 Answer:

Answer 0 (score: 0)

CSV with a single row

Tested with a CSV file containing only one row:

406014324,"HOLISTIC PALLIATIVE CARE, INC.",550004188,Parent Facility,5707 REDWOOD RD,OAKLAND,94619,1,ALAMEDA,Not Applicable,,Open,1/1/2018,Home Health Agency/Hospice,Hospice,37.79996,-122.17075

Python code

This script reads the IDs from the CSV file, then fetches the results from the URL, and finally writes the desired content to disk.

import requests, json, csv

# read Ids from csv
facilityIds = []
with open('Facility List.csv', 'r') as f:
    csv_reader = csv.reader(f, delimiter=',')
    for searchterm in csv_reader:
        facilityIds.append(searchterm[0])

# fetch and write file contents
# fetch and write file contents
url = "https://siera.oshpd.ca.gov/FindFacility.aspx"
for facilityId in facilityIds:
    r = requests.get(url+"?term="+str(facilityId))
    reports = json.loads(r.content.decode('utf-8'))
    # print(f"reports = {reports}")
    for report in reports:
        rpt_id = report['RPT_ID']
        reporturl = f"https://siera.oshpd.ca.gov/DownloadPublicFile.aspx?archrptsegid={rpt_id}&reporttype=58&exportformatid=8&versionid=1&pageid=1"
        r = requests.get(reporturl)
        a = r.headers['Content-Disposition']
        filename = a[a.find("filename=")+9:len(a)]
        # print(f"filename = {filename}")
        with open(filename, "wb") as o:
            o.write(r.content)
