Need some help with web scraping in Python

Date: 2020-08-07 15:41:58

Tags: python web-scraping

I wrote some code that scrapes data from the inner links of each code group ('A' Codes, 'B' Codes, 'C' Codes, and so on).

If you run my code, it scrapes the data, but not in the format I expect; my expected result is shown in the image below.

I need a CSV file as the result, with all of the following column names and their data, as in the result image: "Group", "Category", "Code", "Long Description", "Short Description".

Scraping this website (HCPCS Codes). [Result should be in this format with all fields as column names]

Here is the code:

    from bs4 import BeautifulSoup
    import requests
    import csv
    
    baseurl = requests.get("https://www.hcpcsdata.com/Codes").text
    
    baseurlhcpc = 'https://www.hcpcsdata.com'
    
    soup = BeautifulSoup(baseurl, 'lxml')
    
    #file = open('hcpccode3.csv', 'w')
    #writer = csv.writer(file)
    
    #writer.writerow(["hcpc code","description"])
    
    for table in soup.find_all('tr', class_='clickable-row'):
        hcpc_code = table.td.a.text
        #print(hcpc_code)
    
        description = table.find_all('td')[2].text.strip()
        print(description)
        #writer.writerow([hcpc_code, description])
    
    codelinks = soup.find_all('tr', class_='clickable-row')
    
    codelinksall = []
    
    
    for items in codelinks:
        for link in items.find_all('a', href=True):
            codelinksall.append(baseurlhcpc + link['href'])
    
    print(codelinksall)
    
    for link in codelinksall:
        r = requests.get(link)
        soup = BeautifulSoup(r.content, 'lxml')
    
        for table in soup.find_all('tr', class_='clickable-row'):
            codes = table.td.a.text
            description1 = table.find_all('td')[1].text.strip()
            print(codes, description1)
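The loop above only prints each code and description. A minimal sketch of writing such rows out with the `csv` module, shown here on an inline HTML sample modeled on the site's `clickable-row` structure (the real markup would come from `requests.get(...).text` as in the code above; the sample codes and descriptions are illustrative):

```python
from bs4 import BeautifulSoup
import csv
import io

# Inline sample modeled on the table rows the code above parses;
# on the real site this HTML comes from hcpcsdata.com.
html = """
<table><tbody>
<tr class="clickable-row"><td><a href="/Codes/A0021">A0021</a></td><td>Outside state ambulance serv</td></tr>
<tr class="clickable-row"><td><a href="/Codes/A0080">A0080</a></td><td>Noninterest escort in non er</td></tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
buf = io.StringIO()  # stands in for a file opened with open(..., "w", newline="")
writer = csv.writer(buf)
writer.writerow(["Code", "Short Description"])
for row in soup.find_all("tr", class_="clickable-row"):
    cells = row.find_all("td")
    writer.writerow([cells[0].get_text(strip=True), cells[1].get_text(strip=True)])

print(buf.getvalue())
```

In the real script, the same `writer.writerow([...])` call would simply go inside the scraping loop, so each row is saved as it is printed.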

1 answer:

Answer 0 (score: 1)

If I've understood the result you're expecting, you can do it like this:

import requests
from bs4 import BeautifulSoup
import csv

response = requests.get(url="https://www.hcpcsdata.com/Codes")
print(response.status_code)

soup = BeautifulSoup(response.content, 'html.parser')
table = soup.find("div", {"class": "body-content"}).find("table", {"class": "table"}).find("tbody")
table_elements = table.find_all("tr", {"class": "clickable-row"})

# Collect the text of every cell in each clickable row
elements_table = []
for row in table_elements:
    elements_table.append([td.get_text().strip() for td in row.find_all("td")])

# newline="" prevents blank lines between rows on Windows
with open("table.csv", "w", newline="") as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows(elements_table)

It returns this CSV file:

'A' Codes,678,"Transportation Services Including Ambulance, Medical & Surgical Supplies"
'B' Codes,50,Enteral And Parenteral Therapy
'C' Codes,367,Temporary Codes For Use with Outpatient Prospective Payment System
'E' Codes,608,Durable Medical Equipment
'G' Codes,"1,736",Procedures / Professional Services (Temporary Codes)
'H' Codes,88,Alcohol and Drug Abuse Treatment Services / Rehab Services
'J' Codes,824,"Drugs Administered Other Than Oral Method, Chemotherapy Drugs"
'K' Codes,144,Durable Medical Equipment For Medicare Administrative Contractors
'L' Codes,904,"Orthotic And Prosthetic Procedures, Devices"
'M' Codes,117,Medical Services
'P' Codes,56,Pathology And Laboratory Services
'Q' Codes,359,Miscellaneous Services (Temporary Codes)
'R' Codes,3,Diagnostic Radiology Services
'S' Codes,526,Commercial Payers (Temporary Codes)
'T' Codes,109,Established For State Medical Agencies
'U' Codes,4,Coronavirus Diagnostic Panel
'V' Codes,209,"Vision, Hearing And Speech-Language Pathology Services"
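Note that this writes only the three group-level columns. The question also asks for per-code rows ("Code" plus the descriptions), which means following each group's link as the question's own code does. A sketch of the row-combining logic, demonstrated on inline sample fragments (the real HTML would come from `requests.get` per group, and the real detail rows also carry a long-description column; the sample data and column set here are illustrative):

```python
from bs4 import BeautifulSoup
import csv
import io

def parse_rows(html):
    """Return the stripped cell texts of every clickable row in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in soup.find_all("tr", class_="clickable-row")
    ]

# Fragments modeled on the two page levels the question's code parses;
# in the real run each would come from requests.get(...).text.
group_html = "<tr class=\"clickable-row\"><td><a href=\"/Codes/A\">'A' Codes</a></td><td>678</td><td>Transportation Services</td></tr>"
detail_html = "<tr class=\"clickable-row\"><td><a href=\"/Code/A0021\">A0021</a></td><td>Outside state ambulance serv</td></tr>"

buf = io.StringIO()  # stands in for the output file
writer = csv.writer(buf)
writer.writerow(["Group", "Category", "Code", "Short Description"])
for group, _count, category in parse_rows(group_html):
    # One request per group page in the real run, then one row per code.
    for code, desc in parse_rows(detail_html):
        writer.writerow([group, category, code, desc])

print(buf.getvalue())
```

The idea is simply to carry the group-level fields down into the inner loop, so every detail row is written together with its group and category.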

Hope this helps!