Parsing website data into a file with Python

Time: 2020-01-28 21:09:34

Tags: python html json csv

I am trying to parse the information from the following page:

[https://www.gsaelibrary.gsa.gov/ElibMain/contractorInfo.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contractNumber=V797P-2045D&contractorName=AFFIRMATIVE+SOLUTIONS%2C+LLC&executeQuery=YES][1]

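The output is needed in CSV format, with these columns:

Contract #, Contractor, Address, Phone, E-Mail, Web Address, DUNS, Socio-Economic, EPLS, Govt. Point of Contact, Phone, E-Mail

or in JSON: { "Contract #": "V797P-2045D", .... }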
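Once the fields are in a dictionary, the standard library can produce both target formats. The following is a minimal sketch under that assumption; the helper names `to_csv` and `to_json` are hypothetical, and the record holds only three of the fields shown on the page.

```python
import csv
import io
import json

# Example record; keys follow the labels on the page, values come from the question.
record = {
    "Contract #": "V797P-2045D",
    "Contractor": "AFFIRMATIVE SOLUTIONS, LLC",
    "DUNS": "826891405",
}


def to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text with a header row (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize a list of dicts to a JSON array (hypothetical helper)."""
    return json.dumps(records, indent=2)


csv_text = to_csv([record], list(record))
json_text = to_json([record])

if __name__ == "__main__":
    print(csv_text)
    print(json_text)
```

`csv.DictWriter` quotes values that contain a comma, such as the contractor name, so no manual escaping is needed.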

The code I am using:

from bs4 import BeautifulSoup
import urllib.request

url = "https://www.gsaelibrary.gsa.gov/ElibMain/contractorInfo.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contractNumber=V797P-2045D&contractorName=AFFIRMATIVE+SOLUTIONS%2C+LLC&executeQuery=YES"
# Download the raw HTML of the contractor page
response = urllib.request.urlopen(url).read()
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(response, "html.parser")
# Collect every <table> under <body>, including nested ones
tables = soup.find('body').find_all('table')
print(tables[10].text.strip())
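The URL above embeds a `;jsessionid=...` segment that belongs to one browser session. A possible way to build the request URL from the contract number and contractor name is sketched below. It assumes the server also accepts the request without the session segment, which is not confirmed here, and `build_contractor_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL in the question, without the ";jsessionid=..." part
# (assumption: the server does not require it).
BASE_URL = "https://www.gsaelibrary.gsa.gov/ElibMain/contractorInfo.do"


def build_contractor_url(contract_number, contractor_name):
    """Return the contractor page URL for one contract (hypothetical helper)."""
    query = urlencode({
        "contractNumber": contract_number,
        "contractorName": contractor_name,
        "executeQuery": "YES",
    })
    return f"{BASE_URL}?{query}"


url = build_contractor_url("V797P-2045D", "AFFIRMATIVE SOLUTIONS, LLC")

if __name__ == "__main__":
    print(url)
```

`urlencode` percent-encodes the comma and turns spaces into `+`, which reproduces the query string used in the question.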

tables[10] returns:

<html><head><title>GSA eLibrary Contractor Information</title>
<meta content="text/html; charset=utf-8" http-equiv="content-type"/><link href="images/content.css" rel="stylesheet" type="text/css"/>
<meta content="MSHTML 5.50.4207.2601" name="generator"/></head>
<body bgcolor="#ffffff" leftmargin="0" link="#990000" marginheight="0" marginwidth="0" topmargin="0" vlink="#660000">
<script id="_fed_an_ua_tag" language="javascript" src="js/Universal-Federated-Analytics.1.04.js?agency=GSA&amp;sp=searchText,tcSearchText"></script>
<script src="js/jquery.js" type="text/javascript"></script>
<script src="js/jquery-migrate-3.0.1.js" type="text/javascript"></script>
<script src="js/jquery.autocomplete.js" type="text/javascript"></script>
<script src="js/jquery.bgiframe.min.js" type="text/javascript"></script>
<link href="css/autocomplete.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript">
jQuery(function(){
var callAction="/ElibMain/autoComplete";
    $("#searchText").autocomplete(callAction, {
            max: 15,
      minChars: 2,
      matchSubset: false,
      scroll: false,
      selectFirst: false
    });
})
</script>
<table bgcolor="#f0f0f0" border="0" cellpadding="0" cellspacing="0" width="98%">
<tr>
<td>
<table bgcolor="#ffffff" border="0" cellpadding="0" cellspacing="0" width="100%">
<tr>
<td align="center" valign="bottom" width="250"><a href="#skipnavigation"><img border="0" height="1" src="images/spacer.gif" title="skip to content" width="1"/></a><a href="home.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64"><img border="0" height="41" src="images/eLibrary_logo.gif" title="GSA eLibrary" width="240"/></a></td>
<td align="left" valign="bottom">
<font color="#C0C0C0" size="2"><strong>GSA Federal Acquisition Service</strong></font>
</td>
<td align="right" valign="bottom">
<table align="right" border="0" cellpadding="0" cellspacing="0" width="100%">
<tr>
<td align="right">
<table align="right" border="0" cellpadding="1" cellspacing="0">
<tr>
<td><a href="home.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64"><img border="0" height="21" src="images/eLib_ban_home.gif" title="home" width="50"/></a></td>
<td>
<a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?app=ebuy&amp;source=elibrary">
<img border="0" height="21" src="images/eLib_ban_eBuy.gif" title="eBuy - quotes" width="94"/></a></td>
<td><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64"><img border="0" height="21" src="images/eLib_ban_advantage.gif" title="GSA Advantage - online shopping" width="216"/></a></td>
<td><a href="https://www.gsaadvantage.gov/images/products/elib/pdf_files/elibhp.pdf" target="_blank"><img border="0" height="21" src="images/eLib_ban_help.gif" title="Help on eLibrary" width="41"/></a></td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
<table bgcolor="#f0f0f0" border="0" cellpadding="0" cellspacing="0" width="98%">
<tr>
<td align="right">
<table bgcolor="#003265" cellpadding="1" cellspacing="0" width="100%">
<tr>
<td>
<table bgcolor="#f0f0f0" border="0" cellpadding="0" cellspacing="0" width="100%">
<form action="searchResults.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64" method="get" name="search">
<tr>
<td>
<table bgcolor="#f0f0f0" border="0" cellpadding="2" cellspacing="0" width="100%">
<tr>
<td align="right"><span class="FormLabel">Search:<input class="dropdown" id="searchText" name="searchText" size="20" value=""/>
<select class="dropdown" name="searchType" size="1">
<option value="allWords">all the words</option>
<option value="anyWords">any of the words</option>
<option value="exactWords">exact phrase</option>
</select><input align="top" border="0" src="images/go_elib.gif" title="Go" type="image" value="doSearch"/> </span></td>
</tr>
</table>
</td>
</tr>
</form>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
<!-- End search bar -->
<!--  Added for SCR 8157 Start -->
<!--  Added for SCR 8157 end -->
<table border="0" cellpadding="0" cellspacing="0" width="98%">
<tr>
<td><img border="0" height="1" src="images/blank.gif" title="" width="10"/></td>
<td valign="top">
<table border="0" cellpadding="2" cellspacing="0" width="98%">
<tr>
<td><a name="skipnavigation"></a><img border="0" height="37" src="images/title_contractor_info.gif" title="Contractor Information" width="186"/> </td>
<td align="right"><font size="1"><a href="https://www.gsaadvantage.gov/images/products/elib/pdf_files/elibhp.pdf#chngcntrinfo" target="_blank">(Vendors) How to change your company information</a></font></td>
</tr>
</table>
<table border="1" bordercolor="#cccccc" cellpadding="2" cellspacing="0" width="100%">
<tr>
<td bgcolor="#f0f0f0">
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr valign="top">
<td width="60%">
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr>
<td><font class="columntitle" size="2">Contract #:</font></td>
<td><font size="2">

                                                        V797P-2045D

                                                    </font>
</td>
</tr>
<tr>
<td><font class="columntitle" size="2">Contractor:</font></td>
<td><font size="2">AFFIRMATIVE SOLUTIONS, LLC


                                                    </font></td></tr>
<tr valign="top">
<td><font class="columntitle" size="2">Address:</font></td>
<td><font size="2">103B KINGSBRIDGE DRIVE<br/>CARROLLTON, GA 30117</font></td></tr>
<tr>
<td><font class="columntitle" size="2">Phone:</font></td>
<td><font size="2">8669947986</font></td></tr>
<tr>
<td><font class="columntitle" size="2">E-Mail:</font></td>
<td><font size="2"><a href="mailto:billy.williams@affirmativesolutions.org " title="link opens email message">billy.williams@affirmativesolutions.org</a></font></td></tr>
<tr>
<td><font class="columntitle" size="2">Web Address:</font></td>
<td><font size="2"><a href="http://www.affirmativesolutions.org/index.php" target="_blank">http://www.affirmativesolutions.org/index.php</a></font></td>
</tr>
<tr>
<td><font class="columntitle" size="2">DUNS:</font></td>
<td><font size="2">826891405</font></td>
</tr>
<tr>
</tr>
</table>
</td>
<td>
<table border="0" cellpadding="2" cellspacing="0" width="100%">
<tr>
<!-- <td width=40%>&nbsp;</td>-->
<td align="left" nowrap="" valign="top" width="15%"><font class="columntitle" size="2">Socio-Economic :</font></td>
<td align="left" nowrap="" width="15%"><font size="1">Small business<br/>Service Disabled Veteran Owned Small business<br/></font>
</td>
</tr>
<tr>
<!-- <td width=40%>&nbsp;</td>-->
<td align="left" nowrap="" valign="top" width="15%"><font class="columntitle" size="2">EPLS : </font></td>
<td align="left" nowrap=""><font size="1">Contractor not found on the Excluded Parties List System</font></td>
</tr>
</table>
<table border="0" cellpadding="2" cellspacing="0" width="100%">
<tr>
<td><font class="columntitle" size="1">Govt. Point of Contact:</font><br/><font size="1">TINA BUTCHER-JOHNSON
                                                        <br/><font class="columntitle" size="1">Phone: </font><font size="1">(708)786-7722 </font>
<br/><font class="columntitle" size="1">E-Mail: </font><font size="1"><a href="mailto:tina.butcher-johnson@va.gov" title="link opens email message">tina.butcher-johnson@va.gov</a></font> </font>
</td>
</tr>
<!-- Added for SCR 8157 Start  -->
<tr>
</tr>
<!-- Added for SCR 8157 End  -->
</table>
</td>
</tr>
</table>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr>
<td>
<table border="1" bordercolor="#ffffff" cellpadding="2" cellspacing="0" width="100%">
<tr bgcolor="#ffffff">
<td align="middle" valign="bottom"><font class="columntitle" size="1">Source</font></td>
<td align="middle" valign="bottom"><font class="columntitle" size="1">Title</font></td>
<td align="middle" valign="bottom"><font class="columntitle" size="1">Contract<br/>Number</font></td>
<td align="middle" valign="bottom"><font class="columntitle" size="1">Contractor T&amp;Cs<br/>/Pricelist</font></td>
<td align="middle" valign="bottom"><font class="columntitle" size="1">Contract End Date</font></td>
<td align="middle" valign="bottom"><font class="columntitle" size="1">Category</font></td>
<td align="middle" valign="bottom"><font class="columntitle" size="2">  </font></td>
<td align="middle" nowrap="" valign="bottom"><font class="columntitle" size="1">View Catalog</font></td>
</tr>
<tr align="left" bgcolor="#f0f0f0" valign="top">
<!--  Schedule Num Column -->
<td align="middle" nowrap="">
<font size="2"><a href="scheduleSummary.do?scheduleNumber=65+II+A">65 II A</a></font><br/>
</td>
<!--  Schedule Desc Column -->
<td align="justify">
<font size="1"><font class="columntitle" size="1">MEDICAL EQUIPMENT AND SUPPLIES</font></font>
</td>
<!--  Contract Num Column -->
<td align="middle" nowrap="">
<font size="1"><a href="contractorInfo.do?contractNumber=V797P-2045D&amp;contractorName=AFFIRMATIVE+SOLUTIONS%2C+LLC&amp;executeQuery=NO">V797P-2045D</a></font>
</td>
<!--  Text File Column -->
<td align="middle">
<a href="https://www.gsaadvantage.gov/ref_text/V797P2045D/V797P2045D_online.htm" target="_blank"><img border="0" height="16" src="images/vend_details.gif" title="View Contractors T&amp;Cs/Pricelist" width="16"/></a>
</td>
<!--  Contract end date Column -->
<td align="middle" nowrap=""><font size="1">Nov 30, 2021</font></td>
<!--  Sin Column -->
<td align="middle" valign="top">
<table border="0" cellpadding="2" cellspacing="2">
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-18A&amp;executeQuery=YES">A-18A</a></font></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-20C&amp;executeQuery=YES">A-20C</a></font></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-25B&amp;executeQuery=YES">A-25B</a></font></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-2B&amp;executeQuery=YES">A-2B</a></font></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-33A&amp;executeQuery=YES">A-33A</a></font></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-33B&amp;executeQuery=YES">A-33B</a></font></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-4A&amp;executeQuery=YES">A-4A</a></font></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-50H&amp;executeQuery=YES">A-50H</a></font></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-72&amp;executeQuery=YES">A-72</a></font></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-89&amp;executeQuery=YES">A-89</a></font></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-8A&amp;executeQuery=YES">A-8A</a></font></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-8B&amp;executeQuery=YES">A-8B</a></font></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><font size="1"><a href="sinDetails.do?scheduleNumber=65+II+A&amp;specialItemNumber=A-90A&amp;executeQuery=YES">A-90A</a></font></td>
</tr>
</table>
</td>
<!--  stloc/recstloc Column -->
<td align="middle" valign="top">
<table border="0" cellpadding="2" cellspacing="2">
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
<tr>
<td height="25" nowrap="" valign="top">
</td>
</tr>
</table>
</td>
<!--  Adv Item Column -->
<td align="middle" nowrap="" valign="top">
<table border="0" cellpadding="2" cellspacing="2">
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-18A&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-20C&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-25B&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-2B&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-33A&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-33B&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-4A&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-50H&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-72&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-89&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-8A&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-8B&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
<tr align="left" valign="top">
<td align="middle" height="25" nowrap="" valign="top"><a href="advRedirect.do;jsessionid=PVvYkVSJSkfs28uit+eBgbZu.prd1pweb64?contract=V797P-2045D&amp;sin=A-90A&amp;src=elib&amp;app=cat"><img border="0" height="17" src="images/adv_sm.gif" title="GSA Advantage!" width="69"/></a></td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
<br/><br/>
</body>
</html>

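I was planning to write regular expressions and other string functions to convert the data to JSON or CSV. Is there a simple function, in the csv or json modules or elsewhere, that achieves the same result?

If this HTML table can be converted easily, please help.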
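One option that avoids regular expressions is to rely on the page's own markup: in the HTML above, each label sits in a `<font class="columntitle">` element and its value is in the next `<td>` of the same row. The sketch below is a minimal illustration of that idea, not a complete solution. `extract_fields` and `SAMPLE_HTML` are hypothetical names, the sample is a shortened copy of rows from the page, and the "Govt. Point of Contact" cell (which packs several labels into one `<td>`) is not handled.

```python
from bs4 import BeautifulSoup

# Shortened copy of a few rows from the page in the question.
SAMPLE_HTML = """
<table>
<tr>
<td><font class="columntitle" size="2">Contract #:</font></td>
<td><font size="2">
    V797P-2045D
</font></td>
</tr>
<tr valign="top">
<td><font class="columntitle" size="2">Address:</font></td>
<td><font size="2">103B KINGSBRIDGE DRIVE<br/>CARROLLTON, GA 30117</font></td>
</tr>
<tr>
<td><font class="columntitle" size="2">DUNS:</font></td>
<td><font size="2">826891405</font></td>
</tr>
</table>
"""


def extract_fields(html):
    """Map each 'Label:' cell to the text of the cell that follows it."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for label in soup.find_all("font", class_="columntitle"):
        text = label.get_text(strip=True)
        # Only labels ending in ':' are treated as field names;
        # this skips the column headers of the schedule table.
        if not text.endswith(":"):
            continue
        cell = label.find_parent("td")
        value_cell = cell.find_next_sibling("td") if cell else None
        if value_cell is None:
            continue
        key = text.rstrip(":").strip()
        # Join text fragments separated by <br/> with ", ".
        fields[key] = value_cell.get_text(", ", strip=True)
    return fields


if __name__ == "__main__":
    print(extract_fields(SAMPLE_HTML))
```

The resulting dictionary can be passed to `json.dumps` or `csv.DictWriter` to produce the requested output.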

0 answers:

No answers