How can I use Python to scrape a web page that has multiple drop-down menus?

Time: 2019-04-24 14:13:57

Tags: python web-crawler

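This is a website that records air pollution data. The data is hourly and covers several years. The page has several drop-down menus, such as location (for example, Taipei City), year, month, and day. I have written the code below with what I know so far. I would like to know how to select a location and scrape all of the data the site holds for it. I have also attached the page source below; my goal is to scrape the data in its last 20 lines.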

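    import requests
    from bs4 import BeautifulSoup

    # Request the page (no form fields are sent, so no drop-down choice is made)
    res = requests.post('https://erdb.epa.gov.tw/DataRepository/Air/Flue_CEMS_DATA.aspx')
    soup = BeautifulSoup(res.text, 'lxml')
    # Climb from the first '台北市' text node to its <td>, <tr>, then <table>
    table = soup.find_all(string='台北市')[0].parent.parent.parent
    # Skip the header row, then print every cell of every data row
    for row in table.select('tr')[1:]:
        for cell in row.select('td'):
            print(cell.text.strip())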
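
Since the question is how to pick a location and date before scraping, here is a minimal sketch of how the form might be posted back. The drop-down and button names are copied from the page source in this question. The helper names (`build_search_payload`, `iter_days`, `fetch_day`) are made up for illustration, and the hidden ASP.NET fields such as `__VIEWSTATE` are an assumption, because they do not appear in the excerpt. Whether the server accepts a single POST like this has not been confirmed.

```python
from datetime import date, timedelta

# Field names copied from the page source shown in this question.
PREFIX = "ctl00$ContentPlaceHolder1$"
CONDITION = PREFIX + "ucSearchCondition$"


def build_search_payload(hidden_fields, location, year, month, day):
    """Combine the page's hidden fields with one choice per drop-down."""
    payload = dict(hidden_fields)
    payload[CONDITION + "ddlEPB"] = location
    payload[CONDITION + "ddlYearS"] = year
    payload[CONDITION + "ddlMonthS"] = month
    payload[CONDITION + "ddlDayS"] = day
    # An <input type="image"> button is submitted as click coordinates.
    payload[PREFIX + "imgSearch.x"] = "1"
    payload[PREFIX + "imgSearch.y"] = "1"
    return payload


def iter_days(start, end):
    """Yield (year, month, day) strings, zero-padded like the option values."""
    current = start
    while current <= end:
        yield current.strftime("%Y"), current.strftime("%m"), current.strftime("%d")
        current += timedelta(days=1)


def fetch_day(url, location, year, month, day):
    """Load the form, then post it back with the chosen values (needs network)."""
    # Imported here so the two helpers above can be tried without these packages.
    import requests
    from bs4 import BeautifulSoup

    with requests.Session() as session:
        soup = BeautifulSoup(session.get(url).text, "lxml")
        # Assumption: the page carries ASP.NET hidden inputs such as
        # __VIEWSTATE and __EVENTVALIDATION, which are not in the excerpt.
        hidden = {
            tag["name"]: tag.get("value", "")
            for tag in soup.find_all("input", attrs={"type": "hidden"})
            if tag.get("name")
        }
        payload = build_search_payload(hidden, location, year, month, day)
        return session.post(url, data=payload).text
```

To cover a whole period for one location, `fetch_day` could be called once per tuple from `iter_days(date(2004, 1, 1), date(2019, 4, 17))`, the collection period stated on the page. Note that the location drop-down uses the value `臺北市`, while the result rows display `台北市`.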

I hope to scrape all of the data for the selected location. The page source follows; I expect to scrape the information in its last 20 lines.

   <select name="ctl00$ContentPlaceHolder1$ucSearchCondition$ddlEPB" id="ctl00_ContentPlaceHolder1_ucSearchCondition_ddlEPB">
        <option selected="selected" value="臺北市">臺北市</option>
        <option value="新北市">新北市</option>
        <option value="基隆市">基隆市</option>
        <option value="桃園市">桃園市</option>
        <option value="新竹市">新竹市</option>
        <option value="新竹縣">新竹縣</option>
        <option value="苗栗縣">苗栗縣</option>
        <option value="臺中市">臺中市</option>
        <option value="彰化縣">彰化縣</option>
        <option value="雲林縣">雲林縣</option>
        <option value="南投縣">南投縣</option>
        <option value="嘉義市">嘉義市</option>
        <option value="嘉義縣">嘉義縣</option>
        <option value="臺南市">臺南市</option>
        <option value="高雄市">高雄市</option>
        <option value="屏東縣">屏東縣</option>
        <option value="宜蘭縣">宜蘭縣</option>
        <option value="花蓮縣">花蓮縣</option>
        <option value="臺東縣">臺東縣</option>
        <option value="澎湖縣">澎湖縣</option>
        <option value="連江縣">連江縣</option>
        <option value="金門縣">金門縣</option>

    </select>
                        </td>
                    </tr>
                    <tr>
                        <td>日期:</td>
                        <td>
                            <select name="ctl00$ContentPlaceHolder1$ucSearchCondition$ddlYearS" id="ctl00_ContentPlaceHolder1_ucSearchCondition_ddlYearS">
        <option selected="selected" value="2019">2019</option>
        <option value="2018">2018</option>
        <option value="2017">2017</option>
        <option value="2016">2016</option>
        <option value="2015">2015</option>
        <option value="2014">2014</option>
        <option value="2013">2013</option>
        <option value="2012">2012</option>
        <option value="2011">2011</option>
        <option value="2010">2010</option>
        <option value="2009">2009</option>
        <option value="2008">2008</option>
        <option value="2007">2007</option>
        <option value="2006">2006</option>
        <option value="2005">2005</option>
        <option value="2004">2004</option>

    </select>年
                            <select name="ctl00$ContentPlaceHolder1$ucSearchCondition$ddlMonthS" id="ctl00_ContentPlaceHolder1_ucSearchCondition_ddlMonthS">
        <option value="01">01</option>
        <option value="02">02</option>
        <option value="03">03</option>
        <option selected="selected" value="04">04</option>
        <option value="05">05</option>
        <option value="06">06</option>
        <option value="07">07</option>
        <option value="08">08</option>
        <option value="09">09</option>
        <option value="10">10</option>
        <option value="11">11</option>
        <option value="12">12</option>

    </select>月
                            <select name="ctl00$ContentPlaceHolder1$ucSearchCondition$ddlDayS" id="ctl00_ContentPlaceHolder1_ucSearchCondition_ddlDayS">
        <option value="01">01</option>
        <option value="02">02</option>
        <option value="03">03</option>
        <option value="04">04</option>
        <option value="05">05</option>
        <option value="06">06</option>
        <option value="07">07</option>
        <option value="08">08</option>
        <option value="09">09</option>
        <option value="10">10</option>
        <option value="11">11</option>
        <option value="12">12</option>
        <option value="13">13</option>
        <option value="14">14</option>
        <option value="15">15</option>
        <option value="16">16</option>
        <option selected="selected" value="17">17</option>
        <option value="18">18</option>
        <option value="19">19</option>
        <option value="20">20</option>
        <option value="21">21</option>
        <option value="22">22</option>
        <option value="23">23</option>
        <option value="24">24</option>
        <option value="25">25</option>
        <option value="26">26</option>
        <option value="27">27</option>
        <option value="28">28</option>
        <option value="29">29</option>
        <option value="30">30</option>

    </select>日
                        </td>
                    </tr>
                    <tr style="display:none;">
                        <td>日期區間(迄):</td>
                        <td>

                            <span id="ctl00_ContentPlaceHolder1_ucSearchCondition_lbYearE">2019</span>年
                            <span id="ctl00_ContentPlaceHolder1_ucSearchCondition_lbMonthE">04</span>月
                            <span id="ctl00_ContentPlaceHolder1_ucSearchCondition_lbDayE">17</span>日
                          </td>
                    </tr>

                </table>


</div>
    </div>
</div>


                <input type="image" name="ctl00$ContentPlaceHolder1$imgSearch" id="ctl00_ContentPlaceHolder1_imgSearch" src="../../Resource/images/search.png" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ctl00$ContentPlaceHolder1$imgSearch&quot;, &quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, false))" border="0" />
            </div>

            <div id="description" style="float: right; width: 57%;">

<link href="/Resource/css/MetaStyle.css" rel="stylesheet" type="text/css" />
    <div>
        <table class="ExplainBox" id="tbMetaData" border="0">
            <tr>
                <td style="width:200px" class="title-r">資料集名稱</td>
                <td>
                    <span id="ctl00_ContentPlaceHolder1_ucMetaData_lblDataSetName">固定污染源CEMS監測數據紀錄值資料集</span>
                </td>
            </tr>
            <tr>
                <td class="title-r">資料集描述</td>
                <td>
                    <span id="ctl00_ContentPlaceHolder1_ucMetaData_lblMetaDesc">本資料集收錄CEMS監測數據紀錄值資料,因資料整備特性,提供七日前資料。</span>
                </td>
            </tr>
            <tbody id="ctl00_ContentPlaceHolder1_ucMetaData_tbodyMIS" align="left">
                <tr>
                    <td class="auto-style1 title-r">主要欄位說明</td>
                    <td class="auto-style1">
                        <span id="ctl00_ContentPlaceHolder1_ucMetaData_lblFieldDesc">所屬環保局(Epb)、管制編號(CNO)、公司簡稱(Abbr)、煙囪序號(PolNo)、監測項目名稱(ItemDesc)、監測項目編號(Item)、監測時間(M_Time)、監測數值(M_Val)、排放標準值(Std)、單位(Unit)、資料辨識碼(Code2)、排放標準依據(Std_s)。</span>
                    </td>
                </tr>
                <tr>
                    <td class="title-r">收錄期間</td>
                    <td>
                        <span id="ctl00_ContentPlaceHolder1_ucMetaData_lblDataPeriod">2004/01/01至2019/04/17</span>
                    </td>
                </tr>
            </tbody>
            <tr>
                <td class="title-r">更新頻率</td>
                <td>
                    <span id="ctl00_ContentPlaceHolder1_ucMetaData_lblUpdateFrequencyId">每天</span>
                </td>
            </tr>
            <tr>
                <td class="title-r">資料集內容最後更新日期</td>
                <td>
                    <span id="ctl00_ContentPlaceHolder1_ucMetaData_lblUpdateTime">2019/04/17</span>
                </td>
            </tr>
            <tr>
                <td class="title-r">提供機關</td>
                <td>
                    <span id="ctl00_ContentPlaceHolder1_ucMetaData_lblDatasetAgencyId">行政院環境保護署</span>
                </td>
            </tr>



        </table>
    </div>


            </div>
            <div class="clr"></div>
        </div>

        <div class="title" style="float: left;">
            <ul>
                <li class="active"><a href="#">固定污染源CEMS監測數據紀錄值資料集 </a></li>
            </ul>
        </div>
        <div style="float: right;">


<script type="text/javascript" src="/Resource/js/gvColspan.js" charset="UTF-8"></script>
<script>
    $(function () {
        //註解
        if ($.trim($('#ctl00_ContentPlaceHolder1_ShareAndExport_Label2').html()) == "") {
            $('.tbResult').each(function () {
                var comment1 = $(this).parents('tr:first').next().children(':first').html()
                var comment2 = $(this).parents('tr:first').next().next().children(':first').html()
                if (comment1 != null && $.trim(comment1) != "") {
                    if (comment2 != null) {
                        $('#ctl00_ContentPlaceHolder1_ShareAndExport_Label1').html(comment1)
                        $('#ctl00_ContentPlaceHolder1_ShareAndExport_comment1').val(comment1.replace(/<br>/ig, "$"))
                        $('#ctl00_ContentPlaceHolder1_ShareAndExport_Label2').html(comment2)
                        $('#ctl00_ContentPlaceHolder1_ShareAndExport_comment2').val(comment2.replace(/<br>/ig, "$"))
                    } else {
                        $('#ctl00_ContentPlaceHolder1_ShareAndExport_Label2').html(comment1);
                        $('#ctl00_ContentPlaceHolder1_ShareAndExport_comment2').val(comment1.replace(/<br>/ig, "$"))
                    }
                }
            })
        }
    })

    function myPrint() {
        var newWindow = window.open("../../Resource/viewPrint.aspx", "_blank");
        return false;
    }

    function printScreen(printlist) {
        var value = printlist.innerHTML;
        var printPage = window.open("", "Printing...", "");
        printPage.document.open();
        printPage.document.write("<HTML><head>");
        printPage.document.write("<link rel='stylesheet' href='../../../Resource/css/PageStyle.css' />");
        printPage.document.write("</head><BODY><input type='button' value='列印報表' onclick='window.print();window.close();'></input></br></br>");
        printPage.document.write(value);
        printPage.document.close("</BODY></HTML>");
    }

    function newwindow() {
        var tagname = $('#sitemap').text().split('/')[4].trim();
        var description = $('#ctl00_ContentPlaceHolder1_ucMetaData_lblMetaDesc').text().trim();
        var picture = $('.logo img').attr('src');
        var caption = $('#ctl00_ContentPlaceHolder1_ucMetaData_lblDatasetAgencyId').text().trim();;
        //window.open("https://www.facebook.com/dialog/feed?app_id=1470773149913325&redirect_uri=https://www.facebook.com&display=popup&caption=" + encodeURIComponent(tagname) + "&name=" + encodeURIComponent(tagname) + "&description" + encodeURIComponent(tagname) + "&link=" + encodeURIComponent(location.href));
        window.open("https://www.facebook.com/dialog/feed?app_id=1470773149913325&redirect_uri=https://www.facebook.com&display=popup&caption=" + encodeURIComponent(caption) + "&picture" + encodeURIComponent(picture) + "&name=" + encodeURIComponent(tagname) + "&description=" + encodeURIComponent(description) + "&link=" + encodeURIComponent(location.href));
    }
</script>
<style type="text/css">
    #ctl00_ContentPlaceHolder1_ShareAndExport_gvPrint th {
        background-color: #DEDEDE;
    }
</style>
<div>


    <a href="javascript:newwindow()">
        <img src="../../Resource/images/btnFB.jpg" /></a>







    <a href="javascript: void(window.open('https://plus.google.com/share?url='.concat(encodeURIComponent(location.href)),'gplusshare'))">
        <img src="../../Resource/images/btnGooglePlus.jpg" /></a>


    <input type="image" name="ctl00$ContentPlaceHolder1$ShareAndExport$ibtnExcel" id="ctl00_ContentPlaceHolder1_ShareAndExport_ibtnExcel" src="../../Resource/images/btnCSV.png" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ctl00$ContentPlaceHolder1$ShareAndExport$ibtnExcel&quot;, &quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, false))" border="0" />

    <input type="image" name="ctl00$ContentPlaceHolder1$ShareAndExport$ImageButton1" id="ctl00_ContentPlaceHolder1_ShareAndExport_ImageButton1" src="../../Resource/images/btnPrint.jpg" alt="列印" onclick="printScreen(ContentPlaceHolder1_myHead_printGV); return false;WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ctl00$ContentPlaceHolder1$ShareAndExport$ImageButton1&quot;, &quot;&quot;, true, &quot;&quot;, &quot;&quot;, false, false))" border="0" />
</div>
<div id="ctl00_ContentPlaceHolder1_ShareAndExport_printdiv" style="display: none;">
    <div>
    <table class="gvColspan" cellspacing="0" rules="all" border="1" id="ctl00_ContentPlaceHolder1_ShareAndExport_gvPrint" width="90%">
        <tr>
            <th class="RowSpan" scope="col">所屬環保局</th><th class="RowSpan" scope="col">管制編號</th><th scope="col">公司簡稱</th><th scope="col">煙囪序號</th><th scope="col">監測項目</th><th scope="col">監測項目編號</th><th scope="col">監測時間</th><th scope="col">監測數值</th><th scope="col">排放標準</th><th scope="col">單位</th><th scope="col">資料識別碼</th><th scope="col">資料識別碼</th><th scope="col">排放標準依據</th>
        </tr><tr>
            <td align="center" width="60">台北市</td><td align="center" width="80">A4000283</td><td align="center" width="80">臺北市政府環境保護局木柵垃圾焚化廠</td><td align="center" width="80">P002</td><td align="center" width="80">氮氧化物監測設施十五分鐘數據紀錄值</td><td align="center" width="60">923 </td><td align="center" width="30">2019-04-17 00:00:00</td><td align="center" width="30">0.00</td><td align="center" width="50">220</td><td align="center" width="50">ppm    </td><td align="center" width="50">00</td><td align="center" width="50">固定污染源暫停運轉時監測設施之量測值</td><td align="center" width="50">廢棄物焚化爐空氣污染物排放標準</td>
        </tr><tr>
            <td align="center" width="60">台北市</td><td align="center" width="80">A4000283</td><td align="center" width="80">臺北市政府環境保護局木柵垃圾焚化廠</td><td align="center" width="80">P002</td><td align="center" width="80">氮氧化物監測設施一小時數據平均值</td><td align="center" width="60">223 </td><td align="center" width="30">2019-04-17 00:00:00</td><td align="center" width="30">0.00</td><td align="center" width="50">220</td><td align="center" width="50">ppm    </td><td align="center" width="50">00</td><td align="center" width="50">固定污染源暫停運轉時監測設施之量測值</td><td align="center" width="50">廢棄物焚化爐空氣污染物排放標準</td>
        </tr><tr>
            <td align="center" width="60">台北市</td><td align="center" width="80">A4000283</td><td align="center" width="80">臺北市政府環境保護局木柵垃圾焚化廠</td><td align="center" width="80">P002</td><td align="center" width="80">氯化氫監測設施十五分鐘數據紀錄值</td><td align="center" width="60">926 </td><td align="center" width="30">2019-04-17 00:00:00</td><td align="center" width="30">0.00</td><td align="center" width="50">60</td><td align="center" width="50">ppm    </td><td align="center" width="50">00</td><td align="center" width="50">固定污染源暫停運轉時監測設施之量測值</td><td align="center" width="50">廢棄物焚化爐空氣污染物排放標準</td>
        </tr><tr>
            <td align="center" width="60">台北市</td><td align="center" width="80">A4000283</td><td align="center" width="80">臺北市政府環境保護局木柵垃圾焚化廠</td><td align="center" width="80">P004</td><td align="center" width="80">不透光率六分鐘數據紀錄值</td><td align="center" width="60">911 </td><td align="center" width="30">2019-04-17 00:00:00</td><td align="center" width="30">0.00</td><td align="center" width="50">20</td><td align="center" width="50">%      </td><td align="center" width="50">00</td><td align="center" width="50">固定污染源暫停運轉時監測設施之量測值</td><td align="center" width="50">廢棄物焚化爐空氣污染物排放標準</td>
        </tr><tr>
            <td align="center" width="60">台北市</td><td align="center" width="80">A4000283</td><td align="center" width="80">臺北市政府環境保護局木柵垃圾焚化廠</td><td align="center" width="80">P004</td><td align="center" width="80">氮氧化物監測設施一小時數據平均值</td><td align="center" width="60">223 </td><td align="center" width="30">2019-04-17 00:00:00</td><td align="center" width="30">0.00</td><td align="center" width="50">220</td><td align="center" width="50">ppm    </td><td align="center" width="50">00</td><td align="center" width="50">固定污染源暫停運轉時監測設施之量測值</td><td align="center" width="50">廢棄物焚化爐空氣污染物排放標準</td>
        </tr><tr>
            <td align="center" width="60">台北市</td><td align="center" width="80">A4000283</td><td align="center" width="80">臺北市政府環境保護局木柵垃圾焚化廠</td><td align="center" width="80">P004</td><td align="center" width="80">氧氣監測設施一小時數據平均值</td><td align="center" width="60">236 </td><td align="center" width="30">2019-04-17 00:00:00</td><td align="center" width="30">0.00</td><td align="center" width="50">無排放標準</td><td align="center" width="50">%      </td><td align="center" width="50">00</td><td align="center" width="50">固定污染源暫停運轉時監測設施之量測值</td><td align="center" width="50">無</td>
        </tr><tr>
            <td align="center" width="60">台北市</td><td align="center" width="80">A4000283</td><td align="center" width="80">臺北市政府環境保護局木柵垃圾焚化廠</td><td align="center" width="80">P004</td><td align="center" width="80">氧氣監測設施十五分鐘數據紀錄值</td><td align="center" width="60">936 </td><td align="center" width="30">2019-04-17 00:00:00</td><td align="center" width="30">0.00</td><td align="center" width="50">無排放標準</td><td align="center" width="50">%      </td><td align="center" width="50">00</td><td align="center" width="50">固定污染源暫停運轉時監測設施之量測值</td><td align="center" width="50">無</td>
        </tr><tr>
            <td align="center" width="60">台北市</td><td align="center" width="80">A4000283</td><td align="center" width="80">臺北市政府環境保護局木柵垃圾焚化廠</td><td align="center" width="80">P004</td><td align="center" width="80">排放流率監測設施一小時數據平均值</td><td align="center" width="60">248 </td><td align="center" width="30">2019-04-17 00:00:00</td><td align="center" width="30">0.00</td><td align="center" width="50">無排放標準</td><td align="center" width="50">CMH    </td><td align="center" width="50">00</td><td align="center" width="50">固定污染源暫停運轉時監測設施之量測值</td><td align="center" width="50">無</td>
        </tr><tr>
        ....
        ....
        ....
        ....
        ....

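The result rows at the end of the page source sit in the table whose `id` is `ctl00_ContentPlaceHolder1_ShareAndExport_gvPrint`. As a rough sketch, they could be pulled out as lists of cell strings as follows. It uses only the standard library's `html.parser` so it runs without extra packages; the same loop can be written with BeautifulSoup. The class and function names are hypothetical, and the sketch assumes the table contains no nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of each <tr> inside the table with a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.rows = []       # finished rows, each a list of cell strings
        self._inside = False  # True while inside the wanted table
        self._row = None      # cells of the row being read
        self._cell = None     # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self._inside = True
        elif not self._inside:
            return
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "table":
            self._inside = False  # assumes no nested tables
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None


def parse_rows(html, table_id):
    """Return the header row and data rows of one table as lists of strings."""
    parser = TableRows(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows
```

Rows are returned as lists rather than dictionaries keyed by header because the header row in the page source repeats the column name `資料識別碼`, so a dictionary would silently drop one of the two columns.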

0 answers:

No answers