I am pulling data from an internal network, and the request clearly succeeds, because I can see the XML content when I print it.
import requests

# params is a dict holding the credentials and profile settings (defined elsewhere)
ptdf_data1 = requests.get(r'https://zema.nam.nsroot.net:8443/datadirect/ZEData?command=LoadProfile&username=%(username)s&password=%(password)s&id=Citi&groupname=%(profile_group)s&profilename=%(profile_name)s&profileowner=%(profile_username)s&style=xml' % params, verify=False).content
I am trying to use Beautiful Soup to parse the data into the columns labelled below (each column will hold a long list of price data):
import datetime

import pandas
import pytz
from bs4 import BeautifulSoup

soup = BeautifulSoup(ptdf_data1, "lxml")
ptdf_data = []
for ptdf_data_xml in soup.findAll(ptdf_data1): # 'Pdtf'): #
    dt = ptdf_data_xml.Date
    hr = ptdf_data_xml.CalendarHour
    row = ptdf_data_xml.RowNumber
    ram = ptdf_data_xml.RemainingAvailableMargin
    be = ptdf_data_xml.BE
    de = ptdf_data_xml.DE
    fr = ptdf_data_xml.FR
    nl = ptdf_data_xml.NL
    ptdf_data += [(
        int(row.text),
        pytz.timezone('CET').localize(
            datetime.datetime.strptime(dt.text, "%Y-%m-%dT%H:%M:%S")) +
        datetime.timedelta(hours=int(hr.text) - 1),
        float(de.text),
        float(fr.text),
        float(nl.text),
        float(be.text),
        float(ram.text))]
ptdf_data = pandas.DataFrame(data=ptdf_data, columns=['Row', 'DateTime', 'DE', 'FR', 'NL', 'BE', 'RAM'])
ptdf = ptdf_data.set_index('DateTime')
But I keep getting an empty DataFrame that contains only the column labels. As requested, the XML is:
<?xml version="1.0" encoding="UTF-8"?>
<Profile>
<DataSource>
<IdNumber>1</IdNumber>
<Series>a</Series>
<DataSourceCaption>DE</DataSourceCaption>
<DataSourceName>EPEX</DataSourceName>
<DataReport>Power Spot Market Auction (Hourly)</DataReport>
<Observation>Data Value(AVERAGE)</Observation>
<Numerator>EUR</Numerator>
<Denominator>MWh</Denominator>
<Commodity>Electricity</Commodity>
<Interval>Daily</Interval>
<Attribute> <Caption>Country</Caption><Label>Germany/Austria</Label><Value>Germany/Austria</Value></Attribute>
<Attribute> <Caption>Data Type</Caption><Label>Price</Label><Value>Price</Value></Attribute>
<Filter></Filter>
</DataSource>
<DataSource>
<IdNumber>2</IdNumber>
<Series>b</Series>
<DataSourceCaption>FR</DataSourceCaption>
<DataSourceName>EPEX</DataSourceName>
<DataReport>Power Spot Market Auction (Hourly)</DataReport>
<Observation>Data Value(AVERAGE)</Observation>
<Numerator>EUR</Numerator>
<Denominator>MWh</Denominator>
<Commodity>Electricity</Commodity>
<Interval>Daily</Interval>
<Attribute> <Caption>Country</Caption><Label>France</Label><Value>France</Value></Attribute>
<Attribute> <Caption>Data Type</Caption><Label>Price</Label><Value>Price</Value></Attribute>
<Filter></Filter>
</DataSource>
<DataSource>
<IdNumber>3</IdNumber>
<Series>c</Series>
<DataSourceCaption>CH</DataSourceCaption>
<DataSourceName>EPEX</DataSourceName>
<DataReport>Power Spot Market Auction (Hourly)</DataReport>
<Observation>Data Value(AVERAGE)</Observation>
<Numerator>EUR</Numerator>
<Denominator>MWh</Denominator>
<Commodity>Electricity</Commodity>
<Interval>Daily</Interval>
<Attribute> <Caption>Country</Caption><Label>Switzerland</Label><Value>Switzerland</Value></Attribute>
<Attribute> <Caption>Data Type</Caption><Label>Price</Label><Value>Price</Value></Attribute>
<Filter></Filter>
</DataSource>
<DataSource>
<IdNumber>4</IdNumber>
<Series>d</Series>
<DataSourceCaption>ES</DataSourceCaption>
<DataSourceName>OMEL</DataSourceName>
<DataReport>Daily Market Hourly Prices</DataReport>
<Observation>Spain Price(AVERAGE)</Observation>
<Commodity>Electricity</Commodity>
<Interval>Daily</Interval>
<Filter></Filter>
</DataSource>
<DataSource>
<IdNumber>5</IdNumber>
<Series>e</Series>
<DataSourceCaption>PT</DataSourceCaption>
<DataSourceName>OMEL</DataSourceName>
<DataReport>Daily Market Hourly Prices</DataReport>
<Observation>Portugal Price(AVERAGE)</Observation>
<Commodity>Electricity</Commodity>
<Interval>Daily</Interval>
<Filter></Filter>
</DataSource>
<DataSource>
<IdNumber>6</IdNumber>
<Series>f</Series>
<DataSourceCaption>CZ</DataSourceCaption>
<DataSourceName>OTE</DataSourceName>
<DataReport>Day-Ahead Market CZ Result</DataReport>
<Observation>Price(AVERAGE)</Observation>
<Commodity>Electricity</Commodity>
<Interval>Daily</Interval>
<Filter></Filter>
</DataSource>
<DataSource>
<IdNumber>7</IdNumber>
<Series>g</Series>
<DataSourceCaption>NL</DataSourceCaption>
<DataSourceName>APX</DataSourceName>
<DataReport>NL Power Day Ahead Market (Hourly)</DataReport>
<Observation>Value(AVERAGE)</Observation>
<Commodity>Energy</Commodity>
<Interval>Daily</Interval>
<Attribute> <Caption>Market Type</Caption><Label>prices</Label><Value>prices</Value></Attribute>
<Filter></Filter>
</DataSource>
<DataSource>
<IdNumber>8</IdNumber>
<Series>h</Series>
<DataSourceCaption>BE</DataSourceCaption>
<DataSourceName>Belpex</DataSourceName>
<DataReport>Daily Market Results Hourly</DataReport>
<Observation>Price(AVERAGE)</Observation>
<Commodity>Electricity</Commodity>
<Interval>Daily</Interval>
<Filter></Filter>
</DataSource>
<DataSource>
<IdNumber>9</IdNumber>
<Series>i</Series>
<DataSourceCaption>IT</DataSourceCaption>
<DataSourceName>GME</DataSourceName>
<DataReport>Day Ahead Electricity Market Price</DataReport>
<Observation>Price(AVERAGE)</Observation>
<Commodity>Electricity</Commodity>
<Interval>Daily</Interval>
<Attribute> <Caption>Market</Caption><Label>MGP</Label><Value>MGP</Value></Attribute>
<Attribute> <Caption>Zone</Caption><Label>PUN</Label><Value>PUN</Value></Attribute>
<Filter></Filter>
</DataSource>
<DataSource>
<IdNumber>10</IdNumber>
<Series>j</Series>
<DataSourceCaption>IT NORD</DataSourceCaption>
<DataSourceName>GME</DataSourceName>
<DataReport>Day Ahead Electricity Market Price</DataReport>
<Observation>Price(AVERAGE)</Observation>
<Commodity>Electricity</Commodity>
<Interval>Daily</Interval>
<Attribute> <Caption>Market</Caption><Label>MGP</Label><Value>MGP</Value></Attribute>
<Attribute> <Caption>Zone</Caption><Label>NORD</Label><Value>NORD</Value></Attribute>
<Filter></Filter>
</DataSource>
<DataSource>
<IdNumber>11</IdNumber>
<Series>k</Series>
<DataSourceCaption>UK</DataSourceCaption>
<DataSourceName>N2EX</DataSourceName>
<DataReport>Day Ahead Auction Market Prices</DataReport>
<Observation>Price(AVERAGE)</Observation>
<Commodity>Electricity</Commodity>
<Interval>Daily</Interval>
<Filter></Filter>
</DataSource>
<DataSource>
<IdNumber>12</IdNumber>
<Series>l</Series>
<DataSourceCaption>NP</DataSourceCaption>
<DataSourceName>NordPool</DataSourceName>
<DataReport>Elspot System Prices</DataReport>
<Observation>Price(AVERAGE)</Observation>
<Commodity>Electricity</Commodity>
<Interval>Daily</Interval>
<Attribute> <Caption>Currency</Caption><Label>EUR</Label><Value>EUR</Value></Attribute>
<Filter></Filter>
</DataSource>
<DataSourceData>
<ResultSet>
<Date>02/24/2016</Date>
<Result>23.951</Result>
<Result>29.646</Result>
<Result>33.317</Result>
<Result>30.423</Result>
<Result>30.423</Result>
<Result>24.322</Result>
<Result>27.563</Result>
<Result>29.191</Result>
<Result>36.183</Result>
<Result>36.204</Result>
<Result>35.935</Result>
<Result>20.417</Result>
<formatted-date-string>2016-02-24</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>02/25/2016</Date>
<Result>27.880</Result>
<Result>29.561</Result>
<Result>33.439</Result>
<Result>26.921</Result>
<Result>26.921</Result>
<Result>27.616</Result>
<Result>27.862</Result>
<Result>28.705</Result>
<Result>37.117</Result>
<Result>36.999</Result>
<Result>43.896</Result>
<Result>25.886</Result>
<formatted-date-string>2016-02-25</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>02/26/2016</Date>
<Result>27.834</Result>
<Result>28.088</Result>
<Result>32.744</Result>
<Result>25.458</Result>
<Result>24.902</Result>
<Result>27.205</Result>
<Result>28.088</Result>
<Result>28.088</Result>
<Result>37.323</Result>
<Result>37.364</Result>
<Result>34.400</Result>
<Result>23.864</Result>
<formatted-date-string>2016-02-26</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>02/27/2016</Date>
<Result>23.001</Result>
<Result>23.251</Result>
<Result>30.112</Result>
<Result>5.792</Result>
<Result>5.792</Result>
<Result>22.696</Result>
<Result>23.369</Result>
<Result>23.363</Result>
<Result>34.391</Result>
<Result>33.768</Result>
<Result>33.278</Result>
<Result>19.640</Result>
<formatted-date-string>2016-02-27</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>02/28/2016</Date>
<Result>18.337</Result>
<Result>18.353</Result>
<Result>18.763</Result>
<Result>6.680</Result>
<Result>6.680</Result>
<Result>16.787</Result>
<Result>18.858</Result>
<Result>18.358</Result>
<Result>27.882</Result>
<Result>28.112</Result>
<Result>34.258</Result>
<Result>19.036</Result>
<formatted-date-string>2016-02-28</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>02/29/2016</Date>
<Result>23.945</Result>
<Result>27.753</Result>
<Result>30.684</Result>
<Result>21.115</Result>
<Result>21.115</Result>
<Result>23.410</Result>
<Result>24.862</Result>
<Result>27.766</Result>
<Result>33.336</Result>
<Result>34.053</Result>
<Result>33.157</Result>
<Result>24.912</Result>
<formatted-date-string>2016-02-29</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/01/2016</Date>
<Result>24.997</Result>
<Result>31.256</Result>
<Result>33.418</Result>
<Result>29.462</Result>
<Result>29.462</Result>
<Result>23.577</Result>
<Result>26.202</Result>
<Result>30.936</Result>
<Result>34.815</Result>
<Result>34.790</Result>
<Result>33.526</Result>
<Result>20.572</Result>
<formatted-date-string>2016-03-01</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/02/2016</Date>
<Result>24.049</Result>
<Result>26.570</Result>
<Result>33.094</Result>
<Result>23.048</Result>
<Result>23.048</Result>
<Result>23.207</Result>
<Result>26.442</Result>
<Result>26.927</Result>
<Result>35.016</Result>
<Result>35.447</Result>
<Result>33.089</Result>
<Result>23.946</Result>
<formatted-date-string>2016-03-02</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/03/2016</Date>
<Result>28.190</Result>
<Result>29.252</Result>
<Result>32.461</Result>
<Result>25.596</Result>
<Result>25.583</Result>
<Result>28.197</Result>
<Result>28.446</Result>
<Result>29.229</Result>
<Result>32.742</Result>
<Result>34.482</Result>
<Result>36.090</Result>
<Result>25.562</Result>
<formatted-date-string>2016-03-03</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/04/2016</Date>
<Result>24.884</Result>
<Result>29.962</Result>
<Result>32.458</Result>
<Result>19.838</Result>
<Result>19.838</Result>
<Result>24.552</Result>
<Result>26.717</Result>
<Result>30.170</Result>
<Result>35.557</Result>
<Result>36.167</Result>
<Result>33.620</Result>
<Result>23.783</Result>
<formatted-date-string>2016-03-04</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/05/2016</Date>
<Result>23.126</Result>
<Result>24.118</Result>
<Result>29.272</Result>
<Result>10.682</Result>
<Result>10.292</Result>
<Result>22.049</Result>
<Result>25.649</Result>
<Result>24.699</Result>
<Result>34.725</Result>
<Result>34.641</Result>
<Result>32.741</Result>
<Result>20.102</Result>
<formatted-date-string>2016-03-05</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/06/2016</Date>
<Result>21.334</Result>
<Result>21.609</Result>
<Result>21.653</Result>
<Result>13.356</Result>
<Result>13.115</Result>
<Result>16.886</Result>
<Result>21.610</Result>
<Result>21.610</Result>
<Result>34.651</Result>
<Result>33.972</Result>
<Result>34.074</Result>
<Result>19.792</Result>
<formatted-date-string>2016-03-06</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/07/2016</Date>
<Result>29.423</Result>
<Result>32.991</Result>
<Result>34.681</Result>
<Result>25.289</Result>
<Result>22.293</Result>
<Result>28.658</Result>
<Result>29.912</Result>
<Result>32.236</Result>
<Result>37.597</Result>
<Result>37.622</Result>
<Result>38.988</Result>
<Result>23.452</Result>
<formatted-date-string>2016-03-07</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/08/2016</Date>
<Result>28.364</Result>
<Result>32.237</Result>
<Result>35.737</Result>
<Result>29.982</Result>
<Result>29.943</Result>
<Result>28.083</Result>
<Result>28.434</Result>
<Result>30.905</Result>
<Result>39.336</Result>
<Result>39.819</Result>
<Result>35.519</Result>
<Result>25.448</Result>
<formatted-date-string>2016-03-08</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/09/2016</Date>
<Result>25.749</Result>
<Result>27.745</Result>
<Result>35.545</Result>
<Result>21.462</Result>
<Result>21.268</Result>
<Result>25.025</Result>
<Result>26.230</Result>
<Result>27.421</Result>
<Result>37.892</Result>
<Result>37.571</Result>
<Result>34.246</Result>
<Result>24.275</Result>
<formatted-date-string>2016-03-09</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/10/2016</Date>
<Result>26.155</Result>
<Result>31.515</Result>
<Result>34.400</Result>
<Result>20.497</Result>
<Result>20.497</Result>
<Result>26.828</Result>
<Result>27.535</Result>
<Result>31.321</Result>
<Result>40.087</Result>
<Result>39.447</Result>
<Result>38.510</Result>
<Result>25.014</Result>
<formatted-date-string>2016-03-10</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/11/2016</Date>
<Result>27.922</Result>
<Result>29.680</Result>
<Result>33.744</Result>
<Result>30.865</Result>
<Result>30.413</Result>
<Result>26.663</Result>
<Result>28.286</Result>
<Result>29.578</Result>
<Result>36.554</Result>
<Result>36.599</Result>
<Result>36.339</Result>
<Result>26.737</Result>
<formatted-date-string>2016-03-11</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/12/2016</Date>
<Result>27.815</Result>
<Result>27.815</Result>
<Result>25.409</Result>
<Result>28.225</Result>
<Result>28.225</Result>
<Result>24.658</Result>
<Result>27.815</Result>
<Result>27.815</Result>
<Result>34.741</Result>
<Result>33.982</Result>
<Result>33.138</Result>
<Result>22.541</Result>
<formatted-date-string>2016-03-12</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/13/2016</Date>
<Result>22.927</Result>
<Result>22.738</Result>
<Result>24.971</Result>
<Result>24.054</Result>
<Result>24.266</Result>
<Result>20.809</Result>
<Result>23.224</Result>
<Result>22.601</Result>
<Result>30.599</Result>
<Result>30.447</Result>
<Result>32.662</Result>
<Result>21.515</Result>
<formatted-date-string>2016-03-13</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/14/2016</Date>
<Result>27.455</Result>
<Result>27.869</Result>
<Result>32.355</Result>
<Result>36.541</Result>
<Result>36.456</Result>
<Result>26.338</Result>
<Result>27.509</Result>
<Result>27.839</Result>
<Result>34.130</Result>
<Result>34.375</Result>
<Result>34.230</Result>
<Result>22.581</Result>
<formatted-date-string>2016-03-14</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/15/2016</Date>
<Result>27.145</Result>
<Result>29.675</Result>
<Result>33.590</Result>
<Result>41.621</Result>
<Result>41.621</Result>
<Result>26.875</Result>
<Result>27.912</Result>
<Result>29.607</Result>
<Result>40.484</Result>
<Result>39.827</Result>
<Result>32.671</Result>
<Result>22.234</Result>
<formatted-date-string>2016-03-15</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/16/2016</Date>
<Result>25.410</Result>
<Result>28.734</Result>
<Result>34.607</Result>
<Result>35.361</Result>
<Result>35.580</Result>
<Result>25.177</Result>
<Result>26.487</Result>
<Result>28.639</Result>
<Result>42.086</Result>
<Result>41.734</Result>
<Result>32.560</Result>
<Result>21.761</Result>
<formatted-date-string>2016-03-16</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/17/2016</Date>
<Result>27.631</Result>
<Result>29.734</Result>
<Result>34.616</Result>
<Result>41.450</Result>
<Result>41.450</Result>
<Result>27.165</Result>
<Result>29.369</Result>
<Result>30.090</Result>
<Result>36.515</Result>
<Result>35.767</Result>
<Result>34.799</Result>
<Result>21.890</Result>
<formatted-date-string>2016-03-17</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/18/2016</Date>
<Result>25.241</Result>
<Result>31.313</Result>
<Result>32.417</Result>
<Result>43.768</Result>
<Result>43.649</Result>
<Result>25.368</Result>
<Result>28.130</Result>
<Result>31.503</Result>
<Result>37.349</Result>
<Result>37.003</Result>
<Result>34.086</Result>
<Result>21.833</Result>
<formatted-date-string>2016-03-18</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/19/2016</Date>
<Result>24.549</Result>
<Result>27.268</Result>
<Result>26.782</Result>
<Result>37.610</Result>
<Result>37.610</Result>
<Result>23.636</Result>
<Result>25.452</Result>
<Result>27.363</Result>
<Result>33.488</Result>
<Result>32.651</Result>
<Result>33.576</Result>
<Result>21.442</Result>
<formatted-date-string>2016-03-19</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/20/2016</Date>
<Result>17.883</Result>
<Result>20.992</Result>
<Result>20.503</Result>
<Result>34.857</Result>
<Result>34.857</Result>
<Result>18.543</Result>
<Result>25.680</Result>
<Result>22.925</Result>
<Result>31.672</Result>
<Result>30.754</Result>
<Result>32.830</Result>
<Result>20.380</Result>
<formatted-date-string>2016-03-20</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/21/2016</Date>
<Result>25.301</Result>
<Result>32.263</Result>
<Result>32.770</Result>
<Result>35.996</Result>
<Result>35.889</Result>
<Result>25.152</Result>
<Result>27.369</Result>
<Result>31.490</Result>
<Result>34.397</Result>
<Result>34.980</Result>
<Result>36.898</Result>
<Result>21.706</Result>
<formatted-date-string>2016-03-21</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/22/2016</Date>
<Result>28.422</Result>
<Result>33.111</Result>
<Result>34.305</Result>
<Result>34.639</Result>
<Result>34.317</Result>
<Result>27.990</Result>
<Result>29.109</Result>
<Result>32.485</Result>
<Result>34.369</Result>
<Result>35.023</Result>
<Result>37.596</Result>
<Result>21.857</Result>
<formatted-date-string>2016-03-22</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/23/2016</Date>
<Result>26.442</Result>
<Result>30.177</Result>
<Result>32.139</Result>
<Result>30.111</Result>
<Result>29.893</Result>
<Result>27.974</Result>
<Result>27.722</Result>
<Result>30.000</Result>
<Result>33.983</Result>
<Result>34.055</Result>
<Result>35.109</Result>
<Result>21.740</Result>
<formatted-date-string>2016-03-23</formatted-date-string>
</ResultSet>
<ResultSet>
<Date>03/24/2016</Date>
<Result>27.784</Result>
<Result>28.842</Result>
<Result>31.024</Result>
<Result>28.307</Result>
<Result>28.307</Result>
<Result>27.908</Result>
<Result>28.035</Result>
<Result>28.785</Result>
<Result>31.508</Result>
<Result>32.499</Result>
<Result>33.318</Result>
<Result>21.325</Result>
<formatted-date-string>2016-03-24</formatted-date-string>
</ResultSet>
</DataSourceData>
<profile-name>Spot Baseload</profile-name>
<date-formats>
<first-col-date-format>yyyy-MM-dd</first-col-date-format>
<result-date-format>yyyy-MM-dd</result-date-format>
<result-timestamp-format>yyyy-MM-dd HH:mm:ss</result-timestamp-format>
</date-formats>
<profile-settings />
<profile-options>
<start-date>2016-02-24</start-date>
<precision>3</precision>
<interval>Daily</interval>
<sort-order>ASC</sort-order>
<observe-dst>MERGED</observe-dst>
<suppress-nulls>false</suppress-nulls>
<end-date>2016-03-24</end-date>
</profile-options>
<group-name>Spot Prices</group-name>
</Profile>
Answer 0 (score: 1)
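This would be better as a comment, but I can't post comments yet. There are three problems in the code you posted:
1) find_all is used incorrectly: its argument should be a tag name, not the raw response. In your case, to get the relevant information from the <DataSource> elements you need to change the call to soup.find_all("DataSource"), or to soup.find_all("ResultSet") for the <ResultSet> elements; see the find_all documentation.
2) The attribute lookups you make on each parsed element (CalendarHour, RowNumber, RemainingAvailableMargin, BE, DE, FR, NL) make no sense, because as far as I can see in the XML they do not correspond to any element, with the exception of Date inside ResultSet. What you can do instead is pass the exact location of the tag you want as a CSS selector to the soup.select method, or, if the structure of the tree allows it, use the find method; see the documentation for both.
3) To get the string out of a tag, you can use the get_text() method. If the element's only child is a NavigableString, .string should also work; see the documentation for both.
Once you have dealt with these points, and in particular once you show which elements your lookups are supposed to correspond to, we can see whether the problem can be solved.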
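To make the three points concrete, here is a minimal sketch of one way to turn the posted XML into rows. It deliberately swaps in the standard library's xml.etree.ElementTree for Beautiful Soup so that it runs without third-party installs; findall and findtext play the roles of find_all and get_text. The function name parse_profile and the trimmed SAMPLE_XML are made up for illustration, and the sketch assumes that the i-th <Result> in each <ResultSet> belongs to the i-th <DataSource> (the posted XML has 12 of each, which is consistent with that, but it is not confirmed by the source). If you stay with Beautiful Soup, be aware that the "lxml" HTML parser lowercases tag names, so mixed-case names such as "ResultSet" should only match when the document is parsed with the "xml" parser.

```python
import datetime
import xml.etree.ElementTree as ET

# Trimmed-down sample with the same layout as the posted XML (two series, two days).
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<Profile>
  <DataSource>
    <IdNumber>1</IdNumber>
    <DataSourceCaption>DE</DataSourceCaption>
  </DataSource>
  <DataSource>
    <IdNumber>2</IdNumber>
    <DataSourceCaption>FR</DataSourceCaption>
  </DataSource>
  <DataSourceData>
    <ResultSet>
      <Date>02/24/2016</Date>
      <Result>23.951</Result>
      <Result>29.646</Result>
      <formatted-date-string>2016-02-24</formatted-date-string>
    </ResultSet>
    <ResultSet>
      <Date>02/25/2016</Date>
      <Result>27.880</Result>
      <Result>29.561</Result>
      <formatted-date-string>2016-02-25</formatted-date-string>
    </ResultSet>
  </DataSourceData>
</Profile>
"""


def parse_profile(xml_bytes):
    """Return (captions, rows) for a Profile document (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)

    # Point 1: search by tag name. Each <DataSource> supplies one column name.
    captions = [ds.findtext("DataSourceCaption") for ds in root.findall("DataSource")]

    rows = []
    # Point 2: look up only tags that exist, at the place where they exist.
    for result_set in root.iter("ResultSet"):
        # Point 3: read the text content of the tag, then convert it.
        date = datetime.datetime.strptime(result_set.findtext("Date"), "%m/%d/%Y").date()
        values = [float(result.text) for result in result_set.findall("Result")]

        # Assumption: the Results are listed in the same order as the DataSources.
        if len(values) != len(captions):
            raise ValueError("ResultSet for %s has %d values, expected %d"
                             % (date, len(values), len(captions)))
        rows.append((date, *values))
    return captions, rows


# requests' .content is bytes, so the sample is encoded the same way here.
captions, rows = parse_profile(SAMPLE_XML.encode("utf-8"))
for row in rows:
    print(row)
```

The result can then be handed to pandas with pandas.DataFrame(rows, columns=["Date"] + captions).set_index("Date"), which gives one price column per data source instead of an empty frame.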